LUMI (CSC)
The LUMI cluster is located at CSC (Finland). Each node contains 4 AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs) for a total of 8 GCDs per node. You can think of the 8 GCDs as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).
Introduction
If you are new to this system, please see the following resources:
Batch system: Slurm
-
$HOME
: single user, intended to store user configuration files and personal data (20GB default quota)/project/$proj
: shared with all members of a project, purged at the end of a project (50 GB default quota)/scratch/$proj
: temporary storage, main storage to be used for disk I/O needs when running simulations on LUMI, purged every 90 days (50TB default quota)
Preparation
Use the following commands to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
We use system software modules, add environment hints and further dependencies via the file $HOME/lumi_warpx.profile
.
Create it now:
cp $HOME/src/warpx/Tools/machines/lumi-csc/lumi_warpx.profile.example $HOME/lumi_warpx.profile
Edit the 2nd line of this script, which sets the export proj=""
variable using a text editor
such as nano
, emacs
, or vim
(all available by default on
LUMI login nodes).
Important
Now, and as the first step on future logins to LUMI, activate these environment settings:
source $HOME/lumi_warpx.profile
Finally, since LUMI does not yet provide software modules for some of our dependencies, install them once:
bash $HOME/src/warpx/Tools/machines/lumi-csc/install_dependencies.sh
source $HOME/sw/lumi/gpu/venvs/warpx-lumi/bin/activate
Compilation
Use the following cmake commands to compile the application executable:
cd $HOME/src/warpx
rm -rf build_lumi
cmake -S . -B build_lumi -DWarpX_COMPUTE=HIP -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lumi -j 16
The WarpX application executables are now in $HOME/src/warpx/build_lumi/bin/
.
Additionally, the following commands will install WarpX as a Python module:
rm -rf build_lumi_py
cmake -S . -B build_lumi_py -DWarpX_COMPUTE=HIP -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lumi_py -j 16 --target pip_install
Update WarpX & Dependencies
If you already installed WarpX in the past and want to update it, start by getting the latest source code:
cd $HOME/src/warpx
# read the output of this command - does it look ok?
git status
# get the latest WarpX source code
git fetch
git pull
# read the output of these commands - do they look ok?
git status
git log # press q to exit
And, if needed,
log out and into the system, activate the now updated environment profile as usual,
As a last step, clean the build directory rm -rf $HOME/src/warpx/build_lumi
and rebuild WarpX.
Running
MI250X GPUs (2x64 GB)
In non-interactive runs:
#!/bin/bash -l
#SBATCH -A <project id>
#SBATCH --job-name=warpx
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --partition=standard-g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=00:10:00
export MPICH_GPU_SUPPORT_ENABLED=1
# note (12-12-22)
# this environment setting is currently needed on LUMI to work-around a
# known issue with Libfabric
#export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor
# Seen since August 2023 seen on OLCF (not yet seen on LUMI?)
# OLCFDEV-1597: OFI Poll Failed UNDELIVERABLE Errors
# https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#olcfdev-1597-ofi-poll-failed-undeliverable-errors
#export MPICH_SMP_SINGLE_COPY_MODE=NONE
#export FI_CXI_RX_MATCH_MODE=software
# note (9-2-22, OLCFDEV-1079)
# this environment setting is needed to avoid that rocFFT writes a cache in
# the home directory, which does not scale.
export ROCFFT_RTC_CACHE_PATH=/dev/null
export OMP_NUM_THREADS=1
# LUMI documentation suggests using the following wrapper script
# to set the ROCR_VISIBLE_DEVICES to the value of SLURM_LOCALID
# see https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/
cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu
sleep 1
# LUMI documentation suggests using the following CPU bind
# so that the node local rank and GPU ID match
# see https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/
CPU_BIND="map_cpu:48,56,16,24,1,8,32,40"
srun --cpu-bind=${CPU_BIND} ./select_gpu ./warpx inputs | tee outputs.txt
rm -rf ./select_gpu
Post-Processing
Note
TODO: Document any Jupyter or data services.
Known System Issues
Warning
December 12th, 2022: There is a caching bug in libFabric that causes WarpX simulations to occasionally hang on LUMI on more than 1 node.
As a work-around, please export the following environment variable in your job scripts until the issue is fixed:
#export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor
Warning
January, 2023: We discovered a regression in AMD ROCm, leading to 2x slower current deposition (and other slowdowns) in ROCm 5.3 and 5.4.
June, 2023: Although a fix was planned for ROCm 5.5, we still see the same issue in this release and continue to exchange with AMD and HPE on the issue.
Stay with the ROCm 5.2 module to avoid a 2x slowdown.
Warning
May 2023: rocFFT in ROCm 5.1-5.3 tries to write to a cache in the home area by default. This does not scale, disable it via:
export ROCFFT_RTC_CACHE_PATH=/dev/null