Crusher (OLCF)

The Crusher cluster is located at OLCF. Each node contains 4 AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs) for a total of 8 GCDs per node. You can think of the 8 GCDs as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).
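On a compute node, each GCD is reported as a separate device. As a quick check (a sketch, assuming a ROCm module is loaded, e.g. within an interactive job as set up in the Running section below), you can list them with ROCm's rocm-smi tool:

# each of the 8 GCDs appears as one GPU
rocm-smi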

Introduction

If you are new to this system, please see the following resources:

  • Crusher user guide

  • Batch system: Slurm

  • Production directories:

    • $PROJWORK/$proj/: shared with all members of a project, purged every 90 days (recommended)

    • $MEMBERWORK/$proj/: single user, purged every 90 days (usually smaller quota)

    • $WORLDWORK/$proj/: shared with all users, purged every 90 days

    • Note that the $HOME directory is mounted read-only on compute nodes, so you cannot run simulations from $HOME; use one of the work areas above instead (see the example after this list).
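For example, you could prepare a run directory in the recommended $PROJWORK area like this (a sketch; the directory layout is just a suggestion and assumes $proj is set as in the profile below):

mkdir -p $PROJWORK/$proj/$USER/warpx_runs
cd $PROJWORK/$proj/$USER/warpx_runs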

Installation

Use the following command to download the WarpX source code:

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

We use the following modules and environments on the system ($HOME/crusher_warpx.profile).

Listing 14 You can copy this file from Tools/machines/crusher-olcf/crusher_warpx.profile.example.
# please set your project account
#    note: WarpX ECP members use aph114_crusher
#export proj=<yourProject>

# required dependencies
module load cpe/22.08
module load cmake/3.23.2
module load craype-accel-amd-gfx90a
module load rocm/5.2.0
module load cray-mpich
module load cce/14.0.2  # must be loaded after rocm

# optional: faster builds
module load ccache
module load ninja

# optional: just an additional text editor
module load nano

# optional: for PSATD in RZ geometry support (no system modules yet; see manual install below)
#module load blaspp
#module load lapackpp
export CMAKE_PREFIX_PATH=$HOME/sw/crusher/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/crusher/lapackpp-master:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=$HOME/sw/crusher/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/sw/crusher/lapackpp-master/lib64:$LD_LIBRARY_PATH

# optional: for QED lookup table generation support
#module load boost/1.78.0-cxx17

# optional: for openPMD support
module load adios2/2.8.1
module load cray-hdf5-parallel/1.12.1.5

# optional: for Python bindings or libEnsemble
module load cray-python/3.9.12.1

# fix system defaults: do not escape $ with a \ on tab completion
shopt -s direxpand

# an alias to request an interactive batch node for one hour
#   for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -A $proj -J warpx -t 01:00:00 -p batch -N 1 -c 8 --ntasks-per-node=8"
# an alias to run a command on a batch node for up to 30min
#   usage: runNode <command>
alias runNode="srun -A $proj -J warpx -t 00:30:00 -p batch -N 1 -c 8 --ntasks-per-node=8"

# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1

# optimize ROCm/HIP compilation for MI250X
export AMREX_AMD_ARCH=gfx90a

# compiler environment hints
export CC=$(which cc)
export CXX=$(which CC)
export FC=$(which ftn)
export CFLAGS="-I${ROCM_PATH}/include"
export CXXFLAGS="-I${ROCM_PATH}/include"
#export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64"

We recommend storing the above lines in a file, such as $HOME/crusher_warpx.profile, and loading it into your shell after each login:

source $HOME/crusher_warpx.profile
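Remember to uncomment and set the proj variable at the top of that file, since the getNode and runNode aliases rely on it, e.g. for WarpX ECP members (as noted in the listing above):

export proj=aph114_crusher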

Since Crusher does not yet provide modules for BLAS++ and LAPACK++, install them manually (the install prefixes below match the CMAKE_PREFIX_PATH and LD_LIBRARY_PATH entries set in the profile above):

# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-crusher-build
cmake -S src/blaspp -B src/blaspp-crusher-build -Duse_openmp=OFF -Dgpu_backend=hip -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/crusher/blaspp-master
cmake --build src/blaspp-crusher-build --target install --parallel 10

# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-crusher-build
cmake -S src/lapackpp -B src/lapackpp-crusher-build -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=$HOME/sw/crusher/lapackpp-master
cmake --build src/lapackpp-crusher-build --target install --parallel 10

Then, cd into the directory $HOME/src/warpx and use the following commands to compile:

cd $HOME/src/warpx
rm -rf build

cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP
cmake --build build -j 10

The general cmake compile-time options apply as usual.
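For example, additional build options can be appended to the configure step; the following sketch enables openPMD output and QED support explicitly (option names follow the general WarpX CMake documentation and may differ between versions):

# example: 3D HIP build with openPMD output and QED support enabled
cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP -DWarpX_OPENPMD=ON -DWarpX_QED=ON
cmake --build build -j 10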

Running

MI250X GPUs (2x64 GB)

ECP WarpX project members, use the aph114 project ID.

After requesting an interactive node with the getNode alias above, run a simulation as follows, here using 8 MPI ranks on a single node:

runNode ./warpx inputs

Or submit a batch script for non-interactive runs:

Listing 15 You can copy this file from Tools/machines/crusher-olcf/submit.sh.
#!/usr/bin/env bash

#SBATCH -A <project id>
#    note: WarpX ECP members use aph114
#SBATCH -J warpx
#SBATCH -o %x-%j.out
#SBATCH -t 00:10:00
#SBATCH -p batch
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=closest
#SBATCH -N 1

# From the documentation:
# Each Crusher compute node consists of [1x] 64-core AMD EPYC 7A53
# "Optimized 3rd Gen EPYC" CPU (with 2 hardware threads per physical core) with
# access to 512 GB of DDR4 memory.
# Each node also contains [4x] AMD MI250X, each with 2 Graphics Compute Dies
# (GCDs) for a total of 8 GCDs per node. The programmer can think of the 8 GCDs
# as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).

# note (5-16-22, OLCFHELP-6888)
# this environment setting is currently needed on Crusher to work-around a
# known issue with Libfabric
#export FI_MR_CACHE_MAX_COUNT=0  # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks  # alternative cache monitor

# note (9-2-22, OLCFDEV-1079)
# this environment setting is needed to avoid that rocFFT writes a cache in
# the home directory, which does not scale.
export ROCFFT_RTC_CACHE_PATH=/dev/null

export OMP_NUM_THREADS=8
srun ./warpx inputs > output.txt
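
After setting your project ID in the #SBATCH -A line, submit the job from your production directory (assuming the script above is saved as submit.sh next to the warpx executable and your inputs file):

sbatch submit.sh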

Post-Processing

For post-processing, most users use Python via OLCF's Jupyter service (Docs).

Please follow the same guidance as for OLCF Summit post-processing.

Known System Issues

Warning

May 16th, 2022 (OLCFHELP-6888): There is a caching bug in Libfabric that causes WarpX simulations to occasionally hang on Crusher when using more than one node.

As a work-around, please export the following environment variable in your job scripts until the issue is fixed:

#export FI_MR_CACHE_MAX_COUNT=0  # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks  # alternative cache monitor

Warning

Sep 2nd, 2022 (OLCFDEV-1079): rocFFT in ROCm 5.1+ tries to write to a cache in the home area by default. This does not scale; disable it via:

export ROCFFT_RTC_CACHE_PATH=/dev/null