Perlmutter (NERSC)¶
The Perlmutter cluster is located at NERSC.
Introduction¶
If you are new to this system, please see the following resources:
- Batch system: Slurm
- $PSCRATCH: per-user production directory, purged every 30 days (<TBD>TB)
- /global/cscratch1/sd/m3239: shared production directory for users in the project m3239, purged every 30 days (50TB)
- /global/cfs/cdirs/m3239/: community file system for users in the project m3239 (100TB)
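For example, a production run is typically staged in a fresh directory on the purged per-user scratch space, while results to keep go to CFS. A minimal sketch (the run directory name is a placeholder; adjust the project to your own):
# create and enter a run directory on the per-user scratch space (purged every 30 days)
mkdir -p $PSCRATCH/warpx_runs/run_001
cd $PSCRATCH/warpx_runs/run_001
# keep results you want to retain on the community file system instead
ls /global/cfs/cdirs/m3239/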
Installation¶
Use the following command to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
We use the following modules and environments on the system ($HOME/perlmutter_warpx.profile).
# please set your project account
#export proj="<yourProject>_g" # change me
# required dependencies
module load cmake/3.22.0
# optional: for QED support with detailed tables
module load boost/1.78.0-gnu
# optional: for openPMD and PSATD+RZ support
module load cray-hdf5-parallel/1.12.1.5
export CMAKE_PREFIX_PATH=$HOME/sw/perlmutter/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/perlmutter/adios2-2.7.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/perlmutter/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/perlmutter/lapackpp-master:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=$HOME/sw/perlmutter/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/sw/perlmutter/adios2-2.7.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/sw/perlmutter/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/sw/perlmutter/lapackpp-master/lib64:$LD_LIBRARY_PATH
# optional: for Python bindings or libEnsemble
module load cray-python/3.9.12.1
if [ -d "$HOME/sw/perlmutter/venvs/warpx" ]
then
source $HOME/sw/perlmutter/venvs/warpx/bin/activate
fi
# an alias to request an interactive batch node for one hour
# for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -N 1 --ntasks-per-node=4 -t 1:00:00 -q interactive -C gpu --gpu-bind=single:1 -c 32 -G 4 -A $proj"
# an alias to run a command on a batch node for up to 30min
# usage: runNode <command>
alias runNode="srun -N 1 --ntasks-per-node=4 -t 0:30:00 -q interactive -C gpu --gpu-bind=single:1 -c 32 -G 4 -A $proj"
# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1
# necessary to use CUDA-Aware MPI and run a job
export CRAY_ACCEL_TARGET=nvidia80
# optimize CUDA compilation for A100
export AMREX_CUDA_ARCH=8.0
# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
# note: the cc/CC/ftn wrappers below add those
#export CXXFLAGS="-march=znver3"
#export CFLAGS="-march=znver3"
# compiler environment hints
export CC=cc
export CXX=CC
export FC=ftn
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=CC
We recommend storing the above lines in a file, such as $HOME/perlmutter_warpx.profile, and loading it into your shell after a login:
source $HOME/perlmutter_warpx.profile
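Optionally, if you want this to happen automatically at every login, you can source the profile from your $HOME/.bashrc; a sketch, assuming you used the file name above:
# append a line to ~/.bashrc that sources the WarpX profile at login
echo "source \$HOME/perlmutter_warpx.profile" >> $HOME/.bashrc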
Since Perlmutter does not yet provide modules for them, install c-blosc, ADIOS2, BLAS++, and LAPACK++ once:
# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-pm-build
cmake -S src/c-blosc -B src/c-blosc-pm-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/perlmutter/c-blosc-1.21.1
cmake --build src/c-blosc-pm-build --target install --parallel 16
# ADIOS2
git clone -b v2.7.1 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-pm-build
cmake -S src/adios2 -B src/adios2-pm-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/perlmutter/adios2-2.7.1
cmake --build src/adios2-pm-build --target install -j 16
# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-pm-build
CXX=$(which CC) cmake -S src/blaspp -B src/blaspp-pm-build -Duse_openmp=OFF -Dgpu_backend=cuda -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/perlmutter/blaspp-master
cmake --build src/blaspp-pm-build --target install --parallel 16
# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-pm-build
CXX=$(which CC) CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S src/lapackpp -B src/lapackpp-pm-build -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=$HOME/sw/perlmutter/lapackpp-master
cmake --build src/lapackpp-pm-build --target install --parallel 16
Optionally, download and install Python packages for PICMI or dynamic ensemble optimizations (libEnsemble):
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv
python3 -m pip cache purge
rm -rf $HOME/sw/perlmutter/venvs/warpx
python3 -m venv $HOME/sw/perlmutter/venvs/warpx
source $HOME/sw/perlmutter/venvs/warpx/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
Then, cd into the directory $HOME/src/warpx and use the following commands to compile:
cd $HOME/src/warpx
rm -rf build
cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=CUDA
cmake --build build -j 16
The general cmake compile-time options apply as usual.
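For example, a configuration that additionally enables openPMD I/O and the RZ geometry with the PSATD solver could look as follows; a sketch, adjust the options to your needs:
# configure with openPMD output and the RZ PSATD solver enabled
cmake -S . -B build -DWarpX_COMPUTE=CUDA -DWarpX_DIMS=RZ -DWarpX_PSATD=ON -DWarpX_OPENPMD=ON
cmake --build build -j 16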
For a full PICMI install, follow the instructions for Python (PICMI) bindings:
# PICMI build
cd $HOME/src/warpx
# install or update dependencies
python3 -m pip install -r requirements.txt
# compile parallel PICMI interfaces in 3D, 2D, 1D and RZ
WARPX_MPI=ON WARPX_COMPUTE=CUDA WARPX_PSATD=ON BUILD_PARALLEL=16 python3 -m pip install --force-reinstall --no-deps -v .
Or, if you are developing, do a quick PICMI install of a single geometry (see: WarpX_DIMS) using:
# find dependencies & configure
cmake -S . -B build -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_LIB=ON -DWarpX_DIMS=RZ
# build and then call "python3 -m pip install ..."
cmake --build build --target pip_install -j 16
Running¶
A100 GPUs (40 GB)¶
The batch script below can be used to run a WarpX simulation on multiple nodes (change -N accordingly) on the supercomputer Perlmutter at NERSC. Replace descriptions between chevrons <> with relevant values, for instance <input file> could be plasma_mirror_inputs.
Note that we run one MPI rank per GPU.
#!/bin/bash -l
# Copyright 2021-2022 Axel Huebl, Kevin Gott
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL
#SBATCH -t 01:00:00
#SBATCH -N 4
#SBATCH -J WarpX
# note: <proj> must end on _g
#SBATCH -A <proj>
#SBATCH -q regular
#SBATCH -C gpu
#SBATCH --exclusive
#SBATCH --ntasks-per-gpu=1
#SBATCH --gpus-per-node=4
#SBATCH -o WarpX.o%j
#SBATCH -e WarpX.e%j
# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_OFI_NIC_POLICY=GPU
# threads for OpenMP and threaded compressors per MPI rank
export SRUN_CPUS_PER_TASK=32
EXE=./warpx
#EXE=../WarpX/build/bin/warpx.3d.MPI.CUDA.DP.OPMD.QED
#EXE=./main3d.gnu.TPROF.MPI.CUDA.ex
INPUTS=inputs_small
# CUDA visible devices are ordered inverse to local task IDs
srun /bin/bash -l -c " \
export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID));
${EXE} ${INPUTS} \
amrex.the_arena_is_managed=0 \
amrex.use_gpu_aware_mpi=1" \
> output.txt
To run a simulation, copy the lines above to a file perlmutter.sbatch
and run
sbatch perlmutter.sbatch
to submit the job.
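The usual Slurm commands can then be used to monitor or cancel the job, for example:
# list your queued and running jobs
squeue -u $USER
# cancel a job by its ID, if needed
scancel <jobid>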
Post-Processing¶
For post-processing, most users use Python via NERSC’s Jupyter service (Docs).
Please follow the same process as for NERSC Cori post-processing. Important: The environment and Jupyter kernel must be separate from the ones you create for Cori.
The Perlmutter $PSCRATCH file system is currently not yet available on Jupyter. Thus, store or copy your data to Cori's $SCRATCH or use the Community File System (CFS) for now.
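For example, a finished run written to $PSCRATCH could be copied to the project's CFS space so it remains reachable from Jupyter; a sketch, replace the run directory with your own:
# copy simulation output from the purged scratch space to the community file system
cp -r $PSCRATCH/warpx_runs/run_001 /global/cfs/cdirs/m3239/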