Perlmutter (NERSC)

Warning

Perlmutter is still in acceptance testing. This page documents our internal testing workflow only.

The Perlmutter cluster is located at NERSC.

If you are new to this system, please see the following resources:

  • NERSC user guide

  • Batch system: Slurm

  • Jupyter service

  • Production directories:

    • $PSCRATCH: per-user production directory (<TBD>TB)

    • /global/cscratch1/sd/m3239: shared production directory for users in the project m3239 (50TB)

    • /global/cfs/cdirs/m3239/: community file system for users in the project m3239 (100TB)
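
For example (a sketch only; the directory name my_first_run is just a placeholder), a per-simulation run directory can be created in the per-user scratch space:

mkdir -p $PSCRATCH/runs/my_first_run
cd $PSCRATCH/runs/my_first_run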

Installation

Use the following command to download the WarpX source code:

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

We use the following modules and environments on the system ($HOME/perlmutter_warpx.profile).

# please set your project account
export proj=<yourProject>

# required dependencies
module load cmake/git-20210830  # 3.22-dev
module swap PrgEnv-nvidia PrgEnv-gnu
module swap gcc gcc/9.3.0
module load cuda

# optional: just an additional text editor
# module load nano  # TODO: request from support

# optional: for openPMD support
module load cray-hdf5-parallel/1.12.0.6
export CMAKE_PREFIX_PATH=$HOME/sw/perlmutter/adios2-2.7.1:$CMAKE_PREFIX_PATH

# optional: Python, ...
# TODO

# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1

# optional: an alias to request an interactive node for two hours
function getNode() {
    salloc -N 1 --ntasks-per-node=4 -t 2:00:00 -C gpu -c 32 -G 4 -A $proj
}

# optimize CUDA compilation for A100
export AMREX_CUDA_ARCH=8.0

# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=$(which g++)

We recommend storing the above lines in a file, such as $HOME/perlmutter_warpx.profile, and loading it into your shell after each login:

source $HOME/perlmutter_warpx.profile
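
For quick interactive tests, the optional getNode function defined above requests a single GPU node for two hours. A minimal sketch, assuming the profile has been sourced and that a compiled warpx executable and an input file (here named inputs_small as a placeholder) are in the current directory:

getNode                                               # interactive allocation: 1 node, 4 GPUs, 2 hours
srun -n 4 --gpu-bind=single:1 ./warpx inputs_small    # one MPI rank per GPU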

Since Perlmutter does not yet provide a module for ADIOS2, install it from source:

git clone -b v2.7.1 https://github.com/ornladios/ADIOS2.git src/adios2
cmake -S src/adios2 -B src/adios2-build -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/perlmutter/adios2-2.7.1
cmake --build src/adios2-build --target install -j 32
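
As an optional sanity check (the exact layout may vary between ADIOS2 versions), the install prefix referenced in the profile above should now contain the usual subdirectories:

ls $HOME/sw/perlmutter/adios2-2.7.1   # expect bin/, include/ and lib/ or lib64/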

Then, cd into the directory $HOME/src/warpx and use the following commands to compile:

cd $HOME/src/warpx
rm -rf build

cmake -S . -B build -DWarpX_OPENPMD=ON -DWarpX_DIMS=3 -DWarpX_COMPUTE=CUDA
cmake --build build -j 32

The general CMake compile-time options apply as usual.
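
For example (a sketch only; adjust the options to your needs), a 2D build with debug symbols could be configured in a separate build directory:

cmake -S . -B build_2d -DWarpX_OPENPMD=ON -DWarpX_DIMS=2 -DWarpX_COMPUTE=CUDA -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build_2d -j 32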

Running

A100 GPUs

The batch script below can be used to run a WarpX simulation on multiple nodes (change -N accordingly) on the supercomputer Perlmutter at NERSC. Replace descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs. Note that we run one MPI rank per GPU.

#!/bin/bash -l

# Copyright 2021 Axel Huebl, Kevin Gott
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL

#SBATCH -t 01:00:00
#SBATCH -N 4
#SBATCH -J WarpX
#SBATCH -A <proj>
#SBATCH -C gpu
#SBATCH -c 32
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=single:1
#SBATCH -o WarpX.o%j
#SBATCH -e WarpX.e%j

# ============
# -N =                 nodes
# -n =                 tasks (MPI ranks, usually = G)
# -G =                 GPUs (full Perlmutter node, 4)
# -c =                 CPUs per task (128 total threads on CPU, 32 per GPU)
#
# --ntasks-per-node=   number of tasks (MPI ranks) per node (full node, 4)
# --gpus-per-task=     number of GPUs per task (MPI rank) (1, i.e. one GPU per rank)
# --gpus-per-node=     number of GPUs per node (full node, 4)
#
# --gpu-bind=single:1  sets only one GPU to be visible to each MPI rank
#                         (quiets AMReX init warnings)
#
# We recommend --ntasks-per-node=4, --gpus-per-task=1 and --gpu-bind=single:1,
# as these are fixed values that allow for easy scaling with fewer adjustments.
#
# ============

EXE=./warpx
#EXE=../WarpX/build/bin/warpx.3d.MPI.CUDA.DP.OPMD.QED
#EXE=./main3d.gnu.TPROF.MPI.CUDA.ex
INPUTS=inputs_small

srun ${EXE} ${INPUTS} \
  > output.txt

To run a simulation, copy the lines above to a file batch_perlmutter.sh and run

sbatch batch_perlmutter.sh

to submit the job.
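
After submission, standard Slurm commands can be used to follow the job, for example:

squeue -u $USER      # list your pending and running jobs
scancel <job id>     # cancel a job if needed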