Lassen (LLNL)
The Lassen V100 GPU cluster is located at LLNL.
Introduction
If you are new to this system, please see the following resources:
- LLNL user account (login required)
- Batch system: LSF
- Jupyter service (documentation, login required)
- /p/gpfs1/$(whoami): personal directory on the parallel filesystem
Note that the $HOME directory and the /usr/workspace/$(whoami) space are NFS mounted and not suitable for production-quality data generation.
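If you are unsure which filesystem backs a given path, you can query it directly; a minimal check (the df output columns vary between systems):
# create your production directory on the parallel filesystem, if it does not exist yet
mkdir -p /p/gpfs1/$(whoami)
# print the filesystem type backing each directory (e.g., gpfs vs. nfs)
df -T /p/gpfs1/$(whoami) $HOME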
Login
Lassen is currently transitioning to RHEL8. During this transition, first SSH into lassen and then hop to the updated RHEL8/TOSS4 nodes:
ssh lassen.llnl.gov
ssh eatoss4
Approximately October/November 2023, the new software environment on these nodes will become the default.
Alternatively, stay on the current TOSS3 (RHEL7) login nodes:
ssh lassen.llnl.gov
Approximately October/November 2023, this partition will become TOSS4 (RHEL8) as well.
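To shorten the two-hop login, you can add a proxy-jump entry to your SSH configuration; a hypothetical ~/.ssh/config sketch (the host aliases and username placeholder are assumptions, not site-provided settings):
# hypothetical ~/.ssh/config entries; replace <llnl-username> with your account
Host lassen
    HostName lassen.llnl.gov
    User <llnl-username>
Host eatoss4
    ProxyJump lassen
    User <llnl-username>
With this in place, ssh eatoss4 performs both hops in one command.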
Preparation
Use the following commands to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git /usr/workspace/${USER}/lassen/src/warpx
On the TOSS4 (RHEL8) nodes, we use system software modules and add environment hints and further dependencies via the file $HOME/lassen_v100_warpx.profile.
Create it now:
cp /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/lassen_v100_warpx.profile.example $HOME/lassen_v100_warpx.profile
Edit the 2nd line of this script, which sets the export proj="" variable.
For example, if you are a member of the project nsldt, then run vi $HOME/lassen_v100_warpx.profile.
Enter edit mode by typing i and change line 2 to read:
export proj="nsldt"
Exit the vi editor with Esc and then type :wq (write & quit).
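If you prefer a non-interactive edit, a single sed call achieves the same (assuming line 2 still reads export proj="" as shipped in the example file):
# set the project ID in-place; adjust "nsldt" to your own project
sed -i 's/export proj=""/export proj="nsldt"/' $HOME/lassen_v100_warpx.profile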
Important
Now, and as the first step on future logins to Lassen, activate these environment settings:
source $HOME/lassen_v100_warpx.profile
Likewise, on the TOSS3 (RHEL7) nodes, we use system software modules and add environment hints and further dependencies via the file $HOME/lassen_v100_warpx_toss3.profile.
Create it now:
cp /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/lassen_v100_warpx_toss3.profile.example $HOME/lassen_v100_warpx_toss3.profile
Edit the 2nd line of this script, which sets the export proj="" variable.
For example, if you are a member of the project nsldt, then run vi $HOME/lassen_v100_warpx_toss3.profile.
Enter edit mode by typing i and change line 2 to read:
export proj="nsldt"
Exit the vi editor with Esc and then type :wq (write & quit).
Important
Now, and as the first step on future logins to Lassen, activate these environment settings:
source $HOME/lassen_v100_warpx_toss3.profile
Finally, since Lassen does not yet provide software modules for some of our dependencies, install them once.
On TOSS4 (RHEL8) nodes:
bash /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/install_v100_dependencies.sh
source /usr/workspace/${USER}/lassen/gpu/venvs/warpx-lassen/bin/activate
On TOSS3 (RHEL7) nodes:
bash /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/install_v100_dependencies_toss3.sh
source /usr/workspace/${USER}/lassen-toss3/gpu/venvs/warpx-lassen-toss3/bin/activate
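To confirm that the virtual environment is active after sourcing it, a quick sanity check (output details depend on what the install script set up):
# python3 on PATH should now resolve into the warpx-lassen venv
which python3
# list the first few installed packages
python3 -m pip list | head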
Compilation
Use the following cmake commands to compile the application executable:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8
The WarpX application executables are now in /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/.
Additionally, the following commands will install WarpX as a Python module:
rm -rf build_lassen_py
cmake -S . -B build_lassen_py -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen_py -j 8 --target pip_install
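Once the pip_install target completes, you can verify that the module imports cleanly in the active virtual environment (pywarpx is the package name installed by the WarpX build):
python3 -c "import pywarpx; print(pywarpx.__file__)"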
Now, you can submit Lassen compute jobs for WarpX Python (PICMI) scripts (example scripts).
Or, you can use the WarpX executables to submit Lassen jobs (example inputs).
For executables, you can reference their location in your job script or copy them to a location in $PROJWORK/$proj/.
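For example, to copy a built executable into your production directory from the introduction (the warpx.3d* name pattern is an assumption; check the bin/ directory for the exact executable names):
# copy the 3D executable(s); adjust the glob to the build you need
cp /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/warpx.3d* /p/gpfs1/$(whoami)/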
Update WarpX & Dependencies
If you already installed WarpX in the past and want to update it, start by getting the latest source code:
cd /usr/workspace/${USER}/lassen/src/warpx
# read the output of this command - does it look ok?
git status
# get the latest WarpX source code
git fetch
git pull
# read the output of these commands - do they look ok?
git status
git log # press q to exit
And, if needed, update your environment profile file, log out and back into the system, and activate the now updated environment profile as usual.
As a last step, clean the build directory with rm -rf /usr/workspace/${USER}/lassen/src/warpx/build_lassen and rebuild WarpX.
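A full clean rebuild then uses the same commands as in the Compilation section above:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8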
Running
V100 GPUs (16GB)
The batch script below can be used to run a WarpX simulation on 2 nodes on the supercomputer Lassen at LLNL.
Replace descriptions between chevrons <> by relevant values, for instance <input file> could be plasma_mirror_inputs.
Note that the only option so far is to run with one MPI rank per GPU.
#!/bin/bash
# Copyright 2020-2023 Axel Huebl
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL
#
# Refs.:
# https://jsrunvisualizer.olcf.ornl.gov/?s4f0o11n6c7g1r11d1b1l0=
# https://hpc.llnl.gov/training/tutorials/using-lcs-sierra-system#quick16
#BSUB -G <allocation ID>
#BSUB -W 00:10
#BSUB -nnodes 2
#BSUB -alloc_flags smt4
#BSUB -J WarpX
#BSUB -o WarpXo.%J
#BSUB -e WarpXe.%J
# Work-around OpenMPI bug with chunked HDF5
# https://github.com/open-mpi/ompi/issues/7795
export OMPI_MCA_io=ompio
# Work-around for broken IBM "libcollectives" MPI_Allgatherv
# https://github.com/ECP-WarpX/WarpX/pull/2874
export OMPI_MCA_coll_ibm_skip_allgatherv=true
# ROMIO has a hint for GPFS named IBM_largeblock_io which optimizes I/O with operations on large blocks
export IBM_largeblock_io=true
# MPI-I/O: ROMIO hints for parallel HDF5 performance
export ROMIO_HINTS=./romio-hints
# number of hosts: unique node names minus batch node
NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))
cat > romio-hints << EOL
romio_cb_write enable
romio_ds_write enable
cb_buffer_size 16777216
cb_nodes ${NUM_HOSTS}
EOL
# OpenMPI file locks are slow and not needed
# https://github.com/open-mpi/ompi/issues/10053
export OMPI_MCA_sharedfp=^lockedfile,individual
# HDF5: disable slow locks (promise not to open half-written files)
export HDF5_USE_FILE_LOCKING=FALSE
# OpenMP: 1 thread per MPI rank
export OMP_NUM_THREADS=1
# store our task host mapping: helps identify broken nodes at scale
jsrun -r 4 -a 1 -g 1 -c 7 -e prepended hostname > task_host_mapping.txt
# run WarpX
jsrun -r 4 -a 1 -g 1 -c 7 -l GPU-CPU -d packed -b rs -e prepended -M "-gpu" <path/to/executable> <input file> > output.txt
To run a simulation, copy the lines above to a file lassen_v100.bsub and run
bsub lassen_v100.bsub
to submit the job.
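A few LSF commands are useful for tracking the job afterwards (standard LSF tooling; see the LSF documentation for full options):
bjobs                # list your pending and running jobs
bjobs -l <jobID>     # detailed information on a single job
bpeek <jobID>        # peek at the stdout of a running job
bkill <jobID>        # cancel a job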
For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell solver on V100 GPUs for a well load-balanced problem (in our case, a laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance:
- amr.max_grid_size=256 and amr.blocking_factor=128.
- One MPI rank per GPU (e.g., 4 MPI ranks for the 4 GPUs on each Lassen node).
- Two `128x128x128` grids per GPU, or one `128x128x256` grid per GPU.
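For reference, these settings go directly into the WarpX inputs file; a minimal excerpt (comments explain the AMReX semantics of each parameter):
# cap the largest box side at 256 cells
amr.max_grid_size = 256
# force box sides to be multiples of 128 cells
amr.blocking_factor = 128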
Known System Issues
Warning
Feb 17th, 2022 (INC0278922):
The implementation of AllGatherv in IBM's MPI optimization library "libcollectives" is broken and leads to HDF5 crashes for multi-node runs.
Our batch script templates above apply this work-around before the call to jsrun, which avoids the broken routines from IBM and trades them for an OpenMPI implementation of collectives:
export OMPI_MCA_coll_ibm_skip_allgatherv=true
As part of the same CORAL acquisition program, Lassen is very similar to the design of Summit (OLCF). Thus, when encountering new issues it is worth checking also the known Summit issues and work-arounds.