Lassen (LLNL)
The Lassen V100 GPU cluster is located at LLNL.
Introduction
If you are new to this system, please see the following resources:
Preparation
Use the following commands to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git /usr/workspace/${USER}/lassen/src/warpx
We use system software modules and add environment hints and further dependencies via the file $HOME/lassen_v100_warpx_toss3.profile. Create it now:
cp /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/lassen_v100_warpx_toss3.profile.example $HOME/lassen_v100_warpx_toss3.profile
# please set your project account
#export proj="<yourProjectNameHere>" # edit this and comment in
# required dependencies
module load cmake/3.29.2
module load gcc/11.2.1
module load cuda/12.0.0
# optional: for QED lookup table generation support
module load boost/1.70.0
# optional: for openPMD support
SRC_DIR="/usr/workspace/${USER}/lassen/src"
SW_DIR="/usr/workspace/${USER}/lassen-toss3/gpu"
export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/hdf5-1.14.1.2:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.8.3:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.8.3/lib64:$LD_LIBRARY_PATH
export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH}
export PATH=${SW_DIR}/adios2-2.8.3/bin:${PATH}
# optional: for PSATD in RZ geometry support
export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH
# optional: for Python bindings
module load python/3.11.5
if [ -d "${SW_DIR}/venvs/warpx-lassen-toss3" ]
then
source ${SW_DIR}/venvs/warpx-lassen-toss3/bin/activate
fi
# optional: an alias to request an interactive node for two hours
alias getNode="bsub -G $proj -W 2:00 -nnodes 1 -Is /bin/bash"
# an alias to run a command on a batch node for up to two hours
# usage: runNode <command>
alias runNode="bsub -q pdebug -P $proj -W 2:00 -nnodes 1 -I"
# fix system defaults: do not escape $ with a \ on tab completion
shopt -s direxpand
# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0
export CUDAARCHS=70
# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
Edit the 2nd line of this script, which sets the export proj="" variable.
For example, if you are a member of the project nsldt, then run vi $HOME/lassen_v100_warpx_toss3.profile.
Enter the edit mode by typing i and edit line 2 to read:
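export proj="nsldt"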
Exit the vi editor with Esc and then type :wq (write & quit).
Important
Now, and as the first step on future logins to Lassen, activate these environment settings:
source $HOME/lassen_v100_warpx_toss3.profile
Finally, since Lassen does not yet provide software modules for some of our dependencies, install them once:
bash /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/install_v100_dependencies_toss3.sh
source /usr/workspace/${USER}/lassen-toss3/gpu/venvs/warpx-lassen-toss3/bin/activate
#!/bin/bash
#
# Copyright 2023 The WarpX Community
#
# This file is part of WarpX.
#
# Author: Axel Huebl
# License: BSD-3-Clause-LBNL
# Exit on first error encountered #############################################
#
set -eu -o pipefail
# Check: ######################################################################
#
# Was lassen_v100_warpx_toss3.profile sourced and configured correctly?
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your lassen_v100_warpx_toss3.profile file! Please edit its line 2 to continue!"; exit 1; fi
# Remove old dependencies #####################################################
#
SRC_DIR="/usr/workspace/${USER}/lassen-toss3/src"
SW_DIR="/usr/workspace/${USER}/lassen-toss3/gpu"
rm -rf ${SW_DIR}
mkdir -p ${SW_DIR}
mkdir -p ${SRC_DIR}
# remove common user mistakes in python, located in .local instead of a venv
python3 -m pip uninstall -qq -y pywarpx
python3 -m pip uninstall -qq -y warpx
python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true
# General extra dependencies ##################################################
#
# tmpfs build directory: avoids issues often seen with $HOME and is faster
build_dir=$(mktemp -d)
# c-blosc (I/O compression)
if [ -d ${SRC_DIR}/c-blosc ]
then
cd ${SRC_DIR}/c-blosc
git fetch --prune
git checkout v1.21.1
cd -
else
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git ${SRC_DIR}/c-blosc
fi
cmake -S ${SRC_DIR}/c-blosc -B ${build_dir}/c-blosc-lassen-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/c-blosc-1.21.1
cmake --build ${build_dir}/c-blosc-lassen-build --target install --parallel 10
# HDF5
if [ -d ${SRC_DIR}/hdf5 ]
then
cd ${SRC_DIR}/hdf5
git fetch --prune
git checkout hdf5-1_14_1-2
cd -
else
git clone -b hdf5-1_14_1-2 https://github.com/HDFGroup/hdf5.git ${SRC_DIR}/hdf5
fi
cmake -S ${SRC_DIR}/hdf5 -B ${build_dir}/hdf5-lassen-build -DBUILD_TESTING=OFF -DHDF5_ENABLE_PARALLEL=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/hdf5-1.14.1.2
cmake --build ${build_dir}/hdf5-lassen-build --target install --parallel 10
# ADIOS2
if [ -d ${SRC_DIR}/adios2 ]
then
cd ${SRC_DIR}/adios2
git fetch --prune
git checkout v2.8.3
cd -
else
git clone -b v2.8.3 https://github.com/ornladios/ADIOS2.git ${SRC_DIR}/adios2
fi
cmake -S ${SRC_DIR}/adios2 -B ${build_dir}/adios2-lassen-build -DBUILD_TESTING=OFF -DADIOS2_BUILD_EXAMPLES=OFF -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_SST=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.8.3
cmake --build ${build_dir}/adios2-lassen-build --target install -j 10
# BLAS++ (for PSATD+RZ)
if [ -d ${SRC_DIR}/blaspp ]
then
cd ${SRC_DIR}/blaspp
git fetch --prune
git checkout v2024.05.31
cd -
else
git clone -b v2024.05.31 https://github.com/icl-utk-edu/blaspp.git ${SRC_DIR}/blaspp
fi
cmake -S ${SRC_DIR}/blaspp -B ${build_dir}/blaspp-lassen-build -Duse_openmp=ON -Dgpu_backend=cuda -Duse_cmake_find_blas=ON -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-2024.05.31
cmake --build ${build_dir}/blaspp-lassen-build --target install --parallel 10
# LAPACK++ (for PSATD+RZ)
if [ -d ${SRC_DIR}/lapackpp ]
then
cd ${SRC_DIR}/lapackpp
git fetch --prune
git checkout v2024.05.31
cd -
else
git clone -b v2024.05.31 https://github.com/icl-utk-edu/lapackpp.git ${SRC_DIR}/lapackpp
fi
CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S ${SRC_DIR}/lapackpp -B ${build_dir}/lapackpp-lassen-build -Duse_cmake_find_lapack=ON -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-2024.05.31 -DLAPACK_LIBRARIES=/usr/lib64/liblapack.so
cmake --build ${build_dir}/lapackpp-lassen-build --target install --parallel 10
# Python ######################################################################
#
# sometimes, the Lassen PIP Index is down
export PIP_EXTRA_INDEX_URL="https://pypi.org/simple"
python3 -m pip install --upgrade --user virtualenv
rm -rf ${SW_DIR}/venvs/warpx-lassen-toss3
python3 -m venv ${SW_DIR}/venvs/warpx-lassen-toss3
source ${SW_DIR}/venvs/warpx-lassen-toss3/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip cache purge
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
CMAKE_PREFIX_PATH=/usr/lib64:${CMAKE_PREFIX_PATH} python3 -m pip install --upgrade -Ccompile-args="-j10" -Csetup-args=-Dblas=BLAS -Csetup-args=-Dlapack=BLAS scipy
python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
CC=mpicc H5PY_SETUP_REQUIRES=0 HDF5_DIR=${SW_DIR}/hdf5-1.14.1.2 HDF5_MPI=ON python3 -m pip install --upgrade h5py --no-cache-dir --no-build-isolation --no-binary h5py
MPLLOCALFREETYPE=1 python3 -m pip install --upgrade matplotlib==3.2.2 # does not try to build freetype itself
echo "matplotlib==3.2.2" > ${build_dir}/constraints.txt
python3 -m pip install --upgrade -c ${build_dir}/constraints.txt yt
# install or update WarpX dependencies such as picmistandard
python3 -m pip install --upgrade -r /usr/workspace/${USER}/lassen/src/warpx/requirements.txt
# for ML dependencies, see install_v100_ml.sh
# remove build temporary directory
rm -rf ${build_dir}
Compilation
Use the following cmake commands to compile the application executable:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8
The WarpX application executables are now in /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/.
Additionally, the following commands will install WarpX as a Python module:
rm -rf build_lassen_py
cmake -S . -B build_lassen_py -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen_py -j 8 --target pip_install
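As a quick, optional sanity check (not part of the build itself), the module should then be importable from the activated virtual environment:
python3 -c "import pywarpx"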
Now, you can submit Lassen compute jobs for WarpX Python (PICMI) scripts (example scripts).
Or, you can use the WarpX executables to submit Lassen jobs (example inputs).
For executables, you can reference their location in your job script or copy them to a location in $PROJWORK/$proj/, as sketched below.
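For example, a minimal sketch of the executable workflow, assuming a 3D build; the run directory and executable name are illustrative, and lassen_v100.bsub is the batch script from the Running section below:
# copy the executable and your inputs into the project work space (illustrative paths)
mkdir -p $PROJWORK/$proj/my_run
cp /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/warpx.3d $PROJWORK/$proj/my_run/
cp <input file> $PROJWORK/$proj/my_run/
cd $PROJWORK/$proj/my_run
# submit the batch script from the Running section below
bsub lassen_v100.bsub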
Update WarpX & Dependencies
If you already installed WarpX in the past and want to update it, start by getting the latest source code:
cd /usr/workspace/${USER}/lassen/src/warpx
# read the output of this command - does it look ok?
git status
# get the latest WarpX source code
git fetch
git pull
# read the output of these commands - do they look ok?
git status
git log # press q to exit
And, if needed, update your $HOME/lassen_v100_warpx_toss3.profile file and re-run the dependency install script above.
As a last step, clean the build directory rm -rf /usr/workspace/${USER}/lassen/src/warpx/build_lassen and rebuild WarpX, as sketched below.
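A minimal sketch of that clean rebuild, reusing the same CMake options as in the Compilation section above:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8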
Running
V100 GPUs (16GB)
The batch script below can be used to run a WarpX simulation on 2 nodes on the supercomputer Lassen at LLNL.
Replace descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs.
Note that the only option so far is to run with one MPI rank per GPU.
Listing 11: You can copy this file from Tools/machines/lassen-llnl/lassen_v100.bsub.
#!/bin/bash
# Copyright 2020-2023 Axel Huebl
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL
#
# Refs.:
# https://jsrunvisualizer.olcf.ornl.gov/?s4f0o11n6c7g1r11d1b1l0=
# https://hpc.llnl.gov/training/tutorials/using-lcs-sierra-system#quick16
#BSUB -G <allocation ID>
#BSUB -W 00:10
#BSUB -nnodes 2
#BSUB -alloc_flags smt4
#BSUB -J WarpX
#BSUB -o WarpXo.%J
#BSUB -e WarpXe.%J
# Work-around OpenMPI bug with chunked HDF5
# https://github.com/open-mpi/ompi/issues/7795
export OMPI_MCA_io=ompio
# Work-around for broken IBM "libcollectives" MPI_Allgatherv
# https://github.com/ECP-WarpX/WarpX/pull/2874
export OMPI_MCA_coll_ibm_skip_allgatherv=true
# ROMIO has a hint for GPFS named IBM_largeblock_io which optimizes I/O with operations on large blocks
export IBM_largeblock_io=true
# MPI-I/O: ROMIO hints for parallel HDF5 performance
export ROMIO_HINTS=./romio-hints
# number of hosts: unique node names minus batch node
NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))
cat > romio-hints << EOL
romio_cb_write enable
romio_ds_write enable
cb_buffer_size 16777216
cb_nodes ${NUM_HOSTS}
EOL
# OpenMPI file locks are slow and not needed
# https://github.com/open-mpi/ompi/issues/10053
export OMPI_MCA_sharedfp=^lockedfile,individual
# HDF5: disable slow locks (promise not to open half-written files)
export HDF5_USE_FILE_LOCKING=FALSE
# OpenMP: 1 thread per MPI rank
export OMP_NUM_THREADS=1
# store out task host mapping: helps identify broken nodes at scale
jsrun -r 4 -a1 -g 1 -c 7 -e prepended hostname > task_host_mapping.txt
# run WarpX
jsrun -r 4 -a 1 -g 1 -c 7 -l GPU-CPU -d packed -b rs -e prepended -M "-gpu" <path/to/executable> <input file> > output.txt
To run a simulation, copy the lines above to a file lassen_v100.bsub and run bsub lassen_v100.bsub to submit the job.
For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell
solver on V100 GPUs for a well load-balanced problem (in our case, a laser
wakefield acceleration simulation in a boosted frame in the quasi-linear
regime), the following set of parameters provided good performance:
amr.max_grid_size=256 and amr.blocking_factor=128.
One MPI rank per GPU (e.g., 4 MPI ranks for the 4 GPUs on each Lassen node).
Two `128x128x128` grids per GPU, or one `128x128x256` grid per GPU.
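As an illustration, these two parameters are set as plain key-value lines in a WarpX inputs file, for example:
# performance hints from above (values to be tuned for your own problem)
amr.max_grid_size = 256
amr.blocking_factor = 128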
Known System Issues
Warning
Feb 17th, 2022 (INC0278922):
The implementation of AllGatherv in IBM’s MPI optimization library “libcollectives” is broken and leads to HDF5 crashes for multi-node runs.
Our batch script templates above apply this work-around before the call to jsrun, which avoids the broken routines from IBM and trades them for an OpenMPI implementation of collectives:
export OMPI_MCA_coll_ibm_skip_allgatherv=true
As part of the same CORAL acquisition program, Lassen is very similar to the design of Summit (OLCF).
Thus, when encountering new issues, it is also worth checking the known Summit issues and work-arounds.