Lassen (LLNL)
The Lassen V100 GPU cluster is located at LLNL.
Introduction
If you are new to this system, please see the following resources:
Preparation
Use the following commands to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git /usr/workspace/${USER}/lassen/src/warpx
We use system software modules and add environment hints and further dependencies via the file $HOME/lassen_v100_warpx_toss3.profile. Create it now:
cp /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/lassen_v100_warpx_toss3.profile.example $HOME/lassen_v100_warpx_toss3.profile
# please set your project account
#export proj="<yourProjectNameHere>" # edit this and comment in
# required dependencies
module load cmake/3.29.2
module load gcc/11.2.1
module load cuda/12.0.0
# optional: for QED lookup table generation support
module load boost/1.70.0
# optional: for openPMD support
SRC_DIR="/usr/workspace/${USER}/lassen/src"
SW_DIR="/usr/workspace/${USER}/lassen-toss3/gpu"
export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/hdf5-1.14.1.2:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.8.3:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.8.3/lib64:$LD_LIBRARY_PATH
export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH}
export PATH=${SW_DIR}/adios2-2.8.3/bin:${PATH}
# optional: for PSATD in RZ geometry support
export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH
# optional: for Python bindings
module load python/3.11.5
if [ -d "${SW_DIR}/venvs/warpx-lassen-toss3" ]
then
source ${SW_DIR}/venvs/warpx-lassen-toss3/bin/activate
fi
# optional: an alias to request an interactive node for two hours
alias getNode="bsub -G $proj -W 2:00 -nnodes 1 -Is /bin/bash"
# an alias to run a command on a batch node for up to two hours
# usage: runNode <command>
alias runNode="bsub -q pdebug -P $proj -W 2:00 -nnodes 1 -I"
# fix system defaults: do not escape $ with a \ on tab completion
shopt -s direxpand
# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0
export CUDAARCHS=70
# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
Edit the 2nd line of this script, which sets the export proj="" variable.
For example, if you are a member of the project nsldt, then run vi $HOME/lassen_v100_warpx_toss3.profile.
Enter the edit mode by typing i and edit line 2 to read:
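export proj="nsldt"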
Exit the vi editor with Esc and then type :wq (write & quit).
Important
Now, and as the first step on future logins to Lassen, activate these environment settings:
source $HOME/lassen_v100_warpx_toss3.profile
Finally, since Lassen does not yet provide software modules for some of our dependencies, install them once:
bash /usr/workspace/${USER}/lassen/src/warpx/Tools/machines/lassen-llnl/install_v100_dependencies_toss3.sh
source /usr/workspace/${USER}/lassen-toss3/gpu/venvs/warpx-lassen-toss3/bin/activate
#!/bin/bash
#
# Copyright 2023 The WarpX Community
#
# This file is part of WarpX.
#
# Author: Axel Huebl
# License: BSD-3-Clause-LBNL
# Exit on first error encountered #############################################
#
set -eu -o pipefail
# Check: ######################################################################
#
# Was lassen_v100_warpx_toss3.profile sourced and configured correctly?
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your lassen_v100_warpx_toss3.profile file! Please edit its line 2 to continue!"; exit 1; fi
# Remove old dependencies #####################################################
#
SRC_DIR="/usr/workspace/${USER}/lassen-toss3/src"
SW_DIR="/usr/workspace/${USER}/lassen-toss3/gpu"
rm -rf ${SW_DIR}
mkdir -p ${SW_DIR}
mkdir -p ${SRC_DIR}
# remove common user mistakes in python, located in .local instead of a venv
python3 -m pip uninstall -qq -y pywarpx
python3 -m pip uninstall -qq -y warpx
python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true
# General extra dependencies ##################################################
#
# tmpfs build directory: avoids issues often seen with $HOME and is faster
build_dir=$(mktemp -d)
# c-blosc (I/O compression)
if [ -d ${SRC_DIR}/c-blosc ]
then
cd ${SRC_DIR}/c-blosc
git fetch --prune
git checkout v1.21.1
cd -
else
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git ${SRC_DIR}/c-blosc
fi
cmake -S ${SRC_DIR}/c-blosc -B ${build_dir}/c-blosc-lassen-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/c-blosc-1.21.1
cmake --build ${build_dir}/c-blosc-lassen-build --target install --parallel 10
# HDF5
if [ -d ${SRC_DIR}/hdf5 ]
then
cd ${SRC_DIR}/hdf5
git fetch --prune
git checkout hdf5-1_14_1-2
cd -
else
git clone -b hdf5-1_14_1-2 https://github.com/HDFGroup/hdf5.git ${SRC_DIR}/hdf5
fi
cmake -S ${SRC_DIR}/hdf5 -B ${build_dir}/hdf5-lassen-build -DBUILD_TESTING=OFF -DHDF5_ENABLE_PARALLEL=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/hdf5-1.14.1.2
cmake --build ${build_dir}/hdf5-lassen-build --target install --parallel 10
# ADIOS2
if [ -d ${SRC_DIR}/adios2 ]
then
cd ${SRC_DIR}/adios2
git fetch --prune
git checkout v2.8.3
cd -
else
git clone -b v2.8.3 https://github.com/ornladios/ADIOS2.git ${SRC_DIR}/adios2
fi
cmake -S ${SRC_DIR}/adios2 -B ${build_dir}/adios2-lassen-build -DBUILD_TESTING=OFF -DADIOS2_BUILD_EXAMPLES=OFF -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_SST=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.8.3
cmake --build ${build_dir}/adios2-lassen-build --target install -j 10
# BLAS++ (for PSATD+RZ)
if [ -d ${SRC_DIR}/blaspp ]
then
cd ${SRC_DIR}/blaspp
git fetch --prune
git checkout v2024.05.31
cd -
else
git clone -b v2024.05.31 https://github.com/icl-utk-edu/blaspp.git ${SRC_DIR}/blaspp
fi
cmake -S ${SRC_DIR}/blaspp -B ${build_dir}/blaspp-lassen-build -Duse_openmp=ON -Dgpu_backend=cuda -Duse_cmake_find_blas=ON -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-2024.05.31
cmake --build ${build_dir}/blaspp-lassen-build --target install --parallel 10
# LAPACK++ (for PSATD+RZ)
if [ -d ${SRC_DIR}/lapackpp ]
then
cd ${SRC_DIR}/lapackpp
git fetch --prune
git checkout v2024.05.31
cd -
else
git clone -b v2024.05.31 https://github.com/icl-utk-edu/lapackpp.git ${SRC_DIR}/lapackpp
fi
CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S ${SRC_DIR}/lapackpp -B ${build_dir}/lapackpp-lassen-build -Duse_cmake_find_lapack=ON -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-2024.05.31 -DLAPACK_LIBRARIES=/usr/lib64/liblapack.so
cmake --build ${build_dir}/lapackpp-lassen-build --target install --parallel 10
# Python ######################################################################
#
# sometimes, the Lassen PIP Index is down
export PIP_EXTRA_INDEX_URL="https://pypi.org/simple"
python3 -m pip install --upgrade --user virtualenv
rm -rf ${SW_DIR}/venvs/warpx-lassen-toss3
python3 -m venv ${SW_DIR}/venvs/warpx-lassen-toss3
source ${SW_DIR}/venvs/warpx-lassen-toss3/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip cache purge
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
CMAKE_PREFIX_PATH=/usr/lib64:${CMAKE_PREFIX_PATH} python3 -m pip install --upgrade -Ccompile-args="-j10" -Csetup-args=-Dblas=BLAS -Csetup-args=-Dlapack=BLAS scipy
python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
CC=mpicc H5PY_SETUP_REQUIRES=0 HDF5_DIR=${SW_DIR}/hdf5-1.14.1.2 HDF5_MPI=ON python3 -m pip install --upgrade h5py --no-cache-dir --no-build-isolation --no-binary h5py
MPLLOCALFREETYPE=1 python3 -m pip install --upgrade matplotlib==3.2.2 # does not try to build freetype itself
echo "matplotlib==3.2.2" > ${build_dir}/constraints.txt
python3 -m pip install --upgrade -c ${build_dir}/constraints.txt yt
# install or update WarpX dependencies such as picmistandard
python3 -m pip install --upgrade -r /usr/workspace/${USER}/lassen/src/warpx/requirements.txt
# for ML dependencies, see install_v100_ml.sh
# remove build temporary directory
rm -rf ${build_dir}
Compilation
Use the following cmake commands to compile the application executable:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8
The WarpX application executables are now in /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/.
Additionally, the following commands will install WarpX as a Python module:
rm -rf build_lassen_py
cmake -S . -B build_lassen_py -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen_py -j 8 --target pip_install
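As a quick, optional sanity check (not part of the build itself), the module should then be importable from the activated virtual environment:
python3 -c "import pywarpx"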
Now, you can submit Lassen compute jobs for WarpX Python (PICMI) scripts (example scripts).
Or, you can use the WarpX executables to submit Lassen jobs (example inputs).
For executables, you can reference their location in your job script or copy them to a location in $PROJWORK/$proj/, as sketched below.
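For example, a minimal sketch of the executable workflow, assuming a 3D build; the run directory and executable name are illustrative, and lassen_v100.bsub is the batch script from the Running section below:
# copy the executable and your inputs into the project work space (illustrative paths)
mkdir -p $PROJWORK/$proj/my_run
cp /usr/workspace/${USER}/lassen/src/warpx/build_lassen/bin/warpx.3d $PROJWORK/$proj/my_run/
cp <input file> $PROJWORK/$proj/my_run/
cd $PROJWORK/$proj/my_run
# submit the batch script from the Running section below
bsub lassen_v100.bsub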
Update WarpX & Dependencies
If you already installed WarpX in the past and want to update it, start by getting the latest source code:
cd /usr/workspace/${USER}/lassen/src/warpx
# read the output of this command - does it look ok?
git status
# get the latest WarpX source code
git fetch
git pull
# read the output of these commands - do they look ok?
git status
git log # press q to exit
And, if needed, update your $HOME/lassen_v100_warpx_toss3.profile file and re-run the dependency install script above.
As a last step, clean the build directory rm -rf /usr/workspace/${USER}/lassen/src/warpx/build_lassen and rebuild WarpX, as sketched below.
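A minimal sketch of that clean rebuild, reusing the same CMake options as in the Compilation section above:
cd /usr/workspace/${USER}/lassen/src/warpx
rm -rf build_lassen
cmake -S . -B build_lassen -DWarpX_COMPUTE=CUDA -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_lassen -j 8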
Running
V100 GPUs (16GB)
The batch script below can be used to run a WarpX simulation on 2 nodes on the supercomputer Lassen at LLNL.
Replace descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs.
Note that the only option so far is to run with one MPI rank per GPU.
Listing 11: You can copy this file from Tools/machines/lassen-llnl/lassen_v100.bsub.
#!/bin/bash
# Copyright 2020-2023 Axel Huebl
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL
#
# Refs.:
# https://jsrunvisualizer.olcf.ornl.gov/?s4f0o11n6c7g1r11d1b1l0=
# https://hpc.llnl.gov/training/tutorials/using-lcs-sierra-system#quick16
#BSUB -G <allocation ID>
#BSUB -W 00:10
#BSUB -nnodes 2
#BSUB -alloc_flags smt4
#BSUB -J WarpX
#BSUB -o WarpXo.%J
#BSUB -e WarpXe.%J
# Work-around OpenMPI bug with chunked HDF5
# https://github.com/open-mpi/ompi/issues/7795
export OMPI_MCA_io=ompio
# Work-around for broken IBM "libcollectives" MPI_Allgatherv
# https://github.com/ECP-WarpX/WarpX/pull/2874
export OMPI_MCA_coll_ibm_skip_allgatherv=true
# ROMIO has a hint for GPFS named IBM_largeblock_io which optimizes I/O with operations on large blocks
export IBM_largeblock_io=true
# MPI-I/O: ROMIO hints for parallel HDF5 performance
export ROMIO_HINTS=./romio-hints
# number of hosts: unique node names minus batch node
NUM_HOSTS=$(( $(echo $LSB_HOSTS | tr ' ' '\n' | uniq | wc -l) - 1 ))
cat > romio-hints << EOL
romio_cb_write enable
romio_ds_write enable
cb_buffer_size 16777216
cb_nodes ${NUM_HOSTS}
EOL
# OpenMPI file locks are slow and not needed
# https://github.com/open-mpi/ompi/issues/10053
export OMPI_MCA_sharedfp=^lockedfile,individual
# HDF5: disable slow locks (promise not to open half-written files)
export HDF5_USE_FILE_LOCKING=FALSE
# OpenMP: 1 thread per MPI rank
export OMP_NUM_THREADS=1
# store out task host mapping: helps identify broken nodes at scale
jsrun -r 4 -a1 -g 1 -c 7 -e prepended hostname > task_host_mapping.txt
# run WarpX
jsrun -r 4 -a 1 -g 1 -c 7 -l GPU-CPU -d packed -b rs -e prepended -M "-gpu" <path/to/executable> <input file> > output.txt
To run a simulation, copy the lines above to a file lassen_v100.bsub and run bsub lassen_v100.bsub to submit the job.
For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell
solver on V100 GPUs for a well load-balanced problem (in our case, a laser
wakefield acceleration simulation in a boosted frame in the quasi-linear
regime), the following set of parameters provided good performance:
amr.max_grid_size=256 and amr.blocking_factor=128.
One MPI rank per GPU (e.g., 4 MPI ranks for the 4 GPUs on each Lassen node).
Two `128x128x128` grids per GPU, or one `128x128x256` grid per GPU.
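As an illustration, these two parameters are set as plain key-value lines in a WarpX inputs file, for example:
# performance hints from above (values to be tuned for your own problem)
amr.max_grid_size = 256
amr.blocking_factor = 128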
Known System Issues
Warning
Feb 17th, 2022 (INC0278922):
The implementation of AllGatherv in IBM’s MPI optimization library “libcollectives” is broken and leads to HDF5 crashes for multi-node runs.
Our batch script templates above apply this work-around before the call to jsrun, which avoids the broken routines from IBM and trades them for an OpenMPI implementation of collectives:
export OMPI_MCA_coll_ibm_skip_allgatherv=true
As part of the same CORAL acquisition program, Lassen is very similar to the design of Summit (OLCF).
Thus, when encountering new issues, it is also worth checking the known Summit issues and work-arounds.