Karolina (IT4I)

The Karolina cluster is located at IT4I, Technical University of Ostrava.

Introduction

If you are new to this system, please see the following resources:

  • IT4I user guide

  • Batch system: PBS

  • Jupyter service: not provided/documented (yet)

  • Filesystems:

    • $HOME: per-user directory; use only for inputs, source code, and scripts; backed up (25 GB default quota)

    • /scratch/: production directory; very fast, intended for parallel jobs (20 TB default quota)
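
As a rule of thumb, keep sources and scripts in $HOME and run production jobs from /scratch/. A minimal sketch of setting up a run directory, assuming a per-project layout under /scratch/project/ (this layout is an assumption; check the IT4I user guide for Karolina's exact path conventions):

# show the mounted filesystems and their overall capacity (per-user quotas are documented in the IT4I user guide)
df -h $HOME /scratch

# hypothetical run-directory layout; <proj> is your project id, e.g. DD-23-83
mkdir -p /scratch/project/<proj>/warpx_runs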

Preparation

Use the following commands to download the WarpX source code:

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

On Karolina, you can run either on GPU nodes with fast A100 GPUs (recommended) or CPU nodes.
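
If you want to check which queues are available before requesting resources, PBS provides the qstat command; for example (qgpu is the GPU queue used later in this guide; CPU queue names are listed in the IT4I user guide):

# list all PBS queues and their current state
qstat -Q

# show full details of the GPU queue
qstat -Qf qgpu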

We use system software modules and add environment hints and further dependencies via the file $HOME/karolina_gpu_warpx.profile. Create it now:

cp $HOME/src/warpx/Tools/machines/karolina-it4i/karolina_gpu_warpx.profile.example $HOME/karolina_gpu_warpx.profile
Script Details
# please set your project account
export proj=""  # change me!

# remembers the location of this script
export MY_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE)
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your $MY_PROFILE file! Please edit its line 2 to continue!"; return; fi

# required dependencies
module purge
ml GCCcore/11.3.0
ml CUDA/11.7.0
ml OpenMPI/4.1.4-GCC-11.3.0-CUDA-11.7.0
ml CMake/3.23.1-GCCcore-11.3.0

# optional: for QED support with detailed tables
ml Boost/1.79.0-GCC-11.3.0

# optional: for openPMD and PSATD+RZ support
ml OpenBLAS/0.3.20-GCC-11.3.0
export CMAKE_PREFIX_PATH=${HOME}/sw/karolina/gpu/hdf5-1.14.1.2:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${HOME}/sw/karolina/gpu/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${HOME}/sw/karolina/gpu/adios2-2.8.3:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${HOME}/sw/karolina/gpu/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${HOME}/sw/karolina/gpu/lapackpp-master:$CMAKE_PREFIX_PATH

export LD_LIBRARY_PATH=${HOME}/sw/karolina/gpu/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${HOME}/sw/karolina/gpu/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${HOME}/sw/karolina/gpu/adios2-2.8.3/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${HOME}/sw/karolina/gpu/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${HOME}/sw/karolina/gpu/lapackpp-master/lib64:$LD_LIBRARY_PATH

export PATH=${HOME}/sw/karolina/gpu/hdf5-1.14.1.2/bin:${PATH}
export PATH=${HOME}/sw/karolina/gpu/adios2-2.8.3/bin:${PATH}

# optional: CCache (not found)
#ml ccache

# optional: for Python bindings or libEnsemble
ml Python/3.10.4-GCCcore-11.3.0-bare

if [ -d "${HOME}/sw/karolina/gpu/venvs/warpx-gpu" ]
then
  source ${HOME}/sw/karolina/gpu/venvs/warpx-gpu/bin/activate
fi

# an alias to request an interactive batch node for one hour (TODO)
#   for parallel execution, start on the batch node: srun <command>
alias getNode="qsub -q qgpu -A $proj -l select=1:ncpus=32:ngpus=4 -l walltime=1:00:00 -I"
# a helper to run a command on a batch node for up to 1hr
#   usage: runNode <command>
#   (defined as a function so that the <command> argument is passed through correctly)
runNode() {
  echo -e "#!/bin/bash\nmpirun -n 4 $1" | qsub -q qgpu -A $proj -l select=1:ncpus=32:ngpus=4 -l walltime=1:00:00
}

# optimize CUDA compilation for A100
export AMREX_CUDA_ARCH=8.0

# optimize CPU microarchitecture for ... (TODO)
#export CXXFLAGS="-march=abc"
#export CFLAGS="-march=def"

# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

Edit the second line of this script, which sets the export proj="" variable. For example, if you are a member of the project DD-23-83, run vi $HOME/karolina_gpu_warpx.profile, enter edit mode by typing i, and change line 2 to read:

export proj="DD-23-83"

Exit the vi editor with Esc and then type :wq (write & quit).
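
Alternatively, if you prefer a non-interactive edit, the same change can be made with sed (using the example project id from above):

# set the project account in the profile without opening an editor
sed -i 's/export proj=""/export proj="DD-23-83"/' $HOME/karolina_gpu_warpx.profile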

Important

Now, and as the first step on future logins to Karolina, activate these environment settings:

source $HOME/karolina_gpu_warpx.profile
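
To verify that the profile was sourced correctly, you can list the loaded modules and check that the project variable is set:

# list the currently loaded environment modules (Lmod)
ml

# should print your project id, e.g. DD-23-83
echo $proj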

Finally, since Karolina does not yet provide software modules for some of our dependencies, install them once:

bash $HOME/src/warpx/Tools/machines/karolina-it4i/install_gpu_dependencies.sh
source $HOME/sw/karolina/gpu/venvs/warpx-gpu/bin/activate
Script Details
#!/bin/bash
#
# Copyright 2023 The WarpX Community
#
# This file is part of WarpX.
#
# Author: Axel Huebl
# License: BSD-3-Clause-LBNL

# Exit on first error encountered #############################################
#
set -eu -o pipefail


# Check: ######################################################################
#
#   Was karolina_gpu_warpx.profile sourced and configured correctly?
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your karolina_gpu_warpx.profile file! Please edit its line 2 to continue!"; exit 1; fi


# Remove old dependencies #####################################################
#
SW_DIR="${HOME}/sw/karolina/gpu"
rm -rf ${SW_DIR}
mkdir -p ${SW_DIR}

# remove common user mistakes in python, located in .local instead of a venv
python3 -m pip uninstall -qq -y pywarpx
python3 -m pip uninstall -qq -y warpx
python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true


# General extra dependencies ##################################################
#

# c-blosc (I/O compression)
if [ -d $HOME/src/c-blosc ]
then
  cd $HOME/src/c-blosc
  git fetch --prune
  git checkout v1.21.1
  cd -
else
  git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git $HOME/src/c-blosc
fi
rm -rf $HOME/src/c-blosc-gpu-build
cmake -S $HOME/src/c-blosc -B $HOME/src/c-blosc-gpu-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/c-blosc-1.21.1
cmake --build $HOME/src/c-blosc-gpu-build --target install --parallel 16
rm -rf $HOME/src/c-blosc-gpu-build

# HDF5
if [ -d $HOME/src/hdf5 ]
then
  cd $HOME/src/hdf5
  git fetch --prune
  git checkout hdf5-1_14_1-2
  cd -
else
  git clone -b hdf5-1_14_1-2 https://github.com/HDFGroup/hdf5.git $HOME/src/hdf5
fi
rm -rf $HOME/src/hdf5-build
cmake -S $HOME/src/hdf5 -B $HOME/src/hdf5-build -DBUILD_TESTING=OFF -DHDF5_ENABLE_PARALLEL=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/hdf5-1.14.1.2
cmake --build $HOME/src/hdf5-build --target install --parallel 16
rm -rf $HOME/src/hdf5-build

# ADIOS2
if [ -d $HOME/src/adios2 ]
then
  cd $HOME/src/adios2
  git fetch --prune
  git checkout v2.8.3
  cd -
else
  git clone -b v2.8.3 https://github.com/ornladios/ADIOS2.git $HOME/src/adios2
fi
rm -rf $HOME/src/adios2-gpu-build
cmake -S $HOME/src/adios2 -B $HOME/src/adios2-gpu-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_HDF5=OFF -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.8.3
cmake --build $HOME/src/adios2-gpu-build --target install --parallel 12
rm -rf $HOME/src/adios2-gpu-build

# BLAS++ (for PSATD+RZ)
if [ -d $HOME/src/blaspp ]
then
  cd $HOME/src/blaspp
  git fetch --prune
  git checkout master
  git pull
  cd -
else
  git clone https://github.com/icl-utk-edu/blaspp.git $HOME/src/blaspp
fi
rm -rf $HOME/src/blaspp-gpu-build
cmake -S $HOME/src/blaspp -B $HOME/src/blaspp-gpu-build -Duse_openmp=OFF -Dgpu_backend=cuda -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-master
cmake --build $HOME/src/blaspp-gpu-build --target install --parallel 12
rm -rf $HOME/src/blaspp-gpu-build

# LAPACK++ (for PSATD+RZ)
if [ -d $HOME/src/lapackpp ]
then
  cd $HOME/src/lapackpp
  git fetch --prune
  git checkout master
  git pull
  cd -
else
  git clone https://github.com/icl-utk-edu/lapackpp.git $HOME/src/lapackpp
fi
rm -rf $HOME/src/lapackpp-gpu-build
CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S $HOME/src/lapackpp -B $HOME/src/lapackpp-gpu-build -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-master
cmake --build $HOME/src/lapackpp-gpu-build --target install --parallel 12
rm -rf $HOME/src/lapackpp-gpu-build


# Python ######################################################################
#
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade virtualenv
python3 -m pip cache purge
rm -rf ${SW_DIR}/venvs/warpx-gpu
python3 -m venv ${SW_DIR}/venvs/warpx-gpu
source ${SW_DIR}/venvs/warpx-gpu/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# install or update WarpX dependencies such as picmistandard
python3 -m pip install --upgrade -r $HOME/src/warpx/requirements.txt
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
# optional: for optimas (based on libEnsemble & ax->botorch->gpytorch->pytorch)
python3 -m pip install --upgrade torch  # CUDA 11.7 compatible wheel
python3 -m pip install -r $HOME/src/warpx/Tools/optimas/requirements.txt
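
As an optional sanity check that the Python environment was built against the system MPI, you could run the following inside the activated virtual environment (a suggestion only; importing mpi4py on the login node may print harmless MPI warnings):

# print the MPI library that mpi4py was compiled against; expect an Open MPI 4.1.x string
python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"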

CPU usage documentation is TODO.

Compilation

Use the following cmake commands to compile the application executable. For the GPU nodes:

cd $HOME/src/warpx
rm -rf build_gpu

cmake -S . -B build_gpu -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_gpu -j 12

The WarpX application executables are now in $HOME/src/warpx/build_gpu/bin/. Additionally, the following commands will install WarpX as a Python module:

rm -rf build_gpu_py

cmake -S . -B build_gpu_py -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_gpu_py -j 12 --target pip_install
To compile for the CPU nodes instead, use:

cd $HOME/src/warpx
rm -rf build_cpu

cmake -S . -B build_cpu -DWarpX_COMPUTE=OMP -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_cpu -j 12

The WarpX application executables are now in $HOME/src/warpx/build_cpu/bin/. Additionally, the following commands will install WarpX as a Python module:

cd $HOME/src/warpx
rm -rf build_cpu_py

cmake -S . -B build_cpu_py -DWarpX_COMPUTE=OMP -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_cpu_py -j 12 --target pip_install

Now, you can submit Karolina compute jobs for WarpX Python (PICMI) scripts (example scripts). Or, you can use the WarpX executables to submit Karolina jobs (example inputs). For executables, you can reference their location in your job script or copy them to a location in /scratch/.
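
For example, a run directory on /scratch could be staged like this (the /scratch/project/<proj> layout and run-directory names are assumptions; adjust them to your project's conventions):

# hypothetical run directory on the fast parallel filesystem
RUN_DIR=/scratch/project/<proj>/warpx_runs/run_001
mkdir -p ${RUN_DIR}

# stage the executable and the job script documented below
cp $HOME/src/warpx/build_gpu/bin/warpx.rz ${RUN_DIR}/
cp $HOME/src/warpx/Tools/machines/karolina-it4i/karolina_gpu.qsub ${RUN_DIR}/
cd ${RUN_DIR}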

Update WarpX & Dependencies

If you already installed WarpX in the past and want to update it, start by getting the latest source code:

cd $HOME/src/warpx

# read the output of this command - does it look ok?
git status

# get the latest WarpX source code
git fetch
git pull

# read the output of these commands - do they look ok?
git status
git log # press q to exit

And, if needed, update the $HOME/karolina_gpu_warpx.profile file, log out and in again, and re-install the dependencies (see above).

As a last step, clean the build directories with rm -rf $HOME/src/warpx/build_* and rebuild WarpX.
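
For example, to rebuild the GPU executable with the same options as in the Compilation section above:

cd $HOME/src/warpx
rm -rf build_gpu

cmake -S . -B build_gpu -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_gpu -j 12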

Running

The batch script below can be used to run a WarpX simulation on multiple GPU nodes (change #PBS -l select= accordingly) on the supercomputer Karolina at IT4I. This partition has up to 72 nodes. Every node has 8x A100 (40 GB) GPUs and 2x AMD EPYC 7763 64-core, 2.45 GHz processors.

Replace descriptions between chevrons <> with relevant values, for instance <proj> could be DD-23-83. Note that we run one MPI rank per GPU.

Listing 8 You can copy this file from $HOME/src/warpx/Tools/machines/karolina-it4i/karolina_gpu.qsub.
#!/bin/bash -l

# Copyright 2023 The WarpX Community
#
# This file is part of WarpX.
#
# Authors: Axel Huebl, Andrei Berceanu
# License: BSD-3-Clause-LBNL

#PBS -q qgpu
#PBS -N WarpX
# Use two full nodes, 8 GPUs per node, 16 GPUs total
#PBS -l select=2:ncpus=128:ngpus=8:mpiprocs=8:ompthreads=16,walltime=00:10:00
#PBS -A <proj>

cd ${PBS_O_WORKDIR}

# executable & inputs file or python interpreter & PICMI script here
EXE=./warpx.rz
INPUTS=inputs_rz

# OpenMP threads per MPI rank
export OMP_NUM_THREADS=16

# run
mpirun -np ${PBS_NP} bash -c "
    export CUDA_VISIBLE_DEVICES=\${OMPI_COMM_WORLD_LOCAL_RANK};
    ${EXE} ${INPUTS}" \
  > output.txt

To run a simulation, copy the lines above to a file karolina_gpu.qsub and run

qsub karolina_gpu.qsub

to submit the job.
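
While the job is queued or running, standard PBS commands can be used to monitor and, if necessary, cancel it:

# list your jobs and their state (Q = queued, R = running)
qstat -u $USER

# cancel a job if needed
qdel <job id>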

CPU usage documentation is TODO.

Post-Processing

Note

This section was not yet written. Usually, we document here how to use a Jupyter service.