Running on specific platforms¶
Running on Cori KNL at NERSC¶
The batch script below can be used to run a WarpX simulation on 2 KNL nodes on the supercomputer Cori at NERSC. Replace the descriptions between chevrons <> with relevant values; for instance, <job name> could be laserWakefield.
#!/bin/bash -l
#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -q regular
#SBATCH -C knl
#SBATCH -S 4
#SBATCH -J <job name>
#SBATCH -A <allocation ID>
#SBATCH -e error.txt
#SBATCH -o output.txt
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
# KNLs have 4 hyperthreads max
export CORI_MAX_HYPERTHREAD_LEVEL=4
# We use 64 cores out of the 68 available on Cori KNL,
# and leave 4 to the system (see "#SBATCH -S 4" above).
export CORI_NCORES_PER_NODE=64
# Typically use 8 MPI ranks per node without hyperthreading,
# i.e., OMP_NUM_THREADS=8
export WARPX_NMPI_PER_NODE=8
export WARPX_HYPERTHREAD_LEVEL=1
# Compute OMP_NUM_THREADS and the thread count (-c option)
export CORI_NHYPERTHREADS_MAX=$(( ${CORI_MAX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export WARPX_NTHREADS_PER_NODE=$(( ${WARPX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export OMP_NUM_THREADS=$(( ${WARPX_NTHREADS_PER_NODE} / ${WARPX_NMPI_PER_NODE} ))
export WARPX_THREAD_COUNT=$(( ${CORI_NHYPERTHREADS_MAX} / ${WARPX_NMPI_PER_NODE} ))
srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${WARPX_NMPI_PER_NODE} )) -c ${WARPX_THREAD_COUNT} <path/to/executable> <input file>
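With the values set above, the arithmetic in this script resolves as follows (assuming the 2 nodes requested by #SBATCH -N 2):

# CORI_NHYPERTHREADS_MAX  = 4 * 64  = 256  logical CPUs available per node
# WARPX_NTHREADS_PER_NODE = 1 * 64  = 64   one thread per physical core
# OMP_NUM_THREADS         = 64 / 8  = 8    OpenMP threads per MPI rank
# WARPX_THREAD_COUNT      = 256 / 8 = 32   logical CPUs reserved per rank (the -c option)
# srun -n                 = 2 * 8   = 16   MPI ranks in total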
To run a simulation, copy the lines above to a file batch_cori.sh and run sbatch batch_cori.sh to submit the job.
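A minimal submit-and-monitor session might look like the following; the job ID in the echoed output is illustrative:

sbatch batch_cori.sh
# Submitted batch job 12345678   (example output)
squeue -u $USER      # check whether the job is pending or running
tail -f output.txt   # follow the standard output file set by "#SBATCH -o output.txt"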
For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell solver on Cori KNL for a well load-balanced problem (in our case, a laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance (a sketch of the corresponding input-file lines follows the list):
- amr.max_grid_size=64 and amr.blocking_factor=64, so that the size of each grid is fixed to 64**3 (we are not using load balancing here).
- 8 MPI ranks per KNL node, with OMP_NUM_THREADS=8 (that is, 64 threads per KNL node, i.e., 1 thread per physical core, and 4 cores left to the system).
- 2 grids per MPI rank, i.e., 16 grids per KNL node.
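These parameters go in the WarpX input file passed on the srun line. The excerpt below is a minimal sketch rather than part of the original recommendation: the amr.n_cell values are an assumption, chosen so that the 2-node, 16-rank job above ends up with exactly 2 grids of 64**3 cells per MPI rank.

amr.n_cell = 128 128 512    # illustrative domain size (assumption, not a recommendation)
amr.max_grid_size = 64      # no grid larger than 64 cells in any direction
amr.blocking_factor = 64    # grid sizes are multiples of 64, so every grid is exactly 64**3
# (128/64) * (128/64) * (512/64) = 32 grids in total, i.e., 2 grids for each of the 16 MPI ranks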
Running on Summit at OLCF¶
The batch script below can be used to run a WarpX simulation on 2 nodes on the supercomputer Summit at OLCF. Replace the descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs. Note that the only option so far is to run with one MPI rank per GPU.
#!/bin/bash
#BSUB -P <allocation ID>
#BSUB -W 00:10
#BSUB -nnodes 2
#BSUB -J WarpX
#BSUB -o WarpXo.%J
#BSUB -e WarpXe.%J
module load pgi
module load cuda
omp=1
export OMP_NUM_THREADS=${omp}
num_nodes=$(( $(printf '%s\n' ${LSB_HOSTS} | sort -u | wc -l) - 1 ))
jsrun -n ${num_nodes} -a 6 -g 6 -c 6 --bind=packed:${omp} <path/to/executable> <input file> > output.txt
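For reference, here is how the jsrun line decomposes for the 2 nodes requested by #BSUB -nnodes 2 (the -1 in num_nodes accounts for the launch node that LSF includes in LSB_HOSTS):

# num_nodes = 3 unique hosts - 1 launch node = 2
# jsrun -n 2 : two resource sets, one per compute node
#       -a 6 : 6 MPI ranks per resource set
#       -g 6 : 6 GPUs per resource set, i.e., one GPU per MPI rank
#       -c 6 : 6 CPU cores per resource set; --bind=packed:1 pins each rank to one core (OMP_NUM_THREADS=1)
# Total: 2 * 6 = 12 MPI ranks driving 12 GPUs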
To run a simulation, copy the lines above to a file batch_summit.sh and run bsub batch_summit.sh to submit the job.
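As with the SLURM example above, a minimal submit-and-monitor session could look like this; the job ID and queue name in the echoed output are illustrative:

bsub batch_summit.sh
# Job <123456> is submitted to default queue <batch>.   (example output)
bjobs                # check the job status
tail -f output.txt   # follow the output redirected by the jsrun line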
For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell solver on Summit for a well load-balanced problem (in our case, a laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance (the resulting cells-per-GPU count is worked out after the list):
- amr.max_grid_size=256 and amr.blocking_factor=128.
- One MPI rank per GPU (e.g., 6 MPI ranks for the 6 GPUs on each Summit node).
- Two `128x128x128` grids per GPU, or one `128x128x256` grid per GPU.
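Either grid layout amounts to the same number of cells per GPU; the short calculation below is not in the original text and simply makes the scale explicit:

# two 128x128x128 grids per GPU : 2 * 128*128*128 = 4,194,304 cells per GPU
# one 128x128x256 grid per GPU  :     128*128*256 = 4,194,304 cells per GPU
# with 1-4 particles per cell, that is roughly 4 to 17 million particles per GPU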
A batch script with more options regarding profiling on Summit can be found at Summit batch script.