author    2022-08-31 11:29:07 -0700
committer 2022-08-31 11:29:07 -0700
commit    19dba606b11391c67e33857926db8b94ee60829c
tree      c8681b8ced83bcbf35e00bb7c68d46c785f5681e /Tools/machines/perlmutter-nersc/perlmutter.sbatch
parent    3e47534613e02fd9bedbdda32892a2e0a7b76817
Perlmutter: Work-Around CUDA-Aware MPI & Slurm (#3349)
* Perlmutter: Work-Around CUDA-Aware MPI & Slurm
There are known HPE bugs on Perlmutter that can crash
simulations (segfault) with CUDA-aware MPI.
We now avoid the Slurm GPU-binding options involved and instead
manually control which GPU is exposed to each MPI rank
(see the sketch after this list).
* Add: `gpus-per-node`
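As a minimal sketch of the mechanism: Slurm sets SLURM_LOCALID to the node-local rank index (0..3 with --ntasks-per-node=4), so each rank can restrict CUDA visibility to a single distinct device. A common way to guarantee the variable is evaluated per task is a small wrapper executed by srun; the wrapper name gpu-per-rank.sh and the echo line are illustrative assumptions, not part of this commit.

    #!/bin/bash
    # gpu-per-rank.sh (hypothetical wrapper): expose exactly one GPU to each
    # MPI rank by mapping the node-local rank index to a CUDA device index.
    # SLURM_LOCALID runs 0..3 on each node with --ntasks-per-node=4.
    export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID}
    echo "host $(hostname): local rank ${SLURM_LOCALID} -> GPU ${CUDA_VISIBLE_DEVICES}"

    # Launch the actual application with the restricted GPU visibility.
    exec "$@"

Invoked per task, e.g. srun ./gpu-per-rank.sh ./warpx inputs (after chmod +x), each of the four ranks on a node then sees a different physical GPU as CUDA device 0.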
Diffstat
 Tools/machines/perlmutter-nersc/perlmutter.sbatch | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Tools/machines/perlmutter-nersc/perlmutter.sbatch b/Tools/machines/perlmutter-nersc/perlmutter.sbatch
index 2c085364d..65777f304 100644
--- a/Tools/machines/perlmutter-nersc/perlmutter.sbatch
+++ b/Tools/machines/perlmutter-nersc/perlmutter.sbatch
@@ -16,8 +16,7 @@
 #SBATCH -C gpu
 #SBATCH -c 32
 #SBATCH --ntasks-per-node=4
-#SBATCH --gpus-per-task=1
-#SBATCH --gpu-bind=single:1
+#SBATCH --gpus-per-node=4
 #SBATCH -o WarpX.o%j
 #SBATCH -e WarpX.e%j
@@ -42,6 +41,9 @@
 # GPU-aware MPI
 export MPICH_GPU_SUPPORT_ENABLED=1

+# expose one GPU per MPI rank
+export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID
+
 EXE=./warpx
 #EXE=../WarpX/build/bin/warpx.3d.MPI.CUDA.DP.OPMD.QED
 #EXE=./main3d.gnu.TPROF.MPI.CUDA.ex
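Read together, the two hunks leave the relevant part of perlmutter.sbatch as follows (reconstructed from the context lines above; the region between the hunks is unchanged by this commit and elided here):

    #SBATCH -C gpu
    #SBATCH -c 32
    #SBATCH --ntasks-per-node=4
    #SBATCH --gpus-per-node=4    # 4 GPUs for the 4 tasks on each node
    #SBATCH -o WarpX.o%j
    #SBATCH -e WarpX.e%j

    # ... (unchanged lines between the two hunks elided) ...

    # GPU-aware MPI
    export MPICH_GPU_SUPPORT_ENABLED=1

    # expose one GPU per MPI rank
    export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID

    EXE=./warpx
    #EXE=../WarpX/build/bin/warpx.3d.MPI.CUDA.DP.OPMD.QED
    #EXE=./main3d.gnu.TPROF.MPI.CUDA.ex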