.. _building-ookami:
Ookami (Stony Brook)
====================
The `Ookami cluster `__ is located at Stony Brook University.
If you are new to this system, please see the following resources:
* `Ookami documentation `__
* Batch system: `Slurm `__ (see `available queues `__)
* `Filesystem locations `__:
* ``/lustre/home/`` (30 GB, backed up)
* ``/lustre/scratch/`` (purged after 14 days)
* ``/lustre/projects/*`` (1 TB by default, up to 8 TB possible, shared within our group/project, backed up; prefer this location)
We use Ookami as a development cluster for `A64FX `__.
The cluster also provides a few extra nodes, e.g. two ``Thunder X2`` (ARM) nodes.
Installation
------------
Use the following command to download the WarpX source code:
.. code-block:: bash
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
We use the following modules and environments on the system (``$HOME/warpx_gcc10.profile``).
.. code-block:: bash
# please set your project account (not relevant yet)
#export proj=
# required dependencies
module load cmake/3.19.0
module load gcc/10.3.0
module load openmpi/gcc10/4.1.0
# optional: faster builds (not available yet)
#module load ccache
#module load ninja
# optional: for PSATD support (not available yet)
#module load fftw
# optional: for QED lookup table generation support (not available yet)
#module load boost
# optional: for openPMD support
#module load adios2 # not available yet
#module load hdf5 # only serial
# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CXXFLAGS="-mcpu=a64fx"
We recommend storing the above lines in a file, such as ``$HOME/warpx_gcc10.profile``, and loading it into your shell after each login:
.. code-block:: bash
source $HOME/warpx_gcc10.profile
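As a quick sanity check after sourcing the profile, the following minimal sketch reports which of the compiler variables exported above are still empty. The function name ``check_warpx_env`` is hypothetical and not part of WarpX; it only inspects ``CC``, ``CXX``, ``FC`` and ``CXXFLAGS``.

```shell
# Hypothetical helper (not part of WarpX): list profile variables that are
# unset or empty in the current shell.
check_warpx_env() {
    missing=""
    for var in CC CXX FC CXXFLAGS; do
        # look up the value of the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="${missing}${missing:+ }${var}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing: ${missing}"
        return 1
    fi
    echo "environment ok"
}
```

Calling ``check_warpx_env`` right after ``source $HOME/warpx_gcc10.profile`` should print ``environment ok``; otherwise it names the variables that were not set.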
Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile:
.. code-block:: bash
cd $HOME/src/warpx
rm -rf build
cmake -S . -B build -DWarpX_COMPUTE=OMP -DWarpX_OPENPMD=ON
cmake --build build -j 10
# alternatively, without OpenMP threading (currently better performance)
cmake -S . -B build -DWarpX_COMPUTE=NOACC -DWarpX_OPENPMD=ON
cmake --build build -j 10
The general :ref:`cmake compile-time options ` apply as usual.
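To compare the two backends without deleting ``build`` each time, one option is to keep a separate build directory per backend. The sketch below only prints the commands; the ``build_<backend>`` directory naming and the function name ``warpx_build_cmds`` are assumptions of this example, while the CMake options are the ones used above.

```shell
# Sketch: print the configure and build commands for one compute backend.
# Backend names (OMP, NOACC) and options are taken from the commands above;
# the per-backend directory name is an assumption of this example.
warpx_build_cmds() {
    backend="$1"
    build_dir="build_$(echo "$backend" | tr '[:upper:]' '[:lower:]')"
    echo "cmake -S . -B ${build_dir} -DWarpX_COMPUTE=${backend} -DWarpX_OPENPMD=ON"
    echo "cmake --build ${build_dir} -j 10"
}

# Print the commands for both backends.
for backend in OMP NOACC; do
    warpx_build_cmds "$backend"
done
```

Run from ``$HOME/src/warpx``, the printed lines can be reviewed first and then piped to ``sh`` to execute them.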
.. _running-cpp-ookami:
Running
-------
For running on 48 cores of a single node:
.. code-block:: bash
srun -p short -N 1 -n 48 --pty bash
OMP_NUM_THREADS=1 mpiexec -n 48 --map-by ppr:12:numa:pe=1 --report-bindings ./warpx inputs
# alternatively, using 4 MPI ranks with 12 threads each on a single node
# (one rank per NUMA domain, each bound to 12 cores):
OMP_NUM_THREADS=12 mpiexec -n 4 --map-by ppr:1:numa:pe=12 --report-bindings ./warpx inputs
The Ookami HPE Apollo 80 system has 174 A64FX compute nodes, each with 32 GB of high-bandwidth memory.
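The rank and thread layout can be derived from the number of OpenMP threads per rank. The following sketch assumes four NUMA domains with 12 cores each per node, as implied by the 48-rank mapping above; the function name ``warpx_mpiexec_cmd`` is hypothetical and the function only prints the command line.

```shell
# Sketch: print an mpiexec line for one A64FX node.
# Assumes 4 NUMA domains with 12 cores each (48 cores in total).
warpx_mpiexec_cmd() {
    threads="$1"          # OpenMP threads per MPI rank
    cores_per_numa=12
    numa_per_node=4
    # the threads of one rank must fit evenly into one NUMA domain
    if [ "$threads" -lt 1 ] || [ $((cores_per_numa % threads)) -ne 0 ]; then
        echo "threads per rank must divide ${cores_per_numa}" >&2
        return 1
    fi
    ranks_per_numa=$((cores_per_numa / threads))
    ranks=$((ranks_per_numa * numa_per_node))
    echo "OMP_NUM_THREADS=${threads} mpiexec -n ${ranks} --map-by ppr:${ranks_per_numa}:numa:pe=${threads} --report-bindings ./warpx inputs"
}
```

For example, ``warpx_mpiexec_cmd 1`` reproduces the 48-rank command above, and ``warpx_mpiexec_cmd 3`` prints a layout with 16 ranks of 3 threads each.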
Additional Compilers
--------------------
This section is a note for developers.
We also compiled WarpX with the Fujitsu compiler in Clang mode (``-Nclang``), using the following build string:
.. code-block:: bash
cmake -S . -B build \
-DCMAKE_C_COMPILER=$(which mpifcc) \
-DCMAKE_C_COMPILER_ID="Clang" \
-DCMAKE_C_COMPILER_VERSION=12.0 \
-DCMAKE_C_STANDARD_COMPUTED_DEFAULT="11" \
-DCMAKE_CXX_COMPILER=$(which mpiFCC) \
-DCMAKE_CXX_COMPILER_ID="Clang" \
-DCMAKE_CXX_COMPILER_VERSION=12.0 \
-DCMAKE_CXX_STANDARD_COMPUTED_DEFAULT="14" \
-DCMAKE_CXX_FLAGS="-Nclang" \
-DAMReX_DIFFERENT_COMPILER=ON \
-DAMReX_MPI_THREAD_MULTIPLE=FALSE \
-DWarpX_COMPUTE=OMP
cmake --build build -j 10
An internal compiler error in this compiler requires us to replace range-based for loops whose body calls ``WarpX::setLoadBalanceEfficiency`` with conventional iterator loops.
At the moment, this affects three loops that look roughly like this:
.. code-block:: cpp
for (int i : costs[lev]->IndexArray()) {
(*costs[lev])[i] = 0.0;
WarpX::setLoadBalanceEfficiency(lev, -1);
}
into
.. code-block:: cpp
const auto idx_arr = costs[lev]->IndexArray();
for (auto it = idx_arr.begin(); it != idx_arr.end(); ++it) {
(*costs[lev])[*it] = 0.0;
WarpX::setLoadBalanceEfficiency(lev, -1);
}