.. _building-ookami: Ookami (Stony Brook) ==================== The `Ookami cluster `__ is located at Stony Brook University. If you are new to this system, please see the following resources: * `Ookami documentation `__ * Batch system: `Slurm `__ (see `available queues `__) * `Filesystem locations `__: * ``/lustre/home/`` (30GByte, backuped) * ``/lustre/scratch/`` (14 day purge) * ``/lustre/projects/*`` (1TByte default, up to 8TB possible, shared within our group/project, backuped, prefer this location) We use Ookami as a development cluster for `A64FX `__, The cluster also provides a few extra nodes, e.g. two ``Thunder X2`` (ARM) nodes. Installation ------------ Use the following commands to download the WarpX source code and switch to the correct branch: .. code-block:: bash git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx We use the following modules and environments on the system (``$HOME/warpx_gcc10.profile``). .. code-block:: bash # please set your project account (not relevant yet) #export proj= # required dependencies module load cmake/3.19.0 module load gcc/10.3.0 module load openmpi/gcc10/4.1.0 # optional: faster builds (not available yet) #module load ccache #module load ninja # optional: for PSATD support (not available yet) #module load fftw # optional: for QED lookup table generation support (not available yet) #module load boost # optional: for openPMD support #module load adios2 # not available yet #module load hdf5 # only serial # compiler environment hints export CC=$(which gcc) export CXX=$(which g++) export FC=$(which gfortran) export CXXFLAGS="-mcpu=a64fx" We recommend to store the above lines in a file, such as ``$HOME/warpx_gcc10.profile``, and load it into your shell after a login: .. code-block:: bash source $HOME/warpx_gcc10.profile Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile: .. code-block:: bash cd $HOME/src/warpx rm -rf build cmake -S . -B build -DWarpX_COMPUTE=OMP -DWarpX_OPENPMD=ON cmake --build build -j 10 # or (currently better performance) cmake -S . -B build -DWarpX_COMPUTE=NOACC -DWarpX_OPENPMD=ON cmake --build build -j 10 The general :ref:`cmake compile-time options ` apply as usual. .. _running-cpp-ookami: Running ------- For running on 48 cores of a single node: .. code-block:: bash srun -p short -N 1 -n 48 --pty bash OMP_NUM_THREADS=1 mpiexec -n 48 --map-by ppr:12:numa:pe=1 --report-bindings ./warpx inputs # alternatively, using 4 MPI ranks with each 12 threads on a single node: OMP_NUM_THREADS=12 mpiexec -n 4 --map-by ppr:4:numa:pe=12 --report-bindings ./warpx inputs The Ookami HPE Apollo 80 system has 174 A64FX compute nodes each with 32GB of high-bandwidth memory. Additional Compilers -------------------- This section is just a note for developers. We compiled with the Fujitsu Compiler (Clang) with the following build string: .. code-block:: bash cmake -S . -B build \ -DCMAKE_C_COMPILER=$(which mpifcc) \ -DCMAKE_C_COMPILER_ID="Clang" \ -DCMAKE_C_COMPILER_VERSION=12.0 \ -DCMAKE_C_STANDARD_COMPUTED_DEFAULT="11" \ -DCMAKE_CXX_COMPILER=$(which mpiFCC) \ -DCMAKE_CXX_COMPILER_ID="Clang" \ -DCMAKE_CXX_COMPILER_VERSION=12.0 \ -DCMAKE_CXX_STANDARD_COMPUTED_DEFAULT="14" \ -DCMAKE_CXX_FLAGS="-Nclang" \ -DAMReX_DIFFERENT_COMPILER=ON \ -DAMReX_MPI_THREAD_MULTIPLE=FALSE \ -DWarpX_COMPUTE=OMP cmake --build build -j 10 An internal compiler error requires us to modify a range-based for loop to a conventional for loop for ``WarpX::setLoadBalanceEfficiency``. We need to rewrite (at the moment three) loops that look roughly like this: .. code-block:: cpp for (int i : costs[lev]->IndexArray()) { (*costs[lev])[i] = 0.0; WarpX::setLoadBalanceEfficiency(lev, -1); } into .. code-block:: cpp const auto idx_arr = costs[lev]->IndexArray(); for (auto it = idx_arr.begin(); it < idx_arr.end(); ++it ) { (*costs[lev])[*it] = 0.0; WarpX::setLoadBalanceEfficiency(lev, -1); }