path: root/Tools/BatchScripts
Age | Commit message | Author | Files | Lines
2022-01-19 | Docs: Reorder HPC Profiles + Batch Scripts (#2757) | Axel Huebl | 11 | -455/+0
* Docs: Reorder Summit Files
* Docs: Reorder Spock Files
* Docs: Reorder Cori Files
* Docs: Reorder Perlmutter Files
* Docs: Reorder Juwels Files
* Docs: Reorder Lassen Files
* Docs: Reorder Quartz Files
* Docs: Reorder Ookami Files
* Docs: Also Move Summit Profile Script
* Listing Captions: Location in Source
2021-12-15 | Docs: Perlmutter Early Science (#2674) | Axel Huebl | 1 | -0/+2
* Docs: Perlmutter Early Science
  Document how to perform large runs.
* Change comment style
Co-authored-by: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
2021-11-15 | I/O performance hints for Summit (#2495) | Jean Luca Bez | 2 | -1/+31
* Fix conflict with upstream
* Apply suggestions from code review
* Remove space in the end of lines
* Include suggestions from PR review
* Generalize ROMIO Hints in Batch Scripts
* Fix Comment
* Fix Comment
* Remove duplication
* Formatting
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
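As a rough illustration of how ROMIO hints can be passed from a batch script, the sketch below writes a hints file and points the `ROMIO_HINTS` environment variable at it. The hint names are standard ROMIO hints, but the values and the temporary file location are assumptions for illustration. They are not the settings introduced by this commit.

```shell
# Assumption: illustrative values only, not the ones from the commit.
hints_file="$(mktemp)"
cat > "$hints_file" << EOF
romio_cb_write enable
romio_ds_write enable
cb_buffer_size 16777216
cb_nodes 2
EOF

# ROMIO-based MPI implementations read this file when opening files via MPI-IO.
export ROMIO_HINTS="$hints_file"
echo "ROMIO_HINTS=${ROMIO_HINTS}"
```

Suitable values depend on the file system and job size, so they would normally be tuned per machine.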
2021-11-01 | libfabric 1.6+: Document SST Work-Arounds (#2515) | Axel Huebl | 2 | -0/+18
Document work-arounds for libfabric 1.6+ on Cray systems when using data staging / streaming with ADIOS2 SST.
2021-10-14 | Docs: Add GPU Memory Sizes (#2414) | Axel Huebl | 1 | -0/+3
Add the size of GPU HBM on documented HPC machines. This helps users to quickly transition node hours between them for memory-size-limited problems.
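To illustrate why the HBM size matters for memory-size-limited problems, here is a minimal sketch that estimates how many GPUs are needed to hold a problem of a given size. The helper `gpus_needed` and the example sizes (16 GB and 40 GB per GPU, 100 GB problem) are hypothetical and not taken from the documented machines. The estimate also ignores memory that is unavailable to the application.

```shell
# Hypothetical helper: smallest number of GPUs whose combined HBM holds the problem.
# Arguments: total problem size in GB, HBM per GPU in GB (both integers).
gpus_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
}

# Illustrative sizes only (assumed): a 100 GB problem on 16 GB vs. 40 GB GPUs.
small_hbm="$(gpus_needed 100 16)"
large_hbm="$(gpus_needed 100 40)"
echo "16 GB GPUs: ${small_hbm}, 40 GB GPUs: ${large_hbm}"
```

Dividing the GPU count by the number of GPUs per node then gives a first guess for the node count on each machine.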
2021-09-22 | Docs: Cori V100 GPU Job Script (#2328) | Axel Huebl | 1 | -0/+54
* Docs: Cori V100 GPU Job Script
  Add a job script template for Cori V100 GPU nodes.
* Add concrete cgpu allocation
* Update Tools/BatchScripts/batch_cori_gpu.sh
  Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
* Update: 1 Rank per GPU
  - 1 rank per GPU
  - more doc hints
* Slurm: CUDA_VISIBLE_DEVICES
  Set `--gpus-per-task=1`
* Visible devices for real
* Doc for GPU Binding
Co-authored-by: Remi Lehe <remi.lehe@normalesup.org>
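One way to check the "1 rank per GPU" binding is to have every rank print the devices it can see. The sketch below assumes a Slurm job that requests `--gpus-per-task=1` as named in the commit. The helper `report_gpu_binding` is hypothetical and not part of `batch_cori_gpu.sh`. `SLURM_PROCID` is the rank index that Slurm sets for each task.

```shell
#SBATCH --gpus-per-task=1   # flag named in the commit: one GPU per MPI rank

# Hypothetical helper: print which devices the current rank can see.
report_gpu_binding() {
    echo "rank ${SLURM_PROCID:-0}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

report_gpu_binding
```

When launched once per task through `srun`, each rank would print its own line, which makes a missing or shared binding easy to spot.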
2021-09-03 | Summit: Work-Around IBM MPI Collectives (#2283) | Axel Huebl | 3 | -0/+12
Fix crashes on Summit at scale (>~224 nodes) due to failing Barriers in IBM's MPI stack. Seen mostly with I/O routines and only since the RHEL8 upgrade. We see no significant performance impact from this work-around, which we keep until OLCF & IBM fix the problem (OLCFHELP-3545).
An alternative work-around via
```
export OMPI_MCA_coll_ibm_collselect_mode_barrier=failsafe
```
was tested and is a tiny bit slower than just falling back to HCOLL or OMPI's barrier implementations via
```
export OMPI_MCA_coll_ibm_skip_barrier=true
```
Thanks to Brian Smith at OLCF for the support!
2021-09-02 | Docs: Summit w/o Darshan (#2272) | Axel Huebl | 2 | -5/+3
We see problems again at runtime with missing libs. We don't need Darshan by default, so let's unload it to make our stack more stable. We also avoid sourcing the profile again in the batch script: we inherit the environment and modules loaded at submission.
2021-08-31 | Docs: Summit umask (Permissions) (#2260) | Axel Huebl | 3 | -0/+8
After the update to RHEL8, Summit changed the default permissions of newly created files and directories (for the worse). This has been reported as OLCFHELP-3442, but we face some resistance from support to triage this properly at the moment.
Since we need to keep working, we change the `umask` (aka defaults for new files & dirs) manually so that group members of the same project can read files and access dirs.
For files and dirs created since this update and not yet using this `umask`, please use the following fix.
```
find . -type d -exec chmod a+rx {} \;
find . -type f -exec chmod a+r {} \;
```
Replace `.` (current directory) with another path if needed.
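The effect of a `umask` on newly created files and directories can be sketched as follows. The mask value `0027` is an assumption chosen for illustration, since the commit message does not state the value used. The `stat -c` call assumes GNU coreutils.

```shell
# Assumption: 0027 is an illustrative mask (owner: all, group: read/access, others: none).
umask 0027

demo_dir="$(mktemp -d)"
mkdir "$demo_dir/run"            # directories start from 777, masked to 750
touch "$demo_dir/run/out.txt"    # files start from 666, masked to 640

dir_mode="$(stat -c '%a' "$demo_dir/run")"
file_mode="$(stat -c '%a' "$demo_dir/run/out.txt")"
echo "directory: ${dir_mode}, file: ${file_mode}"
rm -r "$demo_dir"
```

With this mask, group members can list directories and read files, while they cannot write to them.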
2021-08-30 | Docs: ADIOS2 Fixed on Summit (#2239) | Axel Huebl | 1 | -2/+1
The adios2 system module is now fixed on Summit. We don't need to load the openpmd-api module for CMake, as we build it on-the-fly at the moment against the ADIOS2 and HDF5 modules.
2021-08-30 | Docs: Perlmutter (#2229) | Axel Huebl | 3 | -4/+53
* Docs: Perlmutter
  Start a documentation page for Perlmutter.
* Cleaning
  - better links to docs
  - clean submission script
* Perlmutter: Add I/O
2021-07-26 | Docs: Update JSC Juwels-Booster (#2133) | Axel Huebl | 1 | -2/+4
Update CUDA from 11.0 (default) to 11.3.
2021-05-26 | Docs: Spock (OLCF) (#1988) | Axel Huebl | 1 | -0/+11
* Docs: Spock (OLCF)
  Add an initial instruction on how to build on Spock (OLCF) for AMD ROCm GPUs (HIP). This works around the missing Cray `PrgEnv-hip` that could be used with the compiler wrappers.
* Missing -L: Via $CRAYLIBS_X86_64
Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>
2021-03-23 | Docs: HPC Updates (#1830) | Axel Huebl | 1 | -1/+4
- fix minor typos in example commands
- simplify/clarify `git clone` to be uniform
- modernize Juwels section to CMake
2020-10-01 | Doc: LLNL Setups (#1394) | Axel Huebl | 2 | -0/+62
* [Draft] Doc: Lassen (LLNL)
  Document installation and usage on Lassen (LLNL).
* [Draft] Doc: Quartz (LLNL)
  Document installation and usage on Quartz (LLNL).
2020-09-25 | Doc: Cori MPI Thread Multiple (#1376) | Axel Huebl | 2 | -0/+6
Thread multiple support is not the default on Cori. One needs to set this with an environment variable.
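A minimal sketch of such a setting, assuming the Cray MPICH variable `MPICH_MAX_THREAD_SAFETY` is the one meant here (the commit message itself does not name the variable):

```shell
# Assumption: Cray MPICH reads this variable to allow MPI_THREAD_MULTIPLE.
export MPICH_MAX_THREAD_SAFETY=multiple
echo "MPICH_MAX_THREAD_SAFETY=${MPICH_MAX_THREAD_SAFETY}"
```

The export would go into the batch script before the application is launched, so that every rank inherits it.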
2020-07-28 | Add job submission script of Cori Haswell in Doc (#1222) | Yinjian Zhao | 1 | -0/+36
* 1st
* modify .sh
* Apply suggestions from code review
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
2020-07-10 | Summit Power9 CPUs documentation (#1162) | Michael E Rowan | 1 | -0/+22
* Power9 CPU docs
* Update Tools/BatchScripts/batch_summit_power9.sh
  Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
* Subsection convention
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
2020-07-02 | Doc: how to compile and run on Juwels (#1133) | MaxThevenet | 1 | -0/+20
* Doc how to compile and run on Juwels
* add a bunch of AMReX-specific options
* Update juwels.rst
* Update Docs/source/building/juwels.rst
  Co-authored-by: MaxThevenet <mthevenet@lbl.gov>
* Juwels: Filesystem Note
* Juwels Docs: Missing newline
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
2020-06-24 | Summit: no `-n` in jsrun (#1114) | Axel Huebl | 1 | -6/+8
* Summit: no `-n` in jsrun
  The calculation of `-n` does not seem to work anymore. Luckily, the parameter seems to be automatically taken from some LSF setting in the job.
* Summit Runs: OMP, GPU-Aware MPI, Latency
  Refs.:
  - https://jsrunvisualizer.olcf.ornl.gov/?s4f0o11n6c7g1r11d1b1l0=
  - https://docs.olcf.ornl.gov/systems/summit_user_guide.html#cuda-aware-mpi
2020-05-06 | Docs: Summit openPMD (#989) | Axel Huebl | 1 | -1/+1
Document explicit steps to build WarpX with openPMD support on Summit.
2020-04-10 | Reorganize Tools/ into subfolders, in anticipation of LibEnsemble scripts (#908) | MaxThevenet | 3 | -0/+118
* reorganize Tools/ into subfolders, in anticipation of LibEnsemble scripts
* Oops, also need to let git know some files have been deleted
* caps for consistency
* few paths to fix