-rw-r--r--  Docs/source/building/cori.rst                        |   3
-rw-r--r--  Docs/source/building/summit.rst                      |  28
-rw-r--r--  Docs/source/running_cpp/parallelization.rst          |  24
-rw-r--r--  Docs/source/running_cpp/platforms.rst                |  69
-rw-r--r--  Docs/source/running_cpp/running_cpp.rst              |   1
-rwxr-xr-x  Examples/Tests/gpu_test/script_profiling.sh (renamed from Examples/Tests/gpu_test/script.sh) |   0
-rw-r--r--  Examples/batchScripts/batch_cori.sh                  |  33
-rw-r--r--  Examples/batchScripts/batch_summit.sh                |  16
-rw-r--r--  Source/Diagnostics/ParticleIO.cpp                    |  12
-rw-r--r--  Source/Evolve/WarpXEvolveEM.cpp                      |  16
-rw-r--r--  Source/Particles/MultiParticleContainer.cpp          |  20
-rw-r--r--  Source/Particles/PhysicalParticleContainer.cpp       | 110
-rw-r--r--  Source/Particles/RigidInjectedParticleContainer.cpp  |  43
-rw-r--r--  Source/Particles/WarpXParticleContainer.H            |  18
-rw-r--r--  Source/Particles/WarpXParticleContainer.cpp          |  23
-rw-r--r--  Tools/compute_domain.py                              | 114
16 files changed, 330 insertions, 200 deletions
diff --git a/Docs/source/building/cori.rst b/Docs/source/building/cori.rst
index 2330e2153..d89f3f17b 100644
--- a/Docs/source/building/cori.rst
+++ b/Docs/source/building/cori.rst
@@ -51,6 +51,9 @@ In order to compile for the **Knight's Landing (KNL) architecture**:
module swap PrgEnv-intel PrgEnv-gnu
make -j 16 COMP=gnu
+See :doc:`../running_cpp/platforms` for more information on how to run
+WarpX on Cori.
+
GPU Build
---------
diff --git a/Docs/source/building/summit.rst b/Docs/source/building/summit.rst
index 88588eb72..424cb68f5 100644
--- a/Docs/source/building/summit.rst
+++ b/Docs/source/building/summit.rst
@@ -23,29 +23,5 @@ Then, ``cd`` into the directory ``WarpX`` and use the following set of commands
module load cuda
make -j 4 USE_GPU=TRUE COMP=pgi
-In order to submit a simulation, create a file `submission_script` with
-the following text (replace bracketed variables):
-
-::
-
- #!/bin/bash
- #BSUB -J <jobName>
- #BSUB -W <requestedTime>
- #BSUB -nnodes <numberOfNodes>
- #BSUB -P <accountNumber>
-
- module load pgi
- module load cuda
-
- omp=1
- export OMP_NUM_THREADS=${omp}
- num_nodes=$(( $(printf '%s\n' ${LSB_HOSTS} | sort -u | wc -l) - 1 ))
-
- jsrun -n ${num_nodes} -a 6 -g 6 -c 6 --bind=packed:${omp} --smpiargs="-gpu" <warpxExecutable> <inputScript>
-
-
-Then run
-
-::
-
- bsub submission_script
+See :doc:`../running_cpp/platforms` for more information on how to run
+WarpX on Summit.
diff --git a/Docs/source/running_cpp/parallelization.rst b/Docs/source/running_cpp/parallelization.rst
index 440c17235..a8c89f340 100644
--- a/Docs/source/running_cpp/parallelization.rst
+++ b/Docs/source/running_cpp/parallelization.rst
@@ -61,22 +61,8 @@ and MPI decomposition and computer architecture used for the run:
* Amount of high-bandwidth memory.
-Below is a list of experience-based parameters
-that were observed to give good performance on given supercomputers.
-
-Rule of thumb for 3D runs on NERSC Cori KNL
--------------------------------------------
-
-For a 3D simulation with a few (1-4) particles per cell using FDTD Maxwell
-solver on Cori KNL for a well load-balanced problem (in our case laser
-wakefield acceleration simulation in a boosted frame in the quasi-linear
-regime), the following set of parameters provided good performance:
-
-* ``amr.max_grid_size=64`` and ``amr.blocking_factor=64`` so that the size of
- each grid is fixed to ``64**3`` (we are not using load-balancing here).
-
-* **8 MPI ranks per KNL node**, with ``OMP_NUM_THREADS=8`` (that is 64 threads
- per KNL node, i.e. 1 thread per physical core, and 4 cores left to the
- system).
-
-* **2 grids per MPI**, *i.e.*, 16 grids per KNL node.
+Because these parameters put additional constraints on the domain size for a
+simulation, it can be cumbersome to calculate the number of cells and the
+physical size of the computational domain for a given resolution. This
+:download:`Python script<../../../Tools/compute_domain.py>` does it
+automatically; a short example is sketched below.
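+
+A minimal sketch, assuming hypothetical cell counts and
+``amr.max_grid_size=64``, of the rounding the script performs::
+
+    import numpy as np
+
+    box_ncell0 = np.array([50, 50, 75])  # cells needed for the resolution
+    mgs = 64                             # amr.max_grid_size
+    # Round each direction up to a multiple of max_grid_size:
+    box_ncell = np.ceil(box_ncell0 / mgs).astype(int) * mgs
+    print(box_ncell)                     # [ 64  64 128]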
diff --git a/Docs/source/running_cpp/platforms.rst b/Docs/source/running_cpp/platforms.rst
new file mode 100644
index 000000000..fc4e2b1fb
--- /dev/null
+++ b/Docs/source/running_cpp/platforms.rst
@@ -0,0 +1,69 @@
+Running on specific platforms
+=============================
+
+Running on Cori KNL at NERSC
+----------------------------
+
+The batch script below can be used to run a WarpX simulation on 2 KNL nodes on
+the supercomputer Cori at NERSC. Replace descriptions between chevrons ``<>``
+with relevant values; for instance, ``<job name>`` could be ``laserWakefield``.
+
+.. literalinclude:: ../../../Examples/batchScripts/batch_cori.sh
+ :language: bash
+
+To run a simulation, copy the lines above to a file ``batch_cori.sh`` and
+run
+::
+
+ sbatch batch_cori.sh
+
+to submit the job.
+
+For a 3D simulation with a few (1-4) particles per cell using the FDTD
+Maxwell solver on Cori KNL for a well load-balanced problem (in our case, a
+laser-wakefield acceleration simulation in a boosted frame in the
+quasi-linear regime), the following set of parameters provided good
+performance (a worked example follows the list):
+
+* ``amr.max_grid_size=64`` and ``amr.blocking_factor=64`` so that the size of
+  each grid is fixed to ``64**3`` (we are not using load-balancing here).
+
+* **8 MPI ranks per KNL node**, with ``OMP_NUM_THREADS=8`` (that is 64 threads
+  per KNL node, i.e. 1 thread per physical core, and 4 cores left to the
+  system).
+
+* **2 grids per MPI**, *i.e.*, 16 grids per KNL node.
+
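+As a sketch of how these numbers combine, assume a hypothetical domain of
+``512**3`` cells::
+
+    ngrids = (512 // 64)**3   # 512 grids of 64**3 cells each
+    n_mpi = ngrids // 2       # 2 grids per rank -> 256 MPI ranks
+    n_nodes = n_mpi // 8      # 8 ranks per node -> 32 KNL nodes
+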
+Running on Summit at OLCF
+-------------------------
+
+The batch script below can be used to run a WarpX simulation on 2 nodes on
+the supercomputer Summit at OLCF. Replace descriptions between chevrons ``<>``
+with relevant values; for instance, ``<input file>`` could be
+``plasma_mirror_inputs``. Note that the only option so far is to run with one
+MPI rank per GPU.
+
+.. literalinclude:: ../../../Examples/batchScripts/batch_summit.sh
+ :language: bash
+
+To run a simulation, copy the lines above to a file ``batch_summit.sh`` and
+run
+::
+
+ bsub batch_summit.sh
+
+to submit the job.
+
+For a 3D simulation with a few (1-4) particles per cell using the FDTD
+Maxwell solver on Summit for a well load-balanced problem (in our case, a
+laser-wakefield acceleration simulation in a boosted frame in the
+quasi-linear regime), the following set of parameters provided good
+performance (a short sketch follows the list):
+
+* ``amr.max_grid_size=256`` and ``amr.blocking_factor=128``.
+
+* **One MPI rank per GPU** (e.g., 6 MPI ranks for the 6 GPUs on each Summit
+  node).
+
+* **Two ``128x128x128`` grids per GPU**, or **one ``128x128x256`` grid per GPU**.
+
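+As a sketch of what these settings imply for a hypothetical run::
+
+    cells_per_gpu = 2 * 128**3           # two 128**3 grids = 4,194,304 cells
+    cells_per_node = 6 * cells_per_gpu   # 6 GPUs per Summit node
+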
+A batch script with more options regarding profiling on Summit can be found at
+:download:`Summit batch script<../../../Examples/Tests/gpu_test/script_profiling.sh>`.
\ No newline at end of file
diff --git a/Docs/source/running_cpp/running_cpp.rst b/Docs/source/running_cpp/running_cpp.rst
index 7d82e55f1..31cecb12f 100644
--- a/Docs/source/running_cpp/running_cpp.rst
+++ b/Docs/source/running_cpp/running_cpp.rst
@@ -9,3 +9,4 @@ Running WarpX as an executable
parameters
profiling
parallelization
+ platforms
\ No newline at end of file
diff --git a/Examples/Tests/gpu_test/script.sh b/Examples/Tests/gpu_test/script_profiling.sh
index cd6b0eadd..cd6b0eadd 100755
--- a/Examples/Tests/gpu_test/script.sh
+++ b/Examples/Tests/gpu_test/script_profiling.sh
diff --git a/Examples/batchScripts/batch_cori.sh b/Examples/batchScripts/batch_cori.sh
new file mode 100644
index 000000000..e6cd5e1ef
--- /dev/null
+++ b/Examples/batchScripts/batch_cori.sh
@@ -0,0 +1,33 @@
+#!/bin/bash -l
+
+#SBATCH -N 2
+#SBATCH -t 01:00:00
+#SBATCH -q regular
+#SBATCH -C knl
+#SBATCH -S 4
+#SBATCH -J <job name>
+#SBATCH -A <allocation ID>
+#SBATCH -e error.txt
+#SBATCH -o output.txt
+
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+# KNLs have 4 hyperthreads max
+export CORI_MAX_HYPERTHREAD_LEVEL=4
+# We use 64 cores out of the 68 available on Cori KNL,
+# and leave 4 to the system (see "#SBATCH -S 4" above).
+export CORI_NCORES_PER_NODE=64
+
+# Typically use 8 MPI ranks per node without hyperthreading,
+# i.e., OMP_NUM_THREADS=8
+export WARPX_NMPI_PER_NODE=8
+export WARPX_HYPERTHREAD_LEVEL=1
+
+# Compute OMP_NUM_THREADS and the thread count (-c option)
+export CORI_NHYPERTHREADS_MAX=$(( ${CORI_MAX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
+export WARPX_NTHREADS_PER_NODE=$(( ${WARPX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
+export OMP_NUM_THREADS=$(( ${WARPX_NTHREADS_PER_NODE} / ${WARPX_NMPI_PER_NODE} ))
+export WARPX_THREAD_COUNT=$(( ${CORI_NHYPERTHREADS_MAX} / ${WARPX_NMPI_PER_NODE} ))
+
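+# With the values above: OMP_NUM_THREADS = 64/8 = 8 threads per rank, and
+# the srun -c option below receives 256/8 = 32 hyperthread slots per rank.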
+srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${WARPX_NMPI_PER_NODE} )) -c ${WARPX_THREAD_COUNT} <path/to/executable> <input file>
diff --git a/Examples/batchScripts/batch_summit.sh b/Examples/batchScripts/batch_summit.sh
new file mode 100644
index 000000000..002660b91
--- /dev/null
+++ b/Examples/batchScripts/batch_summit.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+#BSUB -P <allocation ID>
+#BSUB -W 00:10
+#BSUB -nnodes 2
+#BSUB -J WarpX
+#BSUB -o WarpXo.%J
+#BSUB -e WarpXe.%J
+
+module load pgi
+module load cuda
+
+omp=1
+export OMP_NUM_THREADS=${omp}
+
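+# LSB_HOSTS lists one entry per allocated slot, including the launch node,
+# so counting unique hosts and subtracting 1 gives the compute-node count.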
+num_nodes=$(( $(printf '%s\n' ${LSB_HOSTS} | sort -u | wc -l) - 1 ))
+jsrun -n ${num_nodes} -a 6 -g 6 -c 6 --bind=packed:${omp} <path/to/executable> <input file> > output.txt
diff --git a/Source/Diagnostics/ParticleIO.cpp b/Source/Diagnostics/ParticleIO.cpp
index 743df4ce6..8bfa45a59 100644
--- a/Source/Diagnostics/ParticleIO.cpp
+++ b/Source/Diagnostics/ParticleIO.cpp
@@ -103,17 +103,6 @@ MultiParticleContainer::WritePlotFile (const std::string& dir) const
real_names.push_back("theta");
#endif
- if (WarpX::do_boosted_frame_diagnostic && pc->DoBoostedFrameDiags())
- {
- real_names.push_back("xold");
- real_names.push_back("yold");
- real_names.push_back("zold");
-
- real_names.push_back("uxold");
- real_names.push_back("uyold");
- real_names.push_back("uzold");
- }
-
if(pc->do_field_ionization){
int_names.push_back("ionization_level");
// int_flags specifies, for each integer attribs, whether it is
@@ -122,6 +111,7 @@ MultiParticleContainer::WritePlotFile (const std::string& dir) const
// when ionization is on.
int_flags.resize(1, 1);
}
+
// Convert momentum to SI
pc->ConvertUnits(ConvertDirection::WarpX_to_SI);
// real_names contains a list of all particle attributes.
diff --git a/Source/Evolve/WarpXEvolveEM.cpp b/Source/Evolve/WarpXEvolveEM.cpp
index 5313fb418..a831b7dab 100644
--- a/Source/Evolve/WarpXEvolveEM.cpp
+++ b/Source/Evolve/WarpXEvolveEM.cpp
@@ -140,6 +140,14 @@ WarpX::EvolveEM (int numsteps)
bool do_insitu = ((step+1) >= insitu_start) &&
(insitu_int > 0) && ((step+1) % insitu_int == 0);
+ if (do_boosted_frame_diagnostic) {
+ std::unique_ptr<MultiFab> cell_centered_data = nullptr;
+ if (WarpX::do_boosted_frame_fields) {
+ cell_centered_data = GetCellCenteredData();
+ }
+ myBFD->writeLabFrameData(cell_centered_data.get(), *mypc, geom[0], cur_time, dt[0]);
+ }
+
bool move_j = is_synchronized || to_make_plot || do_insitu;
// If is_synchronized we need to shift j too so that next step we can evolve E by dt/2.
// We might need to move j because we are going to make a plotfile.
@@ -173,14 +181,6 @@ WarpX::EvolveEM (int numsteps)
t_new[i] = cur_time;
}
- if (do_boosted_frame_diagnostic) {
- std::unique_ptr<MultiFab> cell_centered_data = nullptr;
- if (WarpX::do_boosted_frame_fields) {
- cell_centered_data = GetCellCenteredData();
- }
- myBFD->writeLabFrameData(cell_centered_data.get(), *mypc, geom[0], cur_time, dt[0]);
- }
-
// slice gen //
if (to_make_plot || do_insitu || to_make_slice_plot)
{
diff --git a/Source/Particles/MultiParticleContainer.cpp b/Source/Particles/MultiParticleContainer.cpp
index 84108b5dc..7803bdae1 100644
--- a/Source/Particles/MultiParticleContainer.cpp
+++ b/Source/Particles/MultiParticleContainer.cpp
@@ -42,26 +42,6 @@ MultiParticleContainer::MultiParticleContainer (AmrCore* amr_core)
nspecies_boosted_frame_diags += 1;
}
}
-
- if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
- {
- for (int i = 0; i < nspecies_boosted_frame_diags; ++i)
- {
- int is = map_species_boosted_frame_diags[i];
- allcontainers[is]->AddRealComp("xold");
- allcontainers[is]->AddRealComp("yold");
- allcontainers[is]->AddRealComp("zold");
- allcontainers[is]->AddRealComp("uxold");
- allcontainers[is]->AddRealComp("uyold");
- allcontainers[is]->AddRealComp("uzold");
- }
- pc_tmp->AddRealComp("xold");
- pc_tmp->AddRealComp("yold");
- pc_tmp->AddRealComp("zold");
- pc_tmp->AddRealComp("uxold");
- pc_tmp->AddRealComp("uyold");
- pc_tmp->AddRealComp("uzold");
- }
}
void
diff --git a/Source/Particles/PhysicalParticleContainer.cpp b/Source/Particles/PhysicalParticleContainer.cpp
index 9a3b6a565..1513297a1 100644
--- a/Source/Particles/PhysicalParticleContainer.cpp
+++ b/Source/Particles/PhysicalParticleContainer.cpp
@@ -212,17 +212,8 @@ PhysicalParticleContainer::CheckAndAddParticle(Real x, Real y, Real z,
{
// need to create old values
auto& particle_tile = DefineAndReturnParticleTile(0, 0, 0);
- if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
- {
- particle_tile.push_back_real(particle_comps["xold"], x);
- particle_tile.push_back_real(particle_comps["yold"], y);
- particle_tile.push_back_real(particle_comps["zold"], z);
-
- particle_tile.push_back_real(particle_comps["uxold"], u[0]);
- particle_tile.push_back_real(particle_comps["uyold"], u[1]);
- particle_tile.push_back_real(particle_comps["uzold"], u[2]);
- }
}
+
// add particle
AddOneParticle(0, 0, 0, x, y, z, attribs);
}
@@ -301,11 +292,13 @@ PhysicalParticleContainer::AddPlasma (int lev, RealBox part_realbox)
#ifdef _OPENMP
// First touch all tiles in the map in serial
for (MFIter mfi = MakeMFIter(lev); mfi.isValid(); ++mfi) {
- const int grid_id = mfi.index();
- const int tile_id = mfi.LocalTileIndex();
- GetParticles(lev)[std::make_pair(grid_id, tile_id)];
+ auto index = std::make_pair(mfi.index(), mfi.LocalTileIndex());
+ GetParticles(lev)[index];
+ tmp_particle_data.resize(finestLevel()+1);
+ // Create map entry if not there
+ tmp_particle_data[lev][index];
if ( (NumRuntimeRealComps()>0) || (NumRuntimeIntComps()>0) ) {
- DefineAndReturnParticleTile(lev, grid_id, tile_id);
+ DefineAndReturnParticleTile(lev, mfi.index(), mfi.LocalTileIndex());
}
}
#endif
@@ -406,14 +399,16 @@ PhysicalParticleContainer::AddPlasma (int lev, RealBox part_realbox)
// then how many new particles will be injected is not that simple
// We have to shift fine_injection_box because overlap_box has been shifted.
Box fine_overlap_box = overlap_box & amrex::shift(fine_injection_box,shifted);
- max_new_particles += fine_overlap_box.numPts() * num_ppc
- * (AMREX_D_TERM(rrfac,*rrfac,*rrfac)-1);
- for (int icell = 0, ncells = overlap_box.numPts(); icell < ncells; ++icell) {
- IntVect iv = overlap_box.atOffset(icell);
- int r = (fine_overlap_box.contains(iv)) ? AMREX_D_TERM(rrfac,*rrfac,*rrfac) : 1;
- for (int ipart = 0; ipart < r; ++ipart) {
- cellid_v.push_back(icell);
- cellid_v.push_back(ipart);
+ if (fine_overlap_box.ok()) {
+ max_new_particles += fine_overlap_box.numPts() * num_ppc
+ * (AMREX_D_TERM(rrfac,*rrfac,*rrfac)-1);
+ for (int icell = 0, ncells = overlap_box.numPts(); icell < ncells; ++icell) {
+ IntVect iv = overlap_box.atOffset(icell);
+ int r = (fine_overlap_box.contains(iv)) ? AMREX_D_TERM(rrfac,*rrfac,*rrfac) : 1;
+ for (int ipart = 0; ipart < r; ++ipart) {
+ cellid_v.push_back(icell);
+ cellid_v.push_back(ipart);
+ }
}
}
}
@@ -431,12 +426,10 @@ PhysicalParticleContainer::AddPlasma (int lev, RealBox part_realbox)
const int cpuid = ParallelDescriptor::MyProc();
auto& particle_tile = GetParticles(lev)[std::make_pair(grid_id,tile_id)];
- bool do_boosted = false;
+
if ( (NumRuntimeRealComps()>0) || (NumRuntimeIntComps()>0) ) {
DefineAndReturnParticleTile(lev, grid_id, tile_id);
}
- do_boosted = WarpX::do_boosted_frame_diagnostic
- && do_boosted_frame_diags;
auto old_size = particle_tile.GetArrayOfStructs().size();
auto new_size = old_size + max_new_particles;
@@ -448,15 +441,7 @@ PhysicalParticleContainer::AddPlasma (int lev, RealBox part_realbox)
for (int ia = 0; ia < PIdx::nattribs; ++ia) {
pa[ia] = soa.GetRealData(ia).data() + old_size;
}
- GpuArray<Real*,6> pb;
- if (do_boosted) {
- pb[0] = soa.GetRealData(particle_comps[ "xold"]).data() + old_size;
- pb[1] = soa.GetRealData(particle_comps[ "yold"]).data() + old_size;
- pb[2] = soa.GetRealData(particle_comps[ "zold"]).data() + old_size;
- pb[3] = soa.GetRealData(particle_comps["uxold"]).data() + old_size;
- pb[4] = soa.GetRealData(particle_comps["uyold"]).data() + old_size;
- pb[5] = soa.GetRealData(particle_comps["uzold"]).data() + old_size;
- }
+
int* pi;
if (do_field_ionization) {
pi = soa.GetIntData(particle_icomps["ionization_level"]).data() + old_size;
@@ -630,15 +615,6 @@ PhysicalParticleContainer::AddPlasma (int lev, RealBox part_realbox)
pa[PIdx::uy][ip] = u.y;
pa[PIdx::uz][ip] = u.z;
- if (do_boosted) {
- pb[0][ip] = x;
- pb[1][ip] = y;
- pb[2][ip] = z;
- pb[3][ip] = u.x;
- pb[4][ip] = u.y;
- pb[5][ip] = u.z;
- }
-
#if (AMREX_SPACEDIM == 3)
p.pos(0) = x;
p.pos(1) = y;
@@ -990,6 +966,19 @@ PhysicalParticleContainer::Evolve (int lev,
bool has_buffer = cEx || cjx;
+ if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
+ {
+ for (WarpXParIter pti(*this, lev); pti.isValid(); ++pti)
+ {
+ const auto np = pti.numParticles();
+ const auto lev = pti.GetLevel();
+ const auto index = pti.GetPairIndex();
+ tmp_particle_data.resize(finestLevel()+1);
+ for (int i = 0; i < TmpIdx::nattribs; ++i)
+ tmp_particle_data[lev][index][i].resize(np);
+ }
+ }
+
#ifdef _OPENMP
#pragma omp parallel
#endif
@@ -1715,22 +1704,21 @@ PhysicalParticleContainer::PushP (int lev, Real dt,
void PhysicalParticleContainer::copy_attribs(WarpXParIter& pti,const Real* xp,
const Real* yp, const Real* zp)
{
-
auto& attribs = pti.GetAttribs();
-
Real* AMREX_RESTRICT uxp = attribs[PIdx::ux].dataPtr();
Real* AMREX_RESTRICT uyp = attribs[PIdx::uy].dataPtr();
Real* AMREX_RESTRICT uzp = attribs[PIdx::uz].dataPtr();
- Real* AMREX_RESTRICT xpold = pti.GetAttribs(particle_comps["xold"]).dataPtr();
- Real* AMREX_RESTRICT ypold = pti.GetAttribs(particle_comps["yold"]).dataPtr();
- Real* AMREX_RESTRICT zpold = pti.GetAttribs(particle_comps["zold"]).dataPtr();
- Real* AMREX_RESTRICT uxpold = pti.GetAttribs(particle_comps["uxold"]).dataPtr();
- Real* AMREX_RESTRICT uypold = pti.GetAttribs(particle_comps["uyold"]).dataPtr();
- Real* AMREX_RESTRICT uzpold = pti.GetAttribs(particle_comps["uzold"]).dataPtr();
-
- const long np = pti.numParticles();
-
+ const auto np = pti.numParticles();
+ const auto lev = pti.GetLevel();
+ const auto index = pti.GetPairIndex();
+ Real* AMREX_RESTRICT xpold = tmp_particle_data[lev][index][TmpIdx::xold ].dataPtr();
+ Real* AMREX_RESTRICT ypold = tmp_particle_data[lev][index][TmpIdx::yold ].dataPtr();
+ Real* AMREX_RESTRICT zpold = tmp_particle_data[lev][index][TmpIdx::zold ].dataPtr();
+ Real* AMREX_RESTRICT uxpold = tmp_particle_data[lev][index][TmpIdx::uxold].dataPtr();
+ Real* AMREX_RESTRICT uypold = tmp_particle_data[lev][index][TmpIdx::uyold].dataPtr();
+ Real* AMREX_RESTRICT uzpold = tmp_particle_data[lev][index][TmpIdx::uzold].dataPtr();
+
ParallelFor( np,
[=] AMREX_GPU_DEVICE (long i) {
xpold[i]=xp[i];
@@ -1815,15 +1803,15 @@ void PhysicalParticleContainer::GetParticleSlice(const int direction, const Real
auto& uyp_new = attribs[PIdx::uy ];
auto& uzp_new = attribs[PIdx::uz ];
- auto& xp_old = pti.GetAttribs(particle_comps["xold"]);
- auto& yp_old = pti.GetAttribs(particle_comps["yold"]);
- auto& zp_old = pti.GetAttribs(particle_comps["zold"]);
- auto& uxp_old = pti.GetAttribs(particle_comps["uxold"]);
- auto& uyp_old = pti.GetAttribs(particle_comps["uyold"]);
- auto& uzp_old = pti.GetAttribs(particle_comps["uzold"]);
+ auto& xp_old = tmp_particle_data[lev][index][TmpIdx::xold];
+ auto& yp_old = tmp_particle_data[lev][index][TmpIdx::yold];
+ auto& zp_old = tmp_particle_data[lev][index][TmpIdx::zold];
+ auto& uxp_old = tmp_particle_data[lev][index][TmpIdx::uxold];
+ auto& uyp_old = tmp_particle_data[lev][index][TmpIdx::uyold];
+ auto& uzp_old = tmp_particle_data[lev][index][TmpIdx::uzold];
const long np = pti.numParticles();
-
+
Real uzfrm = -WarpX::gamma_boost*WarpX::beta_boost*PhysConst::c;
Real inv_c2 = 1.0/PhysConst::c/PhysConst::c;
diff --git a/Source/Particles/RigidInjectedParticleContainer.cpp b/Source/Particles/RigidInjectedParticleContainer.cpp
index a87168a75..4893b3294 100644
--- a/Source/Particles/RigidInjectedParticleContainer.cpp
+++ b/Source/Particles/RigidInjectedParticleContainer.cpp
@@ -239,15 +239,13 @@ RigidInjectedParticleContainer::PushPX(WarpXParIter& pti,
Real* const AMREX_RESTRICT Bzp = attribs[PIdx::Bz].dataPtr();
if (!done_injecting_lev) {
- if (!(WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)) {
- // If the old values are not already saved, create copies here.
- xp_save = xp;
- yp_save = yp;
- zp_save = zp;
- uxp_save = uxp;
- uyp_save = uyp;
- uzp_save = uzp;
- }
+ // If the old values are not already saved, create copies here.
+ xp_save = xp;
+ yp_save = yp;
+ zp_save = zp;
+ uxp_save = uxp;
+ uyp_save = uyp;
+ uzp_save = uzp;
// Scale the fields of particles about to cross the injection plane.
// This only approximates what should be happening. The particles
@@ -275,27 +273,12 @@ RigidInjectedParticleContainer::PushPX(WarpXParIter& pti,
if (!done_injecting_lev) {
- Real* AMREX_RESTRICT x_save;
- Real* AMREX_RESTRICT y_save;
- Real* AMREX_RESTRICT z_save;
- Real* AMREX_RESTRICT ux_save;
- Real* AMREX_RESTRICT uy_save;
- Real* AMREX_RESTRICT uz_save;
- if (!(WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)) {
- x_save = xp_save.dataPtr();
- y_save = yp_save.dataPtr();
- z_save = zp_save.dataPtr();
- ux_save = uxp_save.dataPtr();
- uy_save = uyp_save.dataPtr();
- uz_save = uzp_save.dataPtr();
- } else {
- x_save = pti.GetAttribs(particle_comps["xold"]).dataPtr();
- y_save = pti.GetAttribs(particle_comps["yold"]).dataPtr();
- z_save = pti.GetAttribs(particle_comps["zold"]).dataPtr();
- ux_save = pti.GetAttribs(particle_comps["uxold"]).dataPtr();
- uy_save = pti.GetAttribs(particle_comps["uyold"]).dataPtr();
- uz_save = pti.GetAttribs(particle_comps["uzold"]).dataPtr();
- }
+ Real* AMREX_RESTRICT x_save = xp_save.dataPtr();
+ Real* AMREX_RESTRICT y_save = yp_save.dataPtr();
+ Real* AMREX_RESTRICT z_save = zp_save.dataPtr();
+ Real* AMREX_RESTRICT ux_save = uxp_save.dataPtr();
+ Real* AMREX_RESTRICT uy_save = uyp_save.dataPtr();
+ Real* AMREX_RESTRICT uz_save = uzp_save.dataPtr();
// Undo the push for particles not injected yet.
// The zp are advanced a fixed amount.
diff --git a/Source/Particles/WarpXParticleContainer.H b/Source/Particles/WarpXParticleContainer.H
index 1f144f628..bc46a116b 100644
--- a/Source/Particles/WarpXParticleContainer.H
+++ b/Source/Particles/WarpXParticleContainer.H
@@ -29,6 +29,15 @@ struct DiagIdx
};
};
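+// Indices into the temporary per-(grid,tile) arrays that hold the previous
+// particle positions and momenta for the boosted-frame diagnostics.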
+struct TmpIdx
+{
+ enum {
+ xold = 0,
+ yold, zold, uxold, uyold, uzold,
+ nattribs
+ };
+};
+
namespace ParticleStringNames
{
const std::map<std::string, int> to_index = {
@@ -326,7 +335,10 @@ protected:
amrex::Vector<amrex::FArrayBox> local_jy;
amrex::Vector<amrex::FArrayBox> local_jz;
- amrex::Vector<amrex::Cuda::ManagedDeviceVector<amrex::Real> > m_xp, m_yp, m_zp, m_giv;
+ using DataContainer = amrex::Gpu::ManagedDeviceVector<amrex::Real>;
+ using PairIndex = std::pair<int, int>;
+
+ amrex::Vector<DataContainer> m_xp, m_yp, m_zp, m_giv;
// Whether to dump particle quantities.
// If true, particle position is always dumped.
@@ -336,7 +348,9 @@ protected:
amrex::Vector<int> plot_flags;
// list of names of attributes to dump.
amrex::Vector<std::string> plot_vars;
-
+
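+    // Temporary per-level, per-(grid,tile) storage for the "old" particle
+    // positions and momenta (see TmpIdx); filled by copy_attribs and read
+    // back in GetParticleSlice for the boosted-frame diagnostics.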
+ amrex::Vector<std::map<PairIndex, std::array<DataContainer, TmpIdx::nattribs> > > tmp_particle_data;
+
private:
virtual void particlePostLocate(ParticleType& p, const amrex::ParticleLocData& pld,
const int lev) override;
diff --git a/Source/Particles/WarpXParticleContainer.cpp b/Source/Particles/WarpXParticleContainer.cpp
index 2cb18f8b5..8d2499a35 100644
--- a/Source/Particles/WarpXParticleContainer.cpp
+++ b/Source/Particles/WarpXParticleContainer.cpp
@@ -85,17 +85,6 @@ WarpXParticleContainer::WarpXParticleContainer (AmrCore* amr_core, int ispecies)
particle_comps["theta"] = PIdx::theta;
#endif
- if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
- {
- particle_comps["xold"] = PIdx::nattribs;
- particle_comps["yold"] = PIdx::nattribs+1;
- particle_comps["zold"] = PIdx::nattribs+2;
- particle_comps["uxold"] = PIdx::nattribs+3;
- particle_comps["uyold"] = PIdx::nattribs+4;
- particle_comps["uzold"] = PIdx::nattribs+5;
-
- }
-
// Initialize temporary local arrays for charge/current deposition
int num_threads = 1;
#ifdef _OPENMP
@@ -240,12 +229,6 @@ WarpXParticleContainer::AddNParticles (int lev,
if ( (NumRuntimeRealComps()>0) || (NumRuntimeIntComps()>0) ){
auto& ptile = DefineAndReturnParticleTile(0, 0, 0);
- if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
- {
- ptile.push_back_real(particle_comps["xold"], x[i]);
- ptile.push_back_real(particle_comps["yold"], y[i]);
- ptile.push_back_real(particle_comps["zold"], z[i]);
- }
}
particle_tile.push_back(p);
@@ -260,12 +243,6 @@ WarpXParticleContainer::AddNParticles (int lev,
if ( (NumRuntimeRealComps()>0) || (NumRuntimeIntComps()>0) ){
auto& ptile = DefineAndReturnParticleTile(0, 0, 0);
- if (WarpX::do_boosted_frame_diagnostic && do_boosted_frame_diags)
- {
- ptile.push_back_real(particle_comps["uxold"], vx + ibegin, vx + iend);
- ptile.push_back_real(particle_comps["uyold"], vy + ibegin, vy + iend);
- ptile.push_back_real(particle_comps["uzold"], vz + ibegin, vz + iend);
- }
}
for (int comp = PIdx::uz+1; comp < PIdx::nattribs; ++comp)
diff --git a/Tools/compute_domain.py b/Tools/compute_domain.py
new file mode 100644
index 000000000..822d776e8
--- /dev/null
+++ b/Tools/compute_domain.py
@@ -0,0 +1,114 @@
+import os, shutil, re
+import numpy as np
+import scipy.constants as scc
+import time, copy
+
+'''
+This Python script helps a user to parallelize a WarpX simulation.
+
+The user specifies the minimal size of the physical domain and the resolution
+in each dimension, and the script computes:
+- the number of cells and physical domain that satisfy the user-specified
+  domain size and resolution AND make sure that the number of cells along
+  each direction is a multiple of max_grid_size.
+- a starting point on how to parallelize on Cori KNL (number of nodes, etc.).
+
+When running in a boosted frame, the script also has the option to
+automatically compute the number of cells in z to satisfy dx>dz in the boosted
+frame.
+
+Note that the script has no notion of blocking_factor. It is assumed that
+blocking_factor = max_grid_size, and that all boxes have the same size.
+'''
+
+# Update the lines below for your simulation
+# ------------------------------------------
+# 2 elements for 2D, 3 elements for 3D
+# Lower corner of the box
+box_lo0 = np.array([-25.e-6, -25.e-6, -15.e-6])
+# Upper corner of the box
+box_hi0 = np.array([ 25.e-6, 25.e-6, 60.e-6])
+# Cell size
+dx = 1.e-6
+dz = dx
+cell_size = np.array([dx, dx, dz])
+# Use this for simulations in a boosted frame if you
+# want to enforce dz < dx / dx_over_dz_boosted_frame
+compute_dz_boosted_frame = True
+gamma_boost = 30.
+dx_over_dz_boosted_frame = 1.1 # >1. is usually more stable
+# ------------------------------------------
+
+# similar to numpy.ceil, except the output data type is int
+def intceil(num):
+    return np.ceil(num).astype(int)
+
+# Enlarge simulation boundaries to satisfy three conditions:
+# - The resolution must be exactly the one provided by the user
+# - The physical domain must cover the domain specified by box_lo0, box_hi0
+# - The number of cells must be a multiple of mgs (max_grid_size).
+def adjust_bounds(box_lo0, box_hi0, box_ncell0, mgs):
+    cell_size = (box_hi0-box_lo0) / box_ncell0
+    box_ncell = intceil(box_ncell0/mgs)*mgs
+    box_lo = box_ncell * cell_size * box_lo0 / (box_hi0 - box_lo0)
+    box_hi = box_ncell * cell_size * box_hi0 / (box_hi0 - box_lo0)
+    return box_lo, box_hi, box_ncell
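+
+# For example (hypothetical numbers): box_ncell0 = 100 with mgs = 64 is
+# padded up to 128 cells, and the bounds are stretched by 128/100 so that
+# the user-requested cell size is preserved.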
+
+# Calculate parallelization for the simulation, given numerical parameters
+# (number of cells, max_grid_size, number of threads per node etc.)
+def nb_nodes_mpi(box_ncell, mgs, threadspernode, ompnumthreads, ngridpernode, ndim):
+    nmpipernode = threadspernode/ompnumthreads
+    ngridpermpi = ngridpernode/nmpipernode
+    box_ngrids = box_ncell/mgs
+    if ndim == 2:
+        ngrids = box_ngrids[0] * box_ngrids[1]
+    elif ndim == 3:
+        ngrids = np.prod(box_ngrids)
+    n_mpi = intceil( ngrids/ngridpermpi )
+    n_node = intceil( n_mpi/nmpipernode )
+    return n_node, n_mpi
+
+# Get number of dimensions (2 or 3)
+ndim = box_lo0.size
+if compute_dz_boosted_frame:
+    # Adjust dz so that dx/dz = dx_over_dz_boosted_frame in the boosted
+    # (simulation) frame, where dz appears stretched by ~2*gamma_boost
+    cell_size[-1] = cell_size[0] / dx_over_dz_boosted_frame / 2. / gamma_boost
+# Given the resolution, compute number of cells a priori
+box_ncell0 = ( box_hi0 - box_lo0 ) / cell_size
+
+if ndim == 2:
+    # Set of parameters suitable for a 2D simulation on Cori KNL
+    ngridpernode = 16.
+    ompnumthreads = 8.
+    mgs = 1024.
+    threadspernode = 64. # HyperThreading level = 1: no hyperthreading
+    distance_between_threads = int(68*4/threadspernode)
+    c_option = int( ompnumthreads*distance_between_threads )
+elif ndim == 3:
+    # Set of parameters suitable for a 3D simulation on Cori KNL
+    ngridpernode = 8.
+    ompnumthreads = 8.
+    mgs = 64.
+    threadspernode = 64. # HyperThreading level = 1: no hyperthreading
+    distance_between_threads = int(68*4/threadspernode)
+    c_option = int( ompnumthreads*distance_between_threads )
+
+# Adjust simulation bounds
+box_lo, box_hi, box_ncell = adjust_bounds(box_lo0, box_hi0, box_ncell0, mgs)
+
+# Calculate parallelization
+n_node,n_mpi = nb_nodes_mpi(box_ncell, mgs, threadspernode, ompnumthreads, ngridpernode, ndim)
+
+# Print results
+string_output = ' ### Parameters used ### \n'
+string_output += 'ngridpernode = ' + str(ngridpernode) + '\n'
+string_output += 'ompnumthreads = ' + str(ompnumthreads) + '\n'
+string_output += 'mgs (max_grid_size) = ' + str(mgs) + '\n'
+string_output += 'threadspernode ( = # MPI ranks per node * OMP_NUM_THREADS) = ' + str(threadspernode) + '\n'
+string_output += 'ndim = ' + str(ndim) + '\n\n'
+string_output += 'box_lo = ' + str(box_lo) + '\n'
+string_output += 'box_hi = ' + str(box_hi) + '\n'
+string_output += 'box_ncell = ' + str(box_ncell) + '\n'
+string_output += 'n_node = ' + str(n_node) + '\n'
+string_output += 'n_mpi = ' + str(n_mpi) + '\n'
+print(string_output)