5 files changed, 131 insertions, 70 deletions
diff --git a/Docs/source/building/building.rst b/Docs/source/building/building.rst
index a9ea4e755..e0bdabb84 100644
--- a/Docs/source/building/building.rst
+++ b/Docs/source/building/building.rst
@@ -73,8 +73,8 @@ Advanced building instructions
    python
    spack
 
-Building for specific plateforms
---------------------------------
+Building for specific platforms
+-------------------------------
 
 .. toctree::
    :maxdepth: 1
diff --git a/Docs/source/building/cori.rst b/Docs/source/building/cori.rst
index 43c5fc31c..3429ed576 100644
--- a/Docs/source/building/cori.rst
+++ b/Docs/source/building/cori.rst
@@ -51,6 +51,40 @@ In order to compile for the **Knight's Landing (KNL) architecture**:
         module swap PrgEnv-intel PrgEnv-gnu
         make -j 16 COMP=gnu
 
+GPU Build
+---------
+
+To compile on the experimental GPU nodes on Cori, you first need to purge
+your modules, most of which won't work on the GPU nodes.
+
+    ::
+
+        module purge
+
+Then, you need to load the following modules:
+
+    ::
+
+        module load esslurm cuda pgi openmpi/3.1.0-ucx
+
+Currently, you need to use OpenMPI; mvapich2 seems not to work.
+
+Then, you need to use slurm to request access to a GPU node:
+
+    ::
+
+        salloc -C gpu -N 1 -t 30 -c 10 --gres=gpu:1 --mem=30GB -A m1759
+
+This reserves 10 logical cores (5 physical), 1 GPU, and 30 GB of RAM for your job.
+Note that you can't cross-compile for the GPU nodes; you have to log on to one
+and then build your software there.
+
+Finally, navigate to the base of the WarpX repository and compile in GPU mode:
+
+    ::
+
+        make -j 16 COMP=pgi USE_GPU=TRUE
+
 
 Building WarpX with openPMD support
 -----------------------------------
diff --git a/Docs/source/building/summit.rst b/Docs/source/building/summit.rst
index 44cc5c0ee..c1061f29e 100644
--- a/Docs/source/building/summit.rst
+++ b/Docs/source/building/summit.rst
@@ -15,7 +15,7 @@ correct branch:
     git clone --branch master https://bitbucket.org/berkeleylab/picsar.git
     git clone --branch development https://github.com/AMReX-Codes/amrex.git
 
-Then, use the following set of commands to compile:
+Then, ``cd`` into the directory ``WarpX`` and use the following set of commands to compile:
 
 ::
 
diff --git a/Docs/source/running_cpp/parameters.rst b/Docs/source/running_cpp/parameters.rst
index 664c1c2ac..ba64c1fcd 100644
--- a/Docs/source/running_cpp/parameters.rst
+++ b/Docs/source/running_cpp/parameters.rst
@@ -29,12 +29,12 @@ Overall simulation parameters
     (The direction ``y`` cannot be used in 2D simulations.)
 
 * ``warpx.zmax_plasma_to_compute_max_step`` (`float`) optional
-    Can be useful when running in a boosted frame. If specified, automatically 
-    calculates the number of iterations required in the boosted frame for the 
-    lower `z` end of the simulation domain to reach 
-    ``warpx.zmax_plasma_to_compute_max_step`` (typically the plasma end, 
-    given in the lab frame). The value of ``max_step`` is overwritten, and 
-    printed to standard output. Currently only works if the Lorentz boost and 
+    Can be useful when running in a boosted frame. If specified, automatically
+    calculates the number of iterations required in the boosted frame for the
+    lower `z` end of the simulation domain to reach
+    ``warpx.zmax_plasma_to_compute_max_step`` (typically the plasma end,
+    given in the lab frame). The value of ``max_step`` is overwritten, and
+    printed to standard output. Currently only works if the Lorentz boost and
     the moving window are along the z direction.
 
 * ``warpx.verbose`` (`0` or `1`)
@@ -188,8 +188,8 @@ Particle initialization
       This requires the additional parameter ``<species_name>.num_particles_per_cell``.
 
 * ``<species_name>.do_continuous_injection`` (`0` or `1`)
-    Whether to inject particles during the simulation, and not only at 
-    initialization. This can be required whith a moving window and/or when 
+    Whether to inject particles during the simulation, and not only at
+    initialization. This can be required with a moving window and/or when
     running in a boosted frame.
 
 * ``<species_name>.profile`` (`string`)
@@ -295,20 +295,20 @@ Particle initialization
 * ``<species>.plot_species`` (`0` or `1` optional; default `1`)
     Whether to plot particle quantities for this species.
 
-* ``<species>.plot_vars`` (list of `strings` separated by spaces, optional) 
-    List of particle quantities to write to `plotfiles`. By defaults, all 
-    quantities are written to file. Choices are 
+* ``<species>.plot_vars`` (list of `strings` separated by spaces, optional)
+    List of particle quantities to write to `plotfiles`. By default, all
+    quantities are written to file. Choices are
     * ``w`` for the particle weight,
-    * ``ux`` ``uy`` ``uz`` for the particle momentum, 
+    * ``ux`` ``uy`` ``uz`` for the particle momentum,
     * ``Ex`` ``Ey`` ``Ez`` for the electric field on particles,
     * ``Bx`` ``By`` ``Bz`` for the magnetic field on particles.
-    The particle positions are always included. Use 
-    ``<species>.plot_vars = none`` to plot no particle data, except 
+    The particle positions are always included. Use
+    ``<species>.plot_vars = none`` to plot no particle data, except
     particle position.
 
 * ``<species>.do_boosted_frame_diags`` (`0` or `1` optional, default `1`)
-    Only used when ``warpx.do_boosted_frame_diagnostic=1``. When running in a 
-    boosted frame, whether or not to plot back-transformed diagnostics for 
+    Only used when ``warpx.do_boosted_frame_diagnostic=1``. When running in a
+    boosted frame, whether or not to plot back-transformed diagnostics for
     this species.
 
 * ``warpx.serialize_ics`` (`0 or 1`)
@@ -467,13 +467,13 @@ Laser initialization
 
 * ``<laser_name>.do_continuous_injection`` (`0` or `1`) optional (default `0`).
     Whether or not to use continuous injection (`0` or not `0`).
-    If the antenna starts outside of the simulation domain but enters it 
-    at some point (due to moving window or moving antenna in the boosted 
-    frame), use this so that the laser antenna is injected when it reaches 
-    the box boundary. If running in a boosted frame, this requires the 
-    boost direction, moving window direction and laser propagation direction 
-    to be along `z`. If not running in a boosted frame, this requires the 
-    moving window and laser propagation directions to be the same (`x`, `y` 
+    If the antenna starts outside of the simulation domain but enters it
+    at some point (due to moving window or moving antenna in the boosted
+    frame), use this so that the laser antenna is injected when it reaches
+    the box boundary. If running in a boosted frame, this requires the
+    boost direction, moving window direction and laser propagation direction
+    to be along `z`. If not running in a boosted frame, this requires the
+    moving window and laser propagation directions to be the same (`x`, `y`
     or `z`)
 
 * ``warpx.num_mirrors`` (`int`) optional (default `0`)
@@ -512,48 +512,60 @@ Numerics and algorithms
     Number of passes along each direction for the bilinear filter.
     In 2D simulations, only the first two values are read.
 
-* ``algo.current_deposition`` (`integer`)
-    The algorithm for current deposition:
+* ``algo.current_deposition`` (`string`, optional)
+    The algorithm for current deposition. Available options are:
 
-     - ``0``: Esirkepov deposition, vectorized
-     - ``1``: Esirkepov deposition, non-optimized
-     - ``2``: Direct deposition, vectorized
-     - ``3``: Direct deposition, non-optimized
+     - ``esirkepov``: the charge-conserving Esirkepov algorithm
+       (see `Esirkepov, Comp. Phys. Comm. (2001) <https://www.sciencedirect.com/science/article/pii/S0010465500002289>`__)
+     - ``direct``: simpler current deposition algorithm, described in
+       the section :doc:`../theory/picsar_theory`. Note that this algorithm is not strictly charge-conserving.
+     - ``direct-vectorized`` (only available in 3D, and when running on CPU/KNL - as opposed to GPU):
+       mathematically equivalent to ``direct``, but uses an optimized algorithm
+       for vectorization on CPU/KNL (see `Vincenti, Comp. Phys. Comm. (2017)
+       <https://www.sciencedirect.com/science/article/pii/S0010465516302764>`__)
 
-    .. warning::
+    If ``algo.current_deposition`` is not specified, the default is ``esirkepov``.
 
-        On GPU, use ``algo.current_deposition=0`` for Esirkepov
-	or ``3`` for direct deposition.
+* ``algo.charge_deposition`` (`string`, optional)
+    The algorithm for the charge density deposition. Available options are:
 
-* ``algo.charge_deposition`` (`integer`)
-    The algorithm for the charge density deposition:
+     - ``standard``: standard charge deposition algorithm, described in
+       the section :doc:`../theory/picsar_theory`.
+     - ``vectorized`` (only available in 3D, and when running on CPU/KNL - as opposed to GPU):
+       mathematically equivalent to ``standard``, but uses an optimized algorithm
+       for vectorization on CPU/KNL (see `Vincenti, Comp. Phys. Comm. (2017)
+       <https://www.sciencedirect.com/science/article/pii/S0010465516302764>`__)
 
-     - ``0``: Vectorized version
-     - ``1``: Non-optimized version
+    If ``algo.charge_deposition`` is not specified, ``vectorized`` is the default
+    whenever it is available; ``standard`` is the default otherwise.
 
-* ``algo.field_gathering`` (`integer`)
-    The algorithm for field gathering:
+* ``algo.field_gathering`` (`string`, optional)
+    The algorithm for field gathering. Available options are:
 
-     - ``0``: Vectorized version
-     - ``1``: Non-optimized version
+     - ``standard``: gathers directly from the grid points (either staggered
+       or nodal gridpoints depending on ``warpx.do_nodal``).
+     - ``vectorized`` (not available when running on GPU): mathematically
+       equivalent to ``standard``, but uses optimized vector instructions for CPU/KNL.
 
-    .. warning::
+    If ``algo.field_gathering`` is not specified, ``vectorized`` is the default
+    on CPU/KNL; ``standard`` is the default on GPU.
+
+* ``algo.particle_pusher`` (`string`, optional)
+    The algorithm for the particle pusher. Available options are:
 
-        The vectorized version does not run on GPU. Use
-		``algo.field_gather=1`` when running on GPU.
+     - ``boris``: Boris pusher.
+     - ``vay``: Vay pusher (see `Vay, Phys. Plasmas (2008) <https://aip.scitation.org/doi/10.1063/1.2837054>`__)
 
-* ``algo.particle_pusher`` (`integer`)
-    The algorithm for the particle pusher:
+    If ``algo.particle_pusher`` is not specified, ``boris`` is the default.
 
-     - ``0``: Boris pusher
-     - ``1``: Vay pusher
+* ``algo.maxwell_fdtd_solver`` (`string`, optional)
+    The algorithm for the FDTD Maxwell field solver. Available options are:
 
-* ``algo.maxwell_fdtd_solver`` (`string`)
-    The algorithm for the FDTD Maxwell field solver:
+     - ``yee``: Yee FDTD solver.
+     - ``ckc``: (not available in ``RZ`` geometry) Cole-Karkkainen solver with Cowan
+       coefficients (see `Cowan, PRSTAB 16 (2013) <https://journals.aps.org/prab/abstract/10.1103/PhysRevSTAB.16.041303>`__)
 
-     - ``yee``: Yee FDTD solver
-     - ``ckc``: Cole-Karkkainen solver with Cowan
-       coefficients (see Cowan - PRST-AB 16, 041303 (2013))
+    If ``algo.maxwell_fdtd_solver`` is not specified, ``yee`` is the default.
 
 * ``interpolation.nox``, ``interpolation.noy``, ``interpolation.noz`` (`integer`)
     The order of the shape factors for the macroparticles, for the 3 dimensions of space.
@@ -576,17 +588,17 @@ Numerics and algorithms
     fields are defined at different points in space)
 
 * ``warpx.do_subcycling`` (`0` or `1`; default: 0)
-    Whether or not to use sub-cycling. Different refinement levels have a 
-    different cell size, which results in different Courant–Friedrichs–Lewy 
-    (CFL) limits for the time step. By default, when using mesh refinement, 
-    the same time step is used for all levels. This time step is 
-    taken as the CFL limit of the finest level. Hence, for coarser 
-    levels, the timestep is only a fraction of the CFL limit for this 
-    level, which may lead to numerical artifacts. With sub-cycling, each level 
-    evolves with its own time step, set to its own CFL limit. In practice, it 
-    means that when level 0 performs one iteration, level 1 performs two 
-    iterations. Currently, this option is only supported when 
-    ``amr.max_level = 1``. More information can be found at 
+    Whether or not to use sub-cycling. Different refinement levels have a
+    different cell size, which results in different Courant–Friedrichs–Lewy
+    (CFL) limits for the time step. By default, when using mesh refinement,
+    the same time step is used for all levels. This time step is
+    taken as the CFL limit of the finest level. Hence, for coarser
+    levels, the timestep is only a fraction of the CFL limit for this
+    level, which may lead to numerical artifacts. With sub-cycling, each level
+    evolves with its own time step, set to its own CFL limit. In practice, it
+    means that when level 0 performs one iteration, level 1 performs two
+    iterations. Currently, this option is only supported when
+    ``amr.max_level = 1``. More information can be found at
     https://ieeexplore.ieee.org/document/8659392.
 
 * ``psatd.nox``, ``psatd.noy``, ``pstad.noz`` (`integer`) optional (default `16` for all)
@@ -653,7 +665,7 @@ Diagnostics and output
     The directory in which to save the lab frame data when using the
     **back-transformed diagnostics**. If not specified, the default is
     is `lab_frame_data`.
-    
+
 * ``warpx.num_snapshots_lab`` (`integer`)
     Only used when ``warpx.do_boosted_frame_diagnostic`` is ``1``.
     The number of lab-frame snapshots that will be written.
@@ -667,9 +679,9 @@ Diagnostics and output
     Whether to use the **back-transformed diagnostics** for the fields.
 
 * ``warpx.boosted_frame_diag_fields`` (space-separated list of `string`)
-    Which fields to dumped in back-transformed diagnostics. Choices are 
-    'Ex', 'Ey', Ez', 'Bx', 'By', Bz', 'jx', 'jy', jz' and 'rho'. Example: 
-    ``warpx.boosted_frame_diag_fields = Ex Ez By``. By default, all fields 
+    Which fields to dump in back-transformed diagnostics. Choices are
+    'Ex', 'Ey', 'Ez', 'Bx', 'By', 'Bz', 'jx', 'jy', 'jz' and 'rho'. Example:
+    ``warpx.boosted_frame_diag_fields = Ex Ez By``. By default, all fields
     are dumped.
 
 * ``warpx.plot_raw_fields`` (`0` or `1`) optional (default `0`)
@@ -706,6 +718,19 @@ Diagnostics and output
 * ``warpx.plot_B_field`` (`0` or `1` optional; default `1`)
     Whether to plot the magnetic field.
 
+* ``slice.dom_lo`` and ``slice.dom_hi`` (`2 floats in 2D`, `3 floats in 3D`; in meters, the same units as the simulation box)
+    The extent of the slice is defined by the coordinates of its lower corner (``slice.dom_lo``) and upper corner (``slice.dom_hi``). The slice can be 1D, 2D, or 3D; it is aligned with the coordinate axes, and the first axis of the coordinates is x. For example, to extract an x-z slice at y = 0.0 from a 3D simulation, the y-values of ``slice.dom_lo`` and ``slice.dom_hi`` must both be equal to 0.0.
+
+* ``slice.coarsening_ratio`` (`2 integers in 2D`, `3 integers in 3D`; default `1`)
+    The coarsening ratio must be greater than 0. The default is 1 in all directions.
+    In a direction that is reduced (e.g., the y-direction for an x-z slice in 3D), the default coarsening ratio is equal to 1.
+
+* ``slice.plot_int`` (`integer`)
+    The number of PIC cycles between two consecutive data dumps for the slice. Use a
+    negative number to disable slice generation and slice data dumping.
+
+
+
 Checkpoints and restart
 -----------------------
 WarpX supports checkpoints/restart via AMReX.
diff --git a/Docs/source/theory/picsar_theory.rst b/Docs/source/theory/picsar_theory.rst
index 7338d5c36..135d78dea 100644
--- a/Docs/source/theory/picsar_theory.rst
+++ b/Docs/source/theory/picsar_theory.rst
@@ -328,6 +328,8 @@ a collocated and a staggered formulation is application-dependent.
 
 Spectral solvers used to be very popular in the years 1970s to early 1990s, before being replaced by finite-difference methods with the advent of parallel supercomputers that favored local methods. However, it was shown recently that standard domain decomposition with Fast Fourier Transforms that are local to each subdomain could be used effectively with PIC spectral methods (Jean-Luc Vay, Haber, and Godfrey 2013), at the cost of truncation errors in the guard cells that could be neglected. A detailed analysis of the effectiveness of the method with exact evaluation of the magnitude of the effect of the truncation error is given in (Vincenti and Vay 2016) for stencils of arbitrary order (up-to the infinite “spectral” order).
 
+.. _current_deposition:
+
 Current deposition
 ------------------