PushPX: GPU kernel optimization (#3402) - WarpX

author	Weiqun Zhang <WeiqunZhang@lbl.gov>	2022-11-18 10:00:12 -0800
committer	GitHub <noreply@github.com>	2022-11-18 18:00:12 +0000
commit	2775ac17fc78b3433d313da46a1c81f932e3912e (patch)
tree	aab7dca2a5cebbfbcf3e8075eb0704d6f6729f8c /Python/pywarpx/picmi.py
parent	2c00044641882f35c70528b913a8d9efbb0a5336 (diff)
download	WarpX-2775ac17fc78b3433d313da46a1c81f932e3912e.tar.gz WarpX-2775ac17fc78b3433d313da46a1c81f932e3912e.tar.zst WarpX-2775ac17fc78b3433d313da46a1c81f932e3912e.zip

PushPX: GPU kernel optimization (#3402)

* PushPX: GPU kernel optimization

The GatherAndPush kernel in the PushPX function has very low occupancy due to register pressure. There are a number of reasons. By default, we compile with the QED module on, even if we do not use it at run time. Another culprit is the GetExternalEB functor, which contains 7 Parsers. Again, we pay a high runtime cost even if we do not use it.

In this PR, we move some runtime logic out of the GPU kernel to eliminate the unnecessary cost when QED and GetExternalEB are not used at run time.

Here are some performance results before this PR.

| QED | GetExternalEB | Time |
|-----+---------------+------|
| On  | On            | 2.17 |
| Off | On            | 1.79 |
| Off | Commented out | 1.34 |

Note that in these tests neither QED nor GetExternalEB is actually used at run time, yet the extra cost is very high. With this PR, the kernel time is the same as when both QED and GetExternalEB are disabled at compile time, even though both options are only disabled at run time.

More information on the kernels compiled for MI250X. The most expensive variant, with both QED and GetExternalEB on, has

    NumSgprs: 108
    NumVgprs: 256
    NumAgprs: 40
    TotalNumVgprs: 296
    ScratchSize: 264
    Occupancy: 1

The cheapest variant, with both QED and GetExternalEB disabled, has

    NumSgprs: 104
    NumVgprs: 249
    NumAgprs: 0
    TotalNumVgprs: 249
    ScratchSize: 144
    Occupancy: 2

* Fix Comments

Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
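The idea of moving runtime logic out of the kernel can be illustrated with a minimal host-only sketch. This is not the WarpX or AMReX code: `Particle`, `push_kernel`, `push`, and all parameters are hypothetical names chosen for illustration, the physics is a placeholder, and C++17 is assumed for `if constexpr`. The runtime flags are converted once, outside the hot loop, into template parameters, so the variant with both features off contains no code (and, on a GPU, would need no registers) for the unused branches.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle type; a stand-in, not a WarpX data structure.
struct Particle { double x; double ux; };

// Kernel body templated on compile-time flags. Branches guarded by
// `if constexpr` are discarded entirely when the flag is false.
template <bool UseExternalField, bool UseQED>
void push_kernel(std::vector<Particle>& ps, double dt,
                 double ext_field, double qed_factor)
{
    for (auto& p : ps) {
        double field = 1.0;  // placeholder for the gathered field
        if constexpr (UseExternalField) { field += ext_field; }
        p.ux += field * dt;  // placeholder momentum update
        if constexpr (UseQED) { p.ux *= qed_factor; }
        p.x += p.ux * dt;    // position update
    }
}

// Runtime-to-compile-time dispatch, done once outside the kernel.
void push(std::vector<Particle>& ps, double dt, bool use_ext, bool use_qed,
          double ext_field, double qed_factor)
{
    assert(dt > 0.0);  // precondition of this sketch
    if (use_ext) {
        if (use_qed) { push_kernel<true,  true >(ps, dt, ext_field, qed_factor); }
        else         { push_kernel<true,  false>(ps, dt, ext_field, qed_factor); }
    } else {
        if (use_qed) { push_kernel<false, true >(ps, dt, ext_field, qed_factor); }
        else         { push_kernel<false, false>(ps, dt, ext_field, qed_factor); }
    }
}
```

The cost of this approach is that each flag combination is compiled as a separate kernel, so binary size and compile time grow with the number of options.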

Diffstat (limited to 'Python/pywarpx/picmi.py')

0 files changed, 0 insertions, 0 deletions

