Soft & hard source #66

Open
jackieyao0114 opened this issue Feb 23, 2021 · 1 comment

Comments

@jackieyao0114

  1. Hard source needs to be called immediately after each of the evolve functions. The right sequence should be: evolveHM - hard H source - evolveE - hard E source - evolveHM - hard H source
  2. The soft source needs an extra factor that includes Delta_t. A derivation is needed to determine the exact form of this factor.
  3. The soft source calling strategy needs to be updated in one of the following three ways:
    • Split the soft source calls within every time step: apply the soft source twice, the same way the hard source is applied, using Delta_t/2 instead of Delta_t in each call.
    • Apply the soft source once at the beginning of each time step.
    • ?
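
The call ordering in item 1 can be sketched as follows. This is a minimal 1D illustration in normalized units, not the project's implementation: `Fields1D`, `evolveH` (a stand-in for evolveHM), `evolveE`, `hardSourceH`, `hardSourceE`, `waveform`, and `advanceOneStep` are hypothetical names, and the times at which the waveform is evaluated are an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D field container (normalized units: c = 1, dx = 1).
struct Fields1D {
    std::vector<double> E;  // E at integer nodes
    std::vector<double> H;  // H at half-integer nodes
    explicit Fields1D(std::size_t n) : E(n, 0.0), H(n, 0.0) {}
};

// Hypothetical source waveform.
double waveform(double t) { return std::sin(2.0 * t); }

// Stand-in for evolveHM: advance H by dt using the spatial difference of E.
void evolveH(Fields1D& f, double dt) {
    for (std::size_t i = 0; i + 1 < f.E.size(); ++i) {
        f.H[i] += dt * (f.E[i + 1] - f.E[i]);
    }
}

// Stand-in for evolveE: advance E by dt using the spatial difference of H.
void evolveE(Fields1D& f, double dt) {
    for (std::size_t i = 1; i < f.E.size(); ++i) {
        f.E[i] += dt * (f.H[i] - f.H[i - 1]);
    }
}

// A hard source overwrites the field value at the source cell.
void hardSourceE(Fields1D& f, std::size_t src, double t) {
    assert(src < f.E.size());
    f.E[src] = waveform(t);
}

void hardSourceH(Fields1D& f, std::size_t src, double t) {
    assert(src < f.H.size());
    f.H[src] = waveform(t);
}

// One time step with the ordering proposed in the issue:
// evolveHM - hard H source - evolveE - hard E source - evolveHM - hard H source.
// Assumption: each source is evaluated at the time level its field has just reached.
void advanceOneStep(Fields1D& f, std::size_t src, double t, double dt) {
    evolveH(f, 0.5 * dt);
    hardSourceH(f, src, t + 0.5 * dt);
    evolveE(f, dt);
    hardSourceE(f, src, t + dt);
    evolveH(f, 0.5 * dt);
    hardSourceH(f, src, t + dt);
}
```

Because each hard source call directly follows the evolve call for the same field, the source cell holds the prescribed value at the end of the step rather than a value modified by a later update.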
@ajnonaka

The 3rd soft source option is to include it with the current source multifab.
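
The three soft source options discussed in this thread can be compared in a small sketch. It rests on stated assumptions: the soft source increment is taken to be linear in the time interval it covers, `scale` is a hypothetical placeholder for the factor that the issue says still has to be derived, and `source_term` is a plain array standing in for the current source multifab, with sign and material constants absorbed. None of these names come from the code base.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A soft source adds to the field instead of overwriting it.
// Assumption: the increment is linear in `interval`; `scale` is a placeholder
// for the not-yet-derived factor.
void softSourceE(std::vector<double>& E, std::size_t src,
                 double value, double scale, double interval) {
    assert(src < E.size());
    E[src] += scale * interval * value;
}

// Third option: deposit the source into a current-like array instead of E.
void addToSourceTerm(std::vector<double>& source_term, std::size_t src,
                     double value, double scale) {
    assert(src < source_term.size());
    source_term[src] += scale * value;
}

// E update that consumes the source term together with the difference of H,
// so the Delta_t factor comes from the field update itself.
void evolveEWithSource(std::vector<double>& E, const std::vector<double>& H,
                       const std::vector<double>& source_term, double dt) {
    assert(E.size() == H.size() && E.size() == source_term.size());
    for (std::size_t i = 1; i < E.size(); ++i) {
        E[i] += dt * ((H[i] - H[i - 1]) + source_term[i]);
    }
}
```

Under these assumptions, and for a waveform value that is constant over the step, two `softSourceE` calls with Delta_t/2 add the same total as one call with Delta_t, and routing the source through `source_term` gives the same increment when H is uniform. The options differ once the waveform varies within a step, because each evaluates the source at different times.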

RevathiJambunathan pushed a commit that referenced this issue Apr 29, 2023
* Use GPU shared memory to accelerate charge deposition (#66)

* WIP Apply charge deposition unconditionally in scratch memory

* Ensure enough threads to touch every value in the array, even if there are no particles

* Zero out the shared memory before accumulating into it

* Replace box-aware accumulation of final results with simple pointers

* Remove unused code

* WIP

* Account for shared memory being allocated per-block, not per grid/kernel

* Wording

* Fall back to non-shared memory for cases where the grid size is too big to fit, for now

* Filter out additions of 0.0 from atomic accumulation

* Restore non-GPU code path

* Pick apart #if stuff to allow better formatting and comprehension

* Fix egregious whitespace failure

* Abort on insufficient shared memory, rather than falling back to global memory

* Fix silly whitespace

* Fix stray tab character

* Sort and bin particles, pass bins to charge deposition

* Contribute on a binned tile basis; memory errors now

* Initialize array to the extent we actually allocated

* Make sure we initialize the vector of tboxes with invalid Box objects

* in 2D, we make sure we use the same particle position to tile box mapping as amrex

* go ahead and skip empty bins in deposit charge

* Quiet warning from HIP

* Avoid signed/unsigned comparison

* Code compiles for CPU ...

* Rename intermediate buffer back to reduce extraneous diff bits

* Remove another extraneous diff bit

* Leave DPC++ out in the cold, since it expects different syntax for GPU-specific code

* Reset failing value of rho that only slightly changes

* Try tiling over ng_rho to capture particles moved into guard cells

* Match box expansion in both call sites - maybe the third is also necessary

* Grow last box by ng_rho as well.

* Match macro syntax

* Use WarpX dimensionality macros instead of AMREX_SPACEDIM

* Add support for 1D case

* Update benchmark checksums for ME tests

* Fix macro used for 1D

* Fix CUDA compilation

* Rename variable to simplify diff

* Clear up assertions now that stuff is working

* Fix comment referring to current in ChargeDeposition.H

* Fix warning about unused variable after assertion change

* add runtime option

* Convert flag variable from int to bool

* Switch AMREX_SPACEDIM conditions to use WARPX_DIM_* macros

* Once again, leave DPC++ out in the cold, as it doesn't support the same syntax as CUDA and HIP

* Grow charge deposition boxes by the necessary amount

* Mark a variable only used for assertions to suppress warnings

* Fix compilation error for 1D

* Re-add missing 1D support

* Fix other bits of code specific to CUDA and HIP, and not DPC++

* restore missing accumulation of thread local charge into main fab.

* reset benchmark for background_mcc because randomization makes it very sensitive

* reset benchmark for Langmuir_multi_psatd_div_cleaning because diffing field is a numerical artifact

* Calm nvcc about function missing a return

* reset benchmark for background_mcc because it's randomized and numerically chaotic

* reset benchmark for LaserAccelerationBoost because of numerical shift in momentum from charge deposition order

* Remove extra nesting level

* Skip sorting the particles and just access them according to the binned permutation

* Load permutation pointer outside GPU kernel

* Revert background_mcc benchmark values

* Loosen overly-strict checksum tolerances in single-precision tests, rather than changing target values

* Revert embedded_circle

* Convert AMREX_ALWAYS_ASSERT to AMREX_ASSERT for particle bounds checking

* Match assertion macro change from ECP-WarpX#2939

* Fix indentation

* Disable shared memory charge deposition by default

* Ignore variable only used in assertion

* Add documentation of added input parameter warpx.do_shared_mem_charge_deposition

* Add comments as suggested by Remi

* Docs: Fix syntax issues in parameters.rst

* Convert error check to unconditional assertion as requested

* Make some arguments const to ease refactoring

* Finished DepositCurrent function

Ready to call the function from CurrentDeposition.H, but currently there
is only a dummy function there

* AMReX: Weekly Update

* Reset: `reduced_diags_single_precision`

* Reset: `background_mcc_dp_psp`

* Merged with develop. runs on mpi no gpu

* All funcs implemented. Compiles with bugs

* Fixed typo in CurrentDeposition

* Working on 2D version. there is bug

jz doesn't line up correctly

* Fixed 2d bug

* Removed some debugging lines

* Cleaning up comments

* Added an input param for threads per block

* Added a variable NS and START/STOP

* Added a region for kernel

* Not working on tilesize > 1 1 1

* Implemented Andrew's new algo for max tilesize

* Reduce the amount of shared memory needed by re-using the same buffer for all three components

* Made default tilesize sort_bin_size LAST V1 COMMIT

* bugfix - don't add 0.0 cells back to global memory.

* Need to take abs before checking > 0.0

* Ran Whitespace Fixer As instructed

* Updated Comments

* change default tpb for current deposition to 128

* clean up comments

* quiet compiler warning

* remove unused variables

* refactor shared current depo code

* forgot to check in file

* fix cpu compilation

* fix uninitialized

* fix typo

* fix bad merge

* Fixed default tilesize bug

Previously had defaulted shared_tilesize to sort_bin_size. This was
overwriting the shared_tilesize. Some scanning shows that sort_bin_size
isn't a very good default for tilesize anyway, so the new default is 1
1 1.

* changed shared tilesize default to 6 6 8

Decision based on scan over tilesizes and ppc by @atmyers and @kaplannp

* Put in switches for default tilesize 288 3d 144 2d

Tested correctness in 2 and 3 d

* Simplified particle contribution section

In accordance to @AlexanderSinn feedback, and tested RZ, 2D, 3D

* Cleanup tbox construction and depos->(depos+1)/2

in accordance to changes proposed by @AlexanderSinn. Tested on 3D 2D and
RZ

* Restored shared to previous version from 6acab48

Changes from before broke single precision

* Found new spot to benefit from (depos_order+1)/2

* Cleaned up sloppy comments

* Throw error on shared if no hip or cuda

This commit makes the assumption that if you use shared, you must be
using HIP or CUDA. This allows us to remove a bunch of macros that tried
to quietly revert to non shared if you didn't use HIP/CUDA, and we now
throw error if you try to run without HIP/CUDA

* More cleanup to compile and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add cost to GPU clock conditional

Co-authored-by: Axel Huebl <[email protected]>

* Update Source/Particles/Deposition/CurrentDeposition.H

Co-authored-by: Axel Huebl <[email protected]>

* add cost to GPU clock conditional

Co-authored-by: Axel Huebl <[email protected]>

* whitespace fix

Co-authored-by: Axel Huebl <[email protected]>

* Updated tilesize docs, and change 1d/rz default

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added docs for tpb

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed grow (depos_order+1)/2->depos_order

Turns out, above fails for shape 2

* Change default to non share, and add error check

throws errors if you try vay or esirkepov with shared, and defaults to
not using shared for all algos.

* update to use ablaster kernel timer

Compiles and runs, but at step 80 diverges from dev

---------

Co-authored-by: Phil Miller <[email protected]>
Co-authored-by: Phil Miller <[email protected]>
Co-authored-by: Tools <[email protected]>
Co-authored-by: kaplannp <[email protected]>
Co-authored-by: Axel Huebl <[email protected]>
Co-authored-by: kaplannp <[email protected]>
Co-authored-by: kaplannp <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>