Loop macros #39
Merged
Generate macros for loops over any combination of dimensions, which get pre-generated ranges so that each type of loop is parallelized over the shared MPI arrays with optimal load balance.
Only functional difference between the different 'regions' is in debugging code, but regions provide structure showing where synchronization calls are needed.
Now replaced by the new implementation in looping.jl.
If we are precompiling, we presumably want an optimized production build, so always pass `-O3 --check-bounds=no` to the build process.
Parsing command line options in command_line_options.__init__() caused problems (command line arguments being ignored) when moment_kinetics was compiled into a static system image. [My guess is that __init__() was called too early in that case, before ARGS was set up properly.] Instead, define a getter function get_options(), which parses the command line arguments when it is called. This is slightly less efficient, but will only be called a few times, and should be more robust. Also allows changing the options, e.g. adding/removing "--long" from ARGS during a REPL session to change which tests are run.
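A minimal sketch of the getter pattern described above, using only Base Julia. The returned `NamedTuple` layout and the optional `args` parameter are assumptions for illustration; the actual `get_options()` may well use a dedicated argument-parsing package and support more options than `--long`.

```julia
# Hypothetical stand-in for command_line_options.get_options().
# ARGS is read each time the function is called, not when the module is
# loaded, so the result reflects whatever ARGS contains at call time.
function get_options(args::Vector{String}=ARGS)
    return (long = "--long" in args,)
end

# In a REPL session the options can be changed by editing ARGS:
push!(ARGS, "--long")
opts_long = get_options()            # sees "--long"
filter!(a -> a != "--long", ARGS)
opts_short = get_options()           # "--long" has been removed again
```

Because nothing is cached, each call re-parses the arguments; as noted above this costs a little, but the function is only called a few times.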
Also test with 3 processes, as this is now supported by moment_kinetics.
This PR introduces the loop macros discussed in #35.
This new implementation of parallelization is more flexible - it allows any number of processes to be used, regardless of the number of species or spatial/velocity-space grid dimensions. The dimensions are divided up to give the 'best possible' load balance, with the restriction that each dimension is divided up independently (i.e. the ranges for inner loops are not allowed to depend on the indices of the outer loops).
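To illustrate what dividing each dimension independently means, here is a minimal sketch that splits one dimension of `n` points into contiguous, nearly equal ranges. The function name `split_dimension` and the choice to give the leading blocks the extra points are assumptions; the real code in `looping.jl` also has to decide how many blocks to use in each dimension, which is not shown here.

```julia
# Hypothetical helper: range of indices handled by block `iblock` (1-based)
# when `n` points are divided among `nblocks` blocks as evenly as possible.
function split_dimension(n::Int, nblocks::Int, iblock::Int)
    base, extra = divrem(n, nblocks)
    # The first `extra` blocks get one additional point each.
    start = (iblock - 1) * base + min(iblock - 1, extra) + 1
    len = base + (iblock <= extra ? 1 : 0)
    return start:(start + len - 1)
end

# Example: 6 processes arranged as 2 blocks in s and 3 blocks in z.
# Each dimension is split on its own, so the z range never depends on `is`.
s_ranges = [split_dimension(2, 2, i) for i in 1:2]
z_ranges = [split_dimension(8, 3, i) for i in 1:3]
```

With 8 points in z and 3 blocks, the blocks cannot all be the same size, which is why the load balance is only the best available under the independence restriction.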
There is a slight performance drop when `evolve_upar == false`. The previous implementation had a special case for looping over just the ion species, which I have not implemented here, to keep things as simple as possible and on the assumption that the main target is runs that evolve upar and ppar anyway. The special case was used in `vpa_advection!()`, but the new implementation just tells some processes to `continue` if they are handling neutrals when `evolve_upar == false`.

The implementation in `looping.jl` is a bit of a nightmare of code-generation, although it's only 375 lines. It was designed to be flexible enough that the only thing we ever have to change is the Tuple of dimensions `const all_dimensions = (:s, :z, :vpa)` near the top. Sets of loop ranges and macros for loops over any combination of those dimensions are then automatically generated. So hopefully we do not have to fight with it often in future.

Description (copied from README.md)

    @s_z_loop is iz begin
        for ivpa in 1:vpa.n
            f[ivpa,iz,is] = ...
        end
    end

The dimensions in the prefix before `_loop` give the dimensions that are looped over by the macro. The arguments before `begin` are the names of the loop variables, in the same order as the dimensions in the prefix; the first dimension/loop-variable corresponds to the outermost nested loop, etc.

    @s_z_loop_s is begin
        some_setup(is)
        @s_z_loop_z iz begin
            @views do_something(f[:,iz,is])
        end
        @s_z_loop_z iz begin
            @views do_something_else(f[:,iz,is])
        end
    end

The dimensions in the prefix before `_loop_` again give the dimensions that are looped over in the nested loop. The dimension in the suffix after `_loop_` indicates which particular dimension the macro loops over. The argument before `begin` is the name of the loop variable.

The ranges used by the macros are stored in a `LoopRanges` struct in the `Ref` variable `loop_ranges` (which is exported by the `looping` module). Occasionally it is useful to access the range directly. For example, the range looped over by the macro `@s_z_loop_s` is `loop_ranges[].s_z_range_s` (same prefix/suffix meanings as the macro). The square brackets `[]` are needed because `loop_ranges` is a reference to a `LoopRanges` object, `Ref{LoopRanges}` (a bit like a pointer). This allows `loop_ranges` to be a `const` variable, so its type is always known at compile time, while the actual `LoopRanges` can be set/modified at run-time.

Code that should run in serial can be wrapped in the `@serial_region` macro. This is mostly useful for initialization or file I/O where performance is not critical. For example

    @serial_region begin
        # Do some initialization
        f .= 0.0
    end
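
The `Ref` pattern and the nested loops described above can be sketched together in plain Julia. This is a hand-written illustration for a single process, not the generated code: the struct is a cut-down stand-in with only two fields, and the range values and array sizes are made up.

```julia
# Cut-down, hypothetical version of the LoopRanges struct (two fields only).
struct LoopRanges
    s_z_range_s::UnitRange{Int}
    s_z_range_z::UnitRange{Int}
end

# `const` fixes the type of the binding (Ref{LoopRanges}) at compile time,
# while the object stored inside the Ref can still be replaced at run time.
const loop_ranges = Ref{LoopRanges}()

# Set the ranges at run time, e.g. once the number of processes is known.
loop_ranges[] = LoopRanges(1:1, 3:4)

nvpa = 5
f = zeros(nvpa, 4, 2)   # indexed as f[ivpa, iz, is]

# Hand-written equivalent of `@s_z_loop is iz begin ... end`:
# the first dimension in the prefix is the outermost loop, and `[]`
# dereferences the Ref to reach the LoopRanges object.
for is in loop_ranges[].s_z_range_s
    for iz in loop_ranges[].s_z_range_z
        for ivpa in 1:nvpa
            f[ivpa, iz, is] = 1.0
        end
    end
end
```

Only the part of `f` covered by this process's ranges is written; in the real code the other processes would handle the remaining `is` and `iz` values.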

`begin_s_z_region()` should be called before `@s_z_loop` or `@s_z_loop_*` is called, and after any `@serial_region` or other type of `@*_loop*` macro. `begin_serial_region()` should be called before `@serial_region`.

The `begin_*_region()` functions call `_block_synchronize()`, which calls `MPI.Barrier()`. When all debugging is disabled they are equivalent to `MPI.Barrier()`. Having different functions allows extra consistency checks to be done when debugging is enabled, see `debug_test/README.md`.

Closes #35, closes #38.