
Loop macros #39

Merged
merged 33 commits into from
Jan 12, 2022

Conversation

johnomotani
Collaborator

@johnomotani johnomotani commented Jan 10, 2022

This PR introduces the loop macros discussed in #35.

This new implementation of parallelization is more flexible - it allows any number of processes to be used, regardless of the number of species or spatial/velocity-space grid dimensions. The dimensions are divided up to give the 'best possible' load balance, with the restriction that each dimension is divided up independently (i.e. the ranges for inner loops are not allowed to depend on the indices of the outer loops).
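The 'best possible' load balance with each dimension divided independently can be pictured with a small sketch. This is a hypothetical helper, not the actual moment_kinetics implementation: it splits one dimension of size n as evenly as possible among nproc processes.

```julia
# Hypothetical sketch (not the real moment_kinetics code) of the kind of
# load balancing described above: divide a dimension of size n as evenly
# as possible among nproc processes, returning the range owned by process
# irank (0-based). Each dimension would be split independently like this.
function get_local_range(irank, nproc, n)
    n_per_proc, remainder = divrem(n, nproc)
    # The first `remainder` processes each take one extra point
    if irank < remainder
        first_ind = irank * (n_per_proc + 1) + 1
        last_ind = first_ind + n_per_proc
    else
        first_ind = remainder * (n_per_proc + 1) +
                    (irank - remainder) * n_per_proc + 1
        last_ind = first_ind + n_per_proc - 1
    end
    return first_ind:last_ind
end
```

For example, splitting 10 points over 3 processes gives ranges of 4, 3 and 3 points, so no process is idle regardless of whether nproc divides n.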

There is a slight performance drop when evolve_upar == false. The previous implementation had a special case for looping just over the ion species, which I have not implemented here, to keep things as simple as possible and because the main target is the case where upar and ppar are evolved anyway. That special case was used in vpa_advection!(); in the new implementation, when evolve_upar == false, processes that are handling neutrals simply continue.

The implementation in looping.jl is a bit of a nightmare of code-generation, although it's only 375 lines. It was designed to be flexible enough that the only thing we ever have to change is the Tuple of dimensions const all_dimensions = (:s, :z, :vpa) near the top. Sets of loop ranges and macros for loops over any combination of those dimensions are then automatically generated. So hopefully we do not have to fight with it often in future.
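The combinatorics behind that code generation can be illustrated with a self-contained sketch (hypothetical, not the real looping.jl): enumerating every non-empty, order-preserving combination of all_dimensions, one combination per family of generated ranges and macros.

```julia
# Hypothetical sketch of the combination generation implied above: list
# every non-empty subset of the dimension tuple, preserving dimension
# order, so that a set of loop ranges/macros could be generated for each.
const all_dimensions = (:s, :z, :vpa)

function dimension_combinations(dims)
    combos = Vector{Vector{Symbol}}()
    n = length(dims)
    for bits in 1:(2^n - 1)
        # Treat `bits` as a bitmask selecting which dimensions are included
        push!(combos, [dims[i] for i in 1:n if ((bits >> (i - 1)) & 1) == 1])
    end
    return combos
end
```

With (:s, :z, :vpa) this yields seven combinations, which would correspond to seven families of loop macros (e.g. one family containing @s_z_loop and its per-dimension variants). Adding a dimension to the tuple automatically extends the set of combinations.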

Description (copied from README.md)

  • There are two types of loop macros:
    • 1. If there is just a single nested loop, then for example
      @s_z_loop is iz begin
          for ivpa in 1:vpa.n
              f[ivpa,iz,is] = ...
          end
      end
      The dimensions in the prefix before _loop give the dimensions that are looped over by the macro, the arguments before begin are the names of the loop variables, in the same order as the dimensions in the prefix; the first dimension/loop-variable corresponds to the outermost nested loop, etc.
    • 2. For more complex loops, a separate macro can be used for each level, for example
      @s_z_loop_s is begin
          some_setup(is)
          @s_z_loop_z iz begin
              @views do_something(f[:,iz,is])
          end
          @s_z_loop_z iz begin
              @views do_something_else(f[:,iz,is])
          end
      end
      The dimensions in the prefix before _loop_ again give the dimensions that are looped over in the nested loop. The dimension in the suffix after _loop_ indicates which particular dimension the macro loops over. The argument before begin is the name of the loop variable.
  • The ranges used are stored in a LoopRanges struct in the Ref variable loop_ranges (which is exported by the looping module). Occasionally it is useful to access the range directly. For example the range looped over by the macro @s_z_loop_s is loop_ranges[].s_z_range_s (same prefix/suffix meanings as the macro).
    • The square brackets [] are needed because loop_ranges is a reference to a LoopRanges object Ref{LoopRanges} (a bit like a pointer) - it allows loop_ranges to be a const variable, so its type is always known at compile time, but the actual LoopRanges can be set/modified at run-time.
  • It is also possible to run a block of code in serial (on just the rank-0 member of each block of processes) by wrapping it in a @serial_region macro. This is mostly useful for initialization or file I/O where performance is not critical. For example
    @serial_region begin
        # Do some initialization
        f .= 0.0
    end
  • In any loops with the same prefix (whether type 1 or type 2) the same points belong to each process, so several loops can be executed without synchronizing the different processes. It is (mostly) only when changing the 'type' of loop (i.e. which dimensions it loops over) that synchronization is necessary, or when changing from 'serial region(s)' to parallel loops. To aid clarity and to allow some debugging routines to be added, the synchronization is done with functions labelled with the loop type. For example begin_s_z_region() should be called before @s_z_loop or @s_z_loop_* is called, and after any @serial_region or other type of @*_loop* macro. begin_serial_region() should be called before @serial_region.
  • Internally, the begin_*_region() functions call _block_synchronize(), which calls MPI.Barrier(). When all debugging is disabled they are equivalent to MPI.Barrier(). Having different functions allows extra consistency checks to be done when debugging is enabled, see debug_test/README.md.
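The const-Ref pattern used for loop_ranges can be sketched in isolation. LoopRangesSketch below is a made-up stand-in for the real LoopRanges struct, just to show why the pattern works:

```julia
# Hypothetical stand-in for the LoopRanges pattern described above: a
# `const` Ref has a fixed type known at compile time, while the object it
# holds can still be replaced at run time (accessed with the [] syntax).
struct LoopRangesSketch
    s_range::UnitRange{Int64}
end

const loop_ranges_sketch = Ref(LoopRangesSketch(1:1))

# Replace the contents at run time, e.g. once ranges have been assigned
loop_ranges_sketch[] = LoopRangesSketch(2:4)
```

Because loop_ranges_sketch is const, code that reads loop_ranges_sketch[].s_range is type-stable, yet the ranges themselves can still be swapped out after MPI setup.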

Closes #35, closes #38.

Generate macros for loops over any combination of dimensions, which get
pre-generated ranges so that each type of loop is parallelized over the
shared MPI arrays with optimal load balance.
The only functional difference between the different 'regions' is in
debugging code, but regions provide structure showing where
synchronization calls are needed.
Now replaced by the new implementation in looping.jl.
If we are precompiling, we presumably want an optimized, production build,
so always pass `-O3 --check-bounds=no` to the build process.
Parsing command line options in command_line_options.__init__() caused
problems (command line arguments being ignored) when moment_kinetics was
compiled into a static system image. [My guess is that __init__() was
called too early in that case, before ARGS was set up properly.]

Instead, define a getter function get_options(), which parses the
command line arguments when it is called. This is slightly less
efficient, but will only be called a few times, and should be more
robust. Also allows changing the options, e.g. adding/removing "--long"
from ARGS during a REPL session to change which tests are run.
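The getter pattern can be sketched as follows; this is a minimal hypothetical stand-in, not the actual command_line_options code, and recognizes only a made-up "--long" flag for illustration:

```julia
# Hypothetical sketch of lazy option parsing: parse the argument list only
# when get_options() is called, so parsing happens after ARGS is fully set
# up, and re-parsing picks up run-time changes to ARGS (e.g. in a REPL).
function get_options(args=ARGS)
    options = Dict{String,Any}("long" => false)
    for arg in args
        if arg == "--long"
            options["long"] = true
        end
    end
    return options
end
```

Because parsing is deferred to each call, nothing depends on module initialization order, and adding or removing "--long" from ARGS between calls changes the result.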
Also test with 3 processes, as this is now supported by moment_kinetics.
@johnomotani johnomotani added the enhancement New feature or request label Jan 10, 2022
@mabarnes mabarnes merged commit b6f2082 into master Jan 12, 2022
@johnomotani johnomotani deleted the loop-macros branch January 26, 2022 16:44