NeighbourAwareBCs

Neighbour-aware boundary conditions

For Domain Edge sites, LB is done in PreSend.
For Internal sites, LB is done in PreReceive.
After Receive, F values streamed from neighbours are poked into the correct places, based on pre-calculated streaming targets.
For some BCs (e.g. FInterpolation), some post-fixup is done in PostReceive.
Some BCs need data for their neighbouring sites.
They are happy to use the current Fold data for local sites.
They use this to work out the values to stream.
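
The "poke in" step after Receive could look roughly like the sketch below. The function name, the flat fNew array, and the layout of the target list are assumptions made for illustration, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy each received distribution value into the slot
// chosen for it at setup time. targets[i] is the index in fNew that
// receives received[i].
inline void PokeReceivedValues(const std::vector<double>& received,
                               const std::vector<std::size_t>& targets,
                               std::vector<double>& fNew)
{
  assert(received.size() == targets.size());
  for (std::size_t i = 0; i < received.size(); ++i)
  {
    assert(targets[i] < fNew.size());
    fNew[targets[i]] = received[i];
  }
}
```

Because the targets are computed once at setup, the per-step cost is a single pass over the receive buffer.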

What if, for BCs which need data on neighbouring sites, the neighbouring sites are on different cores?
We might think we can reuse the communication that carries the LB streaming results to get this info, but we can't: we need the info in order to compute those results.
We could use the info from a time-step ago, but this will be the Fold from the previous time-step, i.e. two steps ago.
So we need a second communication phase, following the completion of edge sites, including poking in the received Fs, to communicate the required finished information.
The current solution exploits a special feature of simple LB, that the communication needed is only the sending of some streaming Fs across boundaries, to simplify comms, as follows:
- Calculate based on self
- Send some streaming components to neighbour
- Update self with streamed component values
Note that the components streamed to the neighbour do not depend on the state of the neighbour, only on self.
For more complicated BCs, where the values to be streamed depend on the neighbour, we need:
- Send current state to neighbours that need it
- Calculate based on self and neighbour
- Send some streaming components to neighbour
- Update self with streamed component values

Could we, by rearranging the way we do things, still do this with one communication step?
Consider a 1-d model, with an Fleft, Fright, and Fcentre, Fl, Fr, Fc. Three sites, on each of three cores, giving quantities F1l to F3r.
At each step, Fc, Fr, and Fl are calculated as a function which depends on all the Fs of all neighbouring sites (nine arguments, three outputs).
F(i+1)l and Fir are then appropriately streamed following this calculation.
Site 2 cannot send site 1 the next-step value of F2l (which site 2 will itself receive from site 3) at the same time as it streams its current F2l to site 1, because it does not have that value yet.
So we cannot make available, to Core 1, the values it needs in a single communication step.
We could give the old value of F2l if we wanted, but that introduces a two-step-delay error.
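
The availability problem can be seen in a toy version of this model. Everything in the sketch below is an assumption made for illustration: the collision rule, the periodic wrap-around, and all names (Site, Collide, SingleExchangeView). It compares the true post-streaming state of site 2 with the best picture site 1 can assemble of it after a single exchange.

```cpp
#include <array>
#include <cassert>

// Toy 1-d lattice: each site holds a left-moving, resting and right-moving value.
struct Site { double l, c, r; };
using Lattice = std::array<Site, 3>;

// Hypothetical collision rule: relax every component halfway towards the
// mean of all nine values in the neighbourhood (nine arguments, three outputs).
inline Site Collide(const Site& left, const Site& self, const Site& right)
{
  const double mean = (left.l + left.c + left.r + self.l + self.c + self.r
                       + right.l + right.c + right.r) / 9.0;
  return Site{ self.l + 0.5 * (mean - self.l),
               self.c + 0.5 * (mean - self.c),
               self.r + 0.5 * (mean - self.r) };
}

// Post-collision state of every site, with periodic wrap-around (assumed).
inline Lattice CollideAll(const Lattice& s)
{
  Lattice post{};
  for (int i = 0; i < 3; ++i)
    post[i] = Collide(s[(i + 2) % 3], s[i], s[(i + 1) % 3]);
  return post;
}

// True state of site i after streaming: l arrives from the right neighbour,
// r arrives from the left neighbour, c stays put.
inline Site Streamed(const Lattice& post, int i)
{
  return Site{ post[(i + 1) % 3].l, post[i].c, post[(i + 2) % 3].r };
}

// Best view of site i that its left neighbour can build after one exchange:
// c comes in the message, r is the value the left neighbour streamed itself,
// but l is still the pre-step value, since the owner of site i has not yet
// received the replacement from its right neighbour.
inline Site SingleExchangeView(const Lattice& old, const Lattice& post, int i)
{
  return Site{ old[i].l, post[i].c, post[(i + 2) % 3].r };
}
```

With distinct initial values, the l component of the single-exchange view differs from the true streamed value, while c and r agree.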

We therefore must come up with a two-communication-step solution.

In this design, we precalculate which aspects of the F-field (including macroscopic moments), if any, will be needed by neighbouring cores.
- To do this, we have a method on site LB stream-and-collide objects, which yields the information about needs from a given site.
- We then iterate over boundary sites to compile the list of needs.
Then, each time step, we share these in a phase which precedes the LB phase.
- Initially, there is no overlap.
- Later, we can split the boundary sites into those which do and do not need neighbour info, and overlap the sharing of info with the calculation of the sites that do not need it.
This is a nice and simple design at time-step time.
The complexity resides in the precalculation of the array of which elements need to be shared.
One can make a simple robust design to start with, where, if a given BC is selected, all neighbour Fs are shared, which is massive overkill, but will work.
Then, using this to produce a good test case, one can refine the exchange list to be just what is needed.
Communication of values in the macroscopic cache, as well as Fs, is possible in this design, and can be implemented as a secondary phase.
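
The "simple robust" starting point, where every F of a needed neighbour site is shared, might be compiled as in the sketch below. The RemoteLink struct, the function name, and the concrete types chosen for proc_t and site_t are assumptions for illustration only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using proc_t = int;   // rank identifier (type assumed for this sketch)
using site_t = long;  // global site identifier (type assumed for this sketch)

// Hypothetical description of one link from a local boundary site to a
// site held by another rank.
struct RemoteLink
{
  proc_t neighbourRank;
  site_t neighbourSite;
  bool bcNeedsNeighbourData;  // true if the BC on the local site reads neighbour data
};

// Conservative exchange list: request all Fs of every remote site that
// some neighbour-aware BC touches. Duplicates collapse in the set.
inline std::map<proc_t, std::set<site_t> >
BuildConservativeExchangeList(const std::vector<RemoteLink>& links)
{
  std::map<proc_t, std::set<site_t> > needs;
  for (const RemoteLink& link : links)
  {
    if (link.bcNeedsNeighbourData)
    {
      needs[link.neighbourRank].insert(link.neighbourSite);
    }
  }
  return needs;
}
```

Refining the list later would only change what is stored per site, not the iteration over boundary sites.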

First, implement MultiCommsPerTimestep
New class: NeighbouringDataManager
- Constructor takes instantiated LatticeData and LB objects, and determines the exchanges necessary.
- Once the demanded information is known, space for the corresponding Fs is held in this NeighbouringDataManager
  - Lookup of site data needs to proceed as: if the index is not a local site index, then check in the neighbouring data manager.
  - !NeighbouringDataManager stores its site data via a map.
  - Yields new class !NeighbouringSite, which inherits site, but gets its data from the !NeighbouringDataManager instead of the !LatticeData
- Exchange of needs takes place in this constructor.
- This class is an iterated actor registered to phase one.
- Child struct RequiredFieldInformation
  - bool requiresDensity
  - vector3D requiresVelocity
  - vector requiresF
  - etc.
  - has an "or" operator, used when assembling requirements across neighbouring sites.
- Stores a map<proc_t,map <site_t, RequiredFieldInformation> > indicating which fields are required on which site.
- Implements RequestComms etc to transfer the required information.
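
A minimal sketch of RequiredFieldInformation and the per-rank requirements map follows. It assumes one flag per velocity component and one flag per F direction, uses operator| as the "or" operator, and introduces the hypothetical names NeedsMap and AddNeed; the concrete types for proc_t and site_t are also assumed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using proc_t = int;   // rank identifier (type assumed for this sketch)
using site_t = long;  // global site identifier (type assumed for this sketch)

struct RequiredFieldInformation
{
  bool requiresDensity = false;
  std::array<bool, 3> requiresVelocity{};  // one flag per component (assumed)
  std::vector<bool> requiresF;             // one flag per lattice direction (assumed)
};

// Union of two requirement sets; the shorter requiresF is padded with false.
inline RequiredFieldInformation operator|(const RequiredFieldInformation& a,
                                          const RequiredFieldInformation& b)
{
  RequiredFieldInformation out;
  out.requiresDensity = a.requiresDensity || b.requiresDensity;
  for (std::size_t i = 0; i < 3; ++i)
    out.requiresVelocity[i] = a.requiresVelocity[i] || b.requiresVelocity[i];
  out.requiresF.assign(std::max(a.requiresF.size(), b.requiresF.size()), false);
  for (std::size_t i = 0; i < a.requiresF.size(); ++i)
    out.requiresF[i] = a.requiresF[i];
  for (std::size_t i = 0; i < b.requiresF.size(); ++i)
    out.requiresF[i] = out.requiresF[i] || b.requiresF[i];
  return out;
}

using NeedsMap = std::map<proc_t, std::map<site_t, RequiredFieldInformation> >;

// Merge one more requirement for a remote site into the map.
inline void AddNeed(NeedsMap& needs, proc_t rank, site_t site,
                    const RequiredFieldInformation& info)
{
  RequiredFieldInformation& slot = needs[rank][site];
  slot = slot | info;
}
```

Several local sites may need different fields from the same remote site, so merging with "or" keeps a single entry per remote site.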
New methods on each streamer:
- GetRequiredFieldInformation(direction) : returns the struct of bools used by the NeighbouringDataManager, giving what it requires from the neighbour in that direction
  - Default implementation just returns the false object.
  - Used by constructor of !NeighbouringDataManager
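
The default and an overriding implementation might be sketched as follows. The class names, the use of virtual dispatch, and the cut-down RequiredFieldInformation are assumptions for illustration; the real streamers may be organised quite differently.

```cpp
#include <cassert>
#include <vector>

// Cut-down stand-in for RequiredFieldInformation (only two of its fields).
struct RequiredFieldInformation
{
  bool requiresDensity = false;
  std::vector<bool> requiresF;  // empty means no F components are required
};

// Hypothetical streamer base class.
class Streamer
{
public:
  virtual ~Streamer() = default;

  // Default: nothing is required from the neighbour in any direction.
  virtual RequiredFieldInformation GetRequiredFieldInformation(int /*direction*/) const
  {
    return RequiredFieldInformation{};
  }
};

// Hypothetical neighbour-aware streamer on a toy three-direction lattice:
// conservatively asks for every F component of the neighbour.
class NeighbourAwareStreamer : public Streamer
{
public:
  RequiredFieldInformation GetRequiredFieldInformation(int /*direction*/) const override
  {
    RequiredFieldInformation info;
    info.requiresF.assign(3, true);
    return info;
  }
};
```

With a "false object" default, only the BCs that actually need neighbour data have to override the method.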

In an alternative design, instead of precalculating an exchange list, the important BCs manage their own comms to receive the data they need before doing LB.
That is, some BCs implement multiple phases' worth of comms.
This localises the calculation of the exchange list into logic inside the BCs.
But who sends this data to these BCs?
This must be an F- and macroscopic-value provider, which receives pull requests for information during setup.
At simulation time:
- Phase one: F-provider sends to expecting BCs, BCs receive.
- Phase two: BCs do LB.
At setup time, instead of a global calculation of needed data, the BCs send pull requests to the F-provider, which caches these, so it knows what to send during the simulation.
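
The provider side of this scheme could be sketched as below. The class name FProvider, its two methods, and the callback used to read a site's current Fs are all hypothetical, and the actual sending is left out; the sketch only shows pull requests being cached at setup and turned into per-rank send buffers each step.

```cpp
#include <cassert>
#include <map>
#include <vector>

using proc_t = int;   // rank identifier (type assumed for this sketch)
using site_t = long;  // global site identifier (type assumed for this sketch)

// Hypothetical F-provider: caches pull requests, then serves them each step.
class FProvider
{
public:
  // Setup time: a BC on rank `requester` asks for the Fs of local site `site`.
  void RegisterPullRequest(proc_t requester, site_t site)
  {
    requests[requester].push_back(site);
  }

  // Phase one of each time step: build, per requesting rank, the values to
  // send. `fOfSite` returns the current Fs of a local site as a vector.
  template <class Lookup>
  std::map<proc_t, std::vector<double> > AssembleSendBuffers(Lookup fOfSite) const
  {
    std::map<proc_t, std::vector<double> > buffers;
    for (const auto& entry : requests)
    {
      std::vector<double>& buffer = buffers[entry.first];
      for (site_t site : entry.second)
      {
        const std::vector<double> f = fOfSite(site);
        buffer.insert(buffer.end(), f.begin(), f.end());
      }
    }
    return buffers;
  }

private:
  std::map<proc_t, std::vector<site_t> > requests;
};
```

Values are packed in the order the requests were registered, so the requesting BC can unpack them without extra bookkeeping.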