Decomposition
Pre-Processing Data Structures and Algorithms in HemeLB

Introduction

HemeLB has a complex task in loading the structure of the sparse domain in which the simulation will take place, distributing this information to each cooperating process, and deciding which processes will be responsible for which elements of the simulation domain. This pre-processing stage is discussed here, focusing on the data structures and algorithms used.
The input to this process is a HemeLB geometry file describing the domain. The output is, on each process, a memory structure describing the simulation sites that process is responsible for, together with the identities of the processes which manage neighbouring sites.
The purpose of this document is not to give a detailed description which can be engineered against, but to specify the state of the art of HemeLB pre-processing, as a basis for analysis of potential improvements towards enabling HemeLB for the Exascale. A sister document provides a similar overview of post-processing in HemeLB.

Geometries

In this section, we introduce the semantic model used by HemeLB to describe the simulation domain, a geometry. Specific syntactic realisations as data files and memory structures are described in later sections.

Site

The basic unit of HemeLB geometry is the site, a location where Lattice-Boltzmann simulation will take place. The information for a site consists of its location in the domain (a coordinate triple) and an indication of whether it is a fluid site which should be simulated or a solid site outside the simulation domain.

Link

Each site also includes information about the nature of the link to each neighbouring site. Which sites are considered neighbours varies, and will be discussed below. Link information consists of whether the link crosses outside the simulation domain, how far along the link the solid/fluid boundary lies, and, if the link crosses outside the simulation domain via an open boundary (a fluid inlet or outlet, an "iolet"), the identifier of the specific iolet.

Blocks

A block is an 8x8x8 cube of sites.

File Model

We now turn to an overview of the syntactic realisation of the geometry used to represent it as a file on disk. This is an XDR binary representation. Engineering details can be found in the HemeLB developer documentation.

Header

The file begins with a header. This first provides filetype and version information, then gives the size of the problem domain in blocks, and transformation information relating the block/site coordinate system to the spatial domain.

Blocks

There then follows a table of information describing each block, giving the number of fluid sites in the block and the compressed and uncompressed sizes of the block data, which vary from block to block. Entries for all blocks in the cuboid bounding domain are given, striped with the z coordinate changing most frequently, so block coordinates can be deduced from position in the table.
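To make the table layout concrete, the sketch below shows one way a header table entry might be modelled, and how block coordinates can be recovered from the striping order. The struct and function names are hypothetical, not HemeLB's actual code.

```cpp
#include <cstdint>

// One entry per block in the header table (hypothetical field names).
struct BlockHeaderEntry {
    std::uint32_t fluidSiteCount;   // 0 => wholly solid, no data recorded
    std::uint32_t compressedBytes;  // size of the zlib-compressed block data
    std::uint32_t uncompressedBytes;
};

struct BlockCoords { std::uint32_t x, y, z; };

// Recover block coordinates from the flat table index: blocks are striped
// with z changing most frequently, then y, then x (blocksX is not needed).
BlockCoords CoordsFromIndex(std::uint32_t index,
                            std::uint32_t blocksY, std::uint32_t blocksZ) {
    BlockCoords c;
    c.z = index % blocksZ;
    c.y = (index / blocksZ) % blocksY;
    c.x = index / (blocksZ * blocksY);
    return c;
}
```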
If a block contains no fluid sites, no data is recorded for it, and this is reflected as a zero in the header table. Otherwise, for each site, solid or fluid, site data is given, compressed using zlib.

Site and Links

An unsigned integer is recorded, indicating whether the site is solid (0) or fluid (1). If a site is solid, no further data is given. Otherwise, link data is given for each of the site's neighbours, using the 26-member 3-D Moore neighbourhood.

Geometry Class Model

HemeLB contains a set of classes used to represent the geometry information during parsing and analysis of the geometry file. These classes (GeometrySiteLink, GeometrySite, GeometryBlock, and Geometry) provide a simple deserialised representation of the geometry file, each using STL vectors of the contained elements: a Geometry contains a vector of blocks, a block one of sites, and a site one of links.
In this representation, the model is only partially sparse: a block either contains an empty site vector, if it contains no fluid sites, or a site vector with a site object for all 8³ = 512 sites. Similarly, a fluid site contains either 26 link objects or none. Site and block coordinates can thus be inferred from position in the vector. On all processes, the full set of block objects is instantiated, but a block contains no sites on processes where it is not needed.
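A minimal sketch of this class model follows. The class names are HemeLB's, but the member types shown are simplified illustrations; the real classes carry more detail.

```cpp
#include <vector>

struct GeometrySiteLink {
    enum class Type { NoBoundary, Wall, Inlet, Outlet } type;
    float wallDistance;  // fraction along the link to the solid/fluid boundary
    int ioletId;         // only meaningful for inlet/outlet links
};

struct GeometrySite {
    bool isFluid;
    std::vector<GeometrySiteLink> links;  // 26 entries for fluid sites, else empty
};

struct GeometryBlock {
    std::vector<GeometrySite> sites;  // 512 entries, or empty if no fluid sites
};

struct Geometry {
    std::vector<GeometryBlock> blocks;  // all blocks in the bounding cuboid
};
```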
In addition, the geometry site model also contains, for each site, the rank of the process to which that site is allocated by the domain decomposition.

Lattice Data Class Model

The Geometry class model, being a simple object representation of the geometry, is not suited to efficient computation. HemeLB therefore uses a second representation of the same geometry information for the Lattice-Boltzmann and visualisation components. This representation uses a flyweight pattern: a LatticeData object contains separate vectors of each piece of data that might be stored about a site. A Site object contains a single data field, an index into these arrays, and a series of access methods, providing an object-oriented view of this flat data. Only the links in the Lattice-Boltzmann lattice subset of the Moore neighbourhood, selected for the build at compile time, are used.
This representation is not used in the pre-processing stage: the LatticeData constructor takes a Geometry object as a parameter and builds this representation. It will therefore be discussed further in the post-processing sister document, and we do not discuss here how the Geometry Class Model is transformed into the Lattice Data Class Model.
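For orientation, here is a minimal sketch of the flyweight arrangement just described; the member and accessor names are illustrative assumptions, not HemeLB's actual API.

```cpp
#include <cstddef>
#include <vector>

class LatticeData {
  public:
    // Parallel arrays: one entry per site, indexed by a flat site index.
    std::vector<unsigned> siteTypes;
    std::vector<float> wallDistances;
    // ... one vector per property stored about a site
};

// A Site is just an index into LatticeData's arrays plus access methods.
class Site {
  public:
    Site(std::size_t index, const LatticeData& data)
        : index(index), data(data) {}
    unsigned GetSiteType() const { return data.siteTypes[index]; }
    float GetWallDistance() const { return data.wallDistances[index]; }
  private:
    std::size_t index;        // the only per-object state
    const LatticeData& data;  // shared flat storage
};
```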
Pre-Processing Algorithms
Overview
Having provided an overview of the data structures used in HemeLB, we now turn to the algorithms used to build the Geometry Class Model from the File Model. The simplest parsing of the file format into the basic objects is straightforward, and we will not discuss it in detail. However, several more complex tasks must be addressed: we must efficiently load a large file, in parallel, into multiple processes (parallel IO); we must determine which sites belong on which processes (domain decomposition); and we must ensure that site and block information is made available to the processes which need it (information distribution).
The process is broken down as follows: we make an arbitrary initial decomposition, assigning blocks to processes. Then, we use a subset of processes to read the blocks and distribute the data to the processes which need it. Next, we re-decompose using the now-loaded site information, and finally, we re-distribute the data according to the new decomposition.
Domain Decomposition 1: Initial Decomposition
Domain decomposition suffers from a “bootstrap problem”: we cannot efficiently load and analyse the geometry without some distribution of the geometry across parallel processes, and we cannot determine an appropriate decomposition of the geometry without loading and analysing it.
We therefore make an initial decomposition which does not require detailed information about the geometry, only the size of the cuboid bounding box of blocks and whether a given block is wholly solid. This information can be obtained using only the headers of the geometry file.
This initial decomposition is carried out by the BasicDecomposition class, using a simple domain-growing algorithm. Improvements to this algorithm are unlikely to be of significant benefit: it is used only to obtain some initial parallelism for block reading, and is discarded after serving as the seed for the geometry-aware second decomposition.
This algorithm is carried out on all processes, with no parallelism: the same answer is obtained on all processes. For each process, we start a domain, and grow it by adding neighbouring blocks, using neighbours based on the Lattice-Boltzmann lattice subset of the Moore neighbourhood. Once the domain contains enough blocks to give that process a fair share, we start a domain for the next process. Blocks which contain no fluid sites are not assigned, but are treated as barriers to the growing domain. It may therefore be necessary to start more than one growth region to achieve the required number of blocks per process.
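The following sketch captures the spirit of this domain-growing algorithm, assuming a flat array of blocks and a caller-supplied neighbourhood function; it simplifies the real BasicDecomposition considerably.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// blockIsFluid[b] is false for wholly solid blocks; neighbours(b) yields the
// indices of blocks adjacent under the chosen lattice neighbourhood.
// Returns, for each block, the rank it is assigned to (-1 for solid blocks).
std::vector<int> GrowDomains(
    const std::vector<bool>& blockIsFluid,
    const std::function<std::vector<std::size_t>(std::size_t)>& neighbours,
    int processCount) {
    std::vector<int> owner(blockIsFluid.size(), -1);
    std::size_t fluidBlocks = 0;
    for (bool f : blockIsFluid) fluidBlocks += f ? 1 : 0;
    const std::size_t fairShare = (fluidBlocks + processCount - 1) / processCount;

    int rank = 0;
    std::size_t assignedToRank = 0;
    for (std::size_t seed = 0; seed < owner.size(); ++seed) {
        // Solid blocks are never assigned; they act as barriers to growth, so
        // a rank may need several disconnected regions to reach its share.
        if (!blockIsFluid[seed] || owner[seed] != -1) continue;
        std::queue<std::size_t> frontier;
        frontier.push(seed);
        while (!frontier.empty()) {
            std::size_t b = frontier.front();
            frontier.pop();
            if (owner[b] != -1) continue;  // may have been queued twice
            owner[b] = rank;
            if (++assignedToRank >= fairShare && rank + 1 < processCount) {
                rank += 1;  // this rank has its fair share; start the next
                assignedToRank = 0;
            }
            for (std::size_t n : neighbours(b))
                if (blockIsFluid[n] && owner[n] == -1) frontier.push(n);
        }
    }
    return owner;
}
```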
Parallel IO: Reading Cores
We assume that the geometry file resides on a parallel file system, capable of being accessed in pieces by multiple processes. Rather than have every process load from the file the blocks and sites assigned to it by the domain decomposition, we limit reading to a small subset of cores, known as Reading Cores. Information from the file appropriate to each process is then passed to the relevant processes through appropriate messages. This approach has been seen to allow parallelism in the reading of large files, while avoiding slow-down due to excessive numbers of processes contending for the file. The optimum proportion of processes used to read the file depends on the file system and many other factors; it can be set at compile time for HemeLB, and is an ideal target for autotuning, which we do not yet do.
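As an illustration, a reading core might be selected as follows; the spacing constant is a stand-in for HemeLB's compile-time setting, not its actual value.

```cpp
#include <mpi.h>

// Hypothetical tuning parameter: one reading core per this many ranks.
constexpr int kRanksPerReadingCore = 32;

bool IsReadingCore(MPI_Comm comm) {
    int rank = 0;
    MPI_Comm_rank(comm, &rank);
    // Spread the reading cores evenly across the ranks.
    return rank % kRanksPerReadingCore == 0;
}
```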
Since we use a subset of processes to read the file, these processes must learn which blocks contain sites needed on which processes, so that they can forward this information once it has been read. The Needs class handles this through a series of MPI gathers to each reading core, gathering first the number of blocks needed by each process, then the identities of those blocks. Then, as blocks are read on each reading core, the compressed data is sent from reading process to needing process.
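A sketch of this two-stage gather, assuming a single reading core per communicator and plain int block IDs, might look like this:

```cpp
#include <mpi.h>
#include <vector>

// Each process reports which block IDs it needs; the reading core learns
// everyone's needs so it can forward compressed block data after reading.
std::vector<std::vector<int>> GatherNeeds(const std::vector<int>& neededBlocks,
                                          MPI_Comm comm, int readingCore) {
    int rank = 0, size = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    // First gather: how many blocks does each process need?
    int myCount = static_cast<int>(neededBlocks.size());
    std::vector<int> counts(rank == readingCore ? size : 0);
    MPI_Gather(&myCount, 1, MPI_INT, counts.data(), 1, MPI_INT,
               readingCore, comm);

    // Second gather: the block identities themselves.
    std::vector<int> displs, allBlocks;
    if (rank == readingCore) {
        displs.resize(size, 0);
        for (int i = 1; i < size; ++i) displs[i] = displs[i - 1] + counts[i - 1];
        allBlocks.resize(displs[size - 1] + counts[size - 1]);
    }
    MPI_Gatherv(neededBlocks.data(), myCount, MPI_INT,
                allBlocks.data(), counts.data(), displs.data(), MPI_INT,
                readingCore, comm);

    // Unpack per-process needs on the reading core.
    std::vector<std::vector<int>> needs;
    if (rank == readingCore) {
        needs.resize(size);
        for (int i = 0; i < size; ++i)
            needs[i].assign(allBlocks.begin() + displs[i],
                            allBlocks.begin() + displs[i] + counts[i]);
    }
    return needs;
}
```

Gathering the counts first lets the reading core size its receive buffers exactly before the variable-length gather of block identities.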
Domain Decomposition 2: Optimised Decomposition
Having loaded a full description of the geometry, we can now make a proper domain decomposition, using the Parmetis library. The resulting decomposition is at the level of individual sites: sites from a single block may be allocated to different processes. Parmetis requires its own representation of the geometry, using a different semantic model: a graph of nodes and arcs. The task for HemeLB here is to transform the Geometry Class Model into this graph-based representation.
In this data structure, which is represented not as classes but as a series of arrays created in order to call Parmetis, LB sites become nodes and links become arcs. Only links in the Lattice-Boltzmann lattice subset are used. Each process is provided with only part of the graph, based on the blocks assigned to it by the initial decomposition; Parmetis uses global site IDs to reconcile this data between processes. In preparing to call Parmetis, therefore, we must traverse all sites and links in the blocks assigned to a given process and, where a link joins two fluid sites, include it in the arrays passed to Parmetis.
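The arrays in question form a distributed CSR graph. The sketch below shows the shape of the data and a simplified call to ParMETIS_V3_PartKway, assuming no vertex or edge weights; HemeLB's actual invocation may differ. The construction of the adjacency arrays from the Geometry model is elided.

```cpp
#include <mpi.h>
#include <parmetis.h>
#include <vector>

void Partition(std::vector<idx_t>& vtxdist,  // global vertex ranges per rank
               std::vector<idx_t>& xadj,     // CSR offsets, local vertices + 1
               std::vector<idx_t>& adjncy,   // global IDs of linked fluid sites
               idx_t nparts, MPI_Comm comm,
               std::vector<idx_t>& partOut) {
    idx_t wgtflag = 0, numflag = 0, ncon = 1, edgecut = 0;
    idx_t options[3] = {0, 0, 0};  // use Parmetis defaults
    std::vector<real_t> tpwgts(ncon * nparts, real_t(1.0) / nparts);
    std::vector<real_t> ubvec(ncon, real_t(1.05));  // allowed load imbalance
    partOut.resize(xadj.size() - 1);  // one new rank per local vertex

    ParMETIS_V3_PartKway(vtxdist.data(), xadj.data(), adjncy.data(),
                         nullptr, nullptr,  // no vertex or edge weights
                         &wgtflag, &numflag, &ncon, &nparts,
                         tpwgts.data(), ubvec.data(), options,
                         &edgecut, partOut.data(), &comm);
}
```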
Parmetis returns its results in a partial form, giving each process the new rank for each of the sites that process provided as input. Where this differs from the initial assignment, HemeLB must inform the new assignee; these changes of assignment between the initial and optimised decompositions are called Moves. Following the call to Parmetis, therefore, a series of MPI calls (all-to-all and point-to-point) is used to share this information and reassign the sites appropriately.
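A sketch of one way such an exchange can be organised, using an all-to-all of counts followed by an all-to-all of site IDs, is given below; the encoding is an illustrative choice, not HemeLB's.

```cpp
#include <mpi.h>
#include <algorithm>
#include <cstdint>
#include <vector>

struct Move { std::int64_t siteId; int newRank; };

// Each rank sends the global IDs of sites whose new owner differs from the
// initial assignment; it receives the IDs of sites it now owns.
std::vector<std::int64_t> ShareMoves(const std::vector<Move>& moves,
                                     MPI_Comm comm) {
    int size = 0;
    MPI_Comm_size(comm, &size);

    // Bucket outgoing site IDs by destination rank.
    std::vector<std::vector<std::int64_t>> buckets(size);
    for (const Move& m : moves) buckets[m.newRank].push_back(m.siteId);

    // First exchange: how many IDs travel between each pair of ranks.
    std::vector<int> sendCounts(size), recvCounts(size);
    for (int r = 0; r < size; ++r) sendCounts[r] = (int)buckets[r].size();
    MPI_Alltoall(sendCounts.data(), 1, MPI_INT,
                 recvCounts.data(), 1, MPI_INT, comm);

    std::vector<int> sendDispls(size, 0), recvDispls(size, 0);
    for (int r = 1; r < size; ++r) {
        sendDispls[r] = sendDispls[r - 1] + sendCounts[r - 1];
        recvDispls[r] = recvDispls[r - 1] + recvCounts[r - 1];
    }
    std::vector<std::int64_t> sendBuf(sendDispls[size - 1] + sendCounts[size - 1]);
    for (int r = 0; r < size; ++r)
        std::copy(buckets[r].begin(), buckets[r].end(),
                  sendBuf.begin() + sendDispls[r]);

    // Second exchange: the site IDs themselves.
    std::vector<std::int64_t> recvBuf(recvDispls[size - 1] + recvCounts[size - 1]);
    MPI_Alltoallv(sendBuf.data(), sendCounts.data(), sendDispls.data(), MPI_INT64_T,
                  recvBuf.data(), recvCounts.data(), recvDispls.data(), MPI_INT64_T,
                  comm);
    return recvBuf;
}
```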
Following this, it is necessary to re-load the geometry information for sites and blocks under the new assignment. Potentially, this data could be transferred directly between processes. Instead, for simplicity, the whole data file is re-parsed using the same code as after the initial decomposition, repeating the Needs sharing process, the reading on the Reading Cores, and the transfer of compressed data from reading process to needing process.