-
Notifications
You must be signed in to change notification settings - Fork 9
Design
- separate data layout from interface
- make it easier to use smaller pieces of code
- make the code / physics easier to understand
- separate vertical and horizontal where possible:
- horizontal:
- will be spectral element
- use quad elements with LGL quadrature
- duplicate values at colocated nodes (supporting both DG and CG)
- either regular grid (LES box), cubed sphere (GCM) or potentially arbitrary conformal mesh (e.g. for ocean and land)
- support for loading an arbitrary 2D grid from standard mesh formats (GMSH)
- can be distributed
- vertical
- nodes will be vertical (i.e. radial on a sphere) aligned
- should support a variety of discretizations (e.g. spectral element, FV, staggered grids)
- support both duplicated or shared nodes for CG
- implicit solves in the vertical
- horizontal:
- Support for 2D horizontal surfaces (e.g. boundary conditions, barotropic modes in ocean models, etc.)
- Make interface more functional:
- i.e. users should write functions that receive and return immutable objects, rather than in-place modification
- Data backends for 3D/2D/column/pancake
- StructArray or roll our own?
- Unit tests + GPU
- Horizontal grid structure
- topology
- metrics
- DSS
- Horizontal operators
- horiz gradient
- horiz divergence
- broadcasting
- Basic examples
- Poisson equation / Heat equation
- regular 2D mesh
- irregular 2D mesh
- Wave equation
- Column grid structure
- 1D tests of column
- 3D tests of both combined
-
We only need to consider 2D topologies
- i.e. connectivity is the same for all elements in a stack
-
Aim for deep atmosphere
- optional support for shallow atmosphere shouldn't be too difficult: modify metric terms and Coriolis terms
-
Remainder model
- becomes a splitting of the vertical
- needs to only be vertical-only
-
column(values, i, j, h)
- view that can be indexed by
[k, v]
, returning a struct - used for everything vertical (divergence, gradient, integration, implicit solves, search up/down/bisect, interpolated view)
- for 3d fields, this will act like a function
- view that can be indexed by
-
pancake(values, k, v, h)
-
view that can be indexed by
[i,j]
, returning a struct -
face(pancake, face#)
=> view indexed by[i]
(for horz num fluxes) -
opposing(face)
will give the face opposing the element -
elements (read only)
- vizualization, VTK export
-
key assumptions:
- columns never access horizontal data other than gradients, etc, which can be computed before being passed in
- horizontal terms only act via derivatives (divergence / gradients / curl)
-
broadcasting (incl. 2D horz -> 3D fields), lazy variant
- reduction (integrate 3D -> 2D)
(some we will want, flexibility to add more)
-
GPU: thread model 1 thread per (i,j) within an element
-
CPU: ?
- 1 thread per pancake
- 1 thread per column
-
i,j horizontal node indices
-
k vertical node index
-
field: field index (to be figured out)
-
v: vertical element index
-
h: horizontal element index (i.e. uniquely identifies an element stack) (a) integer for arbitrary mesh - how to handle real vs ghost elements? (b) (hx,hy) for regular grid (LES) (c) (hx,hy,hface) for cubed sphere
-
How to handle (expensive) column physics on duplicated columns?
-
primary -> secondary sharing mechanism:
-
duplicate work
-
duplicate across processes
-
ultimately depends on threading model
-
How to handle columns that use different vertical staggering?
-
Interpolation:
- what is h?
- regular grid/cubed sphere we can compute a direct lookup
- arbitrary mesh build some quad tree
- what is v?
- linear lookup (ideally)
- compute coordinate in reference element
- barycentric interpolation
- weighted average of all nodes in an element
- if horizontal-only, can do it via a pancake
- only certain fields require communication (e.g. tendencies, grads in DG)
- can reuse communication buffers for efficiency
- load data directly from recv buffers
- data buffer
opposing face 4 possible faces * 2 orientations
for a ghost element has only one face store whole faces contiguously (duplicate shared nodes if necessary)
-
elemtoelem[f,h]
=> if > 0, real element, if <0 then its the ghost face # -
elemtoface[g,h]
=> if real element, 1:4 positive orienataion, -4:-1 flipped orientation; ghost face 1/-1
store columns contiguous, based on the logical order for the sender
- don't duplicate nodes unique numbering of columns on a given rank
- if positive, real column, use linear indexing c = NqNq(h-1) + Nq*(j-1) + i
- if < 0, ghost column, look up in the receive buffer
-
Data: a value stored at each node
-
Grid: geometric information
- physical location
- metric terms
- face normals (need for DG/boundaries); can be computed from metrics if required
- stores its information using a Data object
-
Topology (move out of Grid)
- connection between horizontal
- DG opposing horizontal faces and orientation
- CG which horizontal nodes are collacated (0 for internal, 1 for non-vertex faces, 2+ for vertices)
- communication / rank partitioning
- connection between horizontal
-
Field: Data + Grid
- map
- vertical reduce (integral)
- broadcast + lazy broadcast (e.g. combine 3D field, 2D field, scalar)
- grad, div, curl (split into horizontal/vertical)
- use a StructArray (or something that acts like an array of structs)
- main limitation: can only be something take spatial derivatives of
- requires scalar multiplication + addition
idea field element type can be one of:
- real scalars (floating point numbers)
- vector coefficient types (basically coeffients of a vector space)
CartesianVector(x1,x2,x3)
SphericalCartesian(u,v,w)
-
Contravariant
/Covariant
wrt the reference element -
Contravariant1
/Contravariant2
/Contravariant3
projections onto contravariant - to convert between these, you need geometry information
SphericalCartesian(CartesianVector(1,2,3), geom)
- tensors?
- composite object (tuples or named tuples of the above)
- support for
Tuples
/NamedTuples
- add support for custom structs?
- should not support vector coefficients to avoid e.g. adding two SphericalCartesian objects at different locations
- support for
Can we compose kernels by combining multiple operations, such as broadcasting and tensor product operations.
-
gradient:
- extract/ compute X to shared memory
- synchronize
- tensor product
A1 = (D ⊗ I) * X A2 = (I ⊗ D) * X
- combine
∂ξ∂x .* Covariant(A1,A2)
-
strong divergence
- map to shared memory
C1 = J .* contravariant1(V) C2 = J .* contravariant1(V)
- synchronize
- tensor on each matrix
D1 = (D ⊗ I) * C1 D2 = (I ⊗ D) * C2
- combine
(D1 .+ D2) ./ J
- map to shared memory
-
filtering / interpolation
- compute/load to shared memory
- synchronize
- tensor dimension 1 to shared meory
A = (F ⊗ I) * X
- synchronize
- tensor dimension 2
Y = (I ⊗ F) * A
-
add (optional) surface geometry data structure to mesh
- IFH data structure: separate internal/boundary
- DG we need at all faces
- CG we only need at boundary faces
-
horizontal boundary conditions for CG/DG
- effectively we need a way to prescribe boundary fluxes
-
Additional operators
- Curl
- Hyperdiffusion
- scalars, vectors, tensors
-
high-level CG/DG divergence function: would specify
- input fields
- flux function
- numerical flux function for DG
- boundary fluxes
- additional options
- integration mesh for overintegration
- hyperdiffusion
-
Mesh warping (dependent on 1)
- flat horizontal to check metric terms
- e.g. apply a sinusoid to x1
- Cubed-sphere mesh
- equiangular
- S2 geometry
-
Vertical column standalone (dependent 1) (Charlie, Dennis)
- add Column data structure VF to DataLayouts
- finite difference mesh
- heights of edges
- heights of centers (can be computed)
- number of ghost points top & bottom
- handle flipped indexing ocean/land top down, atmos bottom up
- fields stored at centers or faces
- stencils for interpolation / gradients
- diffusion example
- aim for Ekman layer, stable boundary layer (Ignacio, Yair)
-
Boundary conditions for the vertical
- Identify people (Charlie, ?)
- Higher-level interface
- Identify required functionality
-
Integrated vertical/horizontal model
- hybrid mesh
- implicit solver for multiple columns
-
Performance and Parallelism
-
Profiling/track allocations
-
CPU threading
- may need to disable GC
- volume: can @threads for h = 1:Nh
- faces: will need to avoid race conditions
- partition into non-overlapping face groups
- duplicate work (matching current ClimateMachine model)
-
GPU support
- CG Bicleky jet (Sriharsha)
-
Distributed Memory / MPI
-
what is the object that determines the distribution mechanism
- topology at the lowest level
-
only need to worry about horizontal
-
Ghost exchange
- DG ghost faces
- CG ghost columns
- (optional) Reusing shared buffers (e.g. for multistage RK)
- (optional) Index directly into receive buffer
-
Allreduce to compute norms
-
Broadcast
-
Test cases to add
- vector invariant bickley jet
- wave / advection-diffusion
- isentropic vortex
- non Cartesian vectors
Initial GPU kernel development:
- Get bickley jet working with CG
- Need specalized operators for DSS, divergence kernels
- Get bickley jet working with DG
- Need specialized operators for DG weak divergence, numerical fluxes