ENH: Configurable Precision (#103)
* User specified precision parameter

* Add env setup

* Update readme

* More robust linalg tests

* Add tests for surface

* Make scripts precision agnostic

* Add more rigor to all tests
skailasa authored Jul 27, 2021
1 parent b8051e2 commit 4d20e9c
Showing 17 changed files with 1,077 additions and 684 deletions.
22 changes: 22 additions & 0 deletions .env
#!/bin/bash

# Numba Threading Layer Configuration

## (Intel CPUs Only) Sets lifetime of OpenMP threads to 0ms
## As the computation contains large serial (Python) components
export KMP_BLOCKTIME=0

## Limit number of threads created by BLAS/LAPACK functions
## called by NumPy
export OMP_NUM_THREADS=1

## Define 'places' at which threads are assigned
export OMP_PLACES=cores

## Makes thread assignment go successively through available
## places. In our case, through each core.
export OMP_PROC_BIND=close

## Select OpenMP as the threading layer for Numba; the uniformity
## of FMM operators makes it preferable to TBB
export NUMBA_THREADING_LAYER='omp'
117 changes: 73 additions & 44 deletions README.md
PyExaFMM

[![Anaconda-Server Badge](https://img.shields.io/conda/v/skailasa/pyexafmm.svg)](https://anaconda.org/skailasa/pyexafmm) [![Anaconda-Server Badge](https://anaconda.org/skailasa/pyexafmm/badges/latest_release_date.svg)](https://anaconda.org/skailasa/pyexafmm) [![Anaconda-Server Badge](https://anaconda.org/skailasa/pyexafmm/badges/platforms.svg)](https://anaconda.org/skailasa/pyexafmm)

PyExaFMM is an adaptive particle kernel-independent FMM based on [1], written in pure Python with some extensions, representing a compromise between portability, maintainability, and performance.

## Install

```bash
conda install --use-local pyexafmm
python setup.py develop
```

## Configuration

### Experimental Data

After installation, you must pre-compute and cache the FMM operators and octree for your dataset. Most of these are calculated using the techniques in [1]; notably, M2M and L2L matrices can be computed for a single parent node and its children, and scaled in a simple fashion for the kernels implemented by PyExaFMM. We sparsify the M2L operators using a technique based on randomized SVD compression [2] and transfer vectors [3].
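
To give a flavour of the compression step, below is a minimal sketch of the randomized SVD of [2] in NumPy; the function is illustrative, and not PyExaFMM's internal implementation:

```python
import numpy as np

def rsvd(matrix, target_rank, n_oversamples=10):
    """Rank-k approximation via randomized SVD (Halko et al. [2])."""
    # Sample the column space of `matrix` with a Gaussian test matrix
    omega = np.random.randn(matrix.shape[1], target_rank + n_oversamples)
    sample = matrix @ omega
    # Orthonormal basis for the sampled range
    q, _ = np.linalg.qr(sample)
    # Project into the subspace, where an exact SVD is cheap
    b = q.T @ matrix
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    # Lift back to the original space and truncate to the target rank
    u = q @ u_small
    return u[:, :target_rank], s[:target_rank], vt[:target_rank, :]
```

Storing only these truncated factors, rather than each dense M2L matrix, is what avoids holding and applying potentially very large dense matrices.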

This is done via a `config.json` file, which PyExaFMM looks for in your **current working directory**. It allows you to configure experimental parameters, as well as choose a kernel and a computational backend that optimize the operator methods of the FMM using different approaches.

```json
{
    "experiment": "test",
    "npoints": 1000000,
    "data_type": "random",
    "order_equivalent": 5,
    "order_check": 5,
    ...
    "alpha_outer": 2.9,
    "max_level": 10,
    "max_points": 150,
    "target_rank": 20,
    "precision": "single"
}
```

|Parameter | Description |
|-------------- |----------------------------------------------- |
| `experiment` | Experiment name, used to label HDF5 database. |
| `npoints` | Number of points to generate in test data using demo functions. |
| `data_type` | Type of test data to generate. |
| `order_equivalent` | Order of multipole expansions, same as discretization of equivalent surface. |
| `order_check` | Order of local expansions, same as discretization of check surface. |
| `kernel` | Kernel function to use, currently only supports laplace. |
| `backend` | Compute backend to use, currently only supports Numba. |
| `alpha_inner` | Relative size of inner surface's radius. |
| `alpha_outer` | Relative size of outer surface's radius. |
| `max_level` | Depth of octree to use in simulations. |
| `max_points` | Maximum number of points per leaf node. |
| `target_rank` | Target rank in low-rank compression of M2L matrix. |
| `precision` | Experimental precision, 'single' or 'double'. |
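
The `precision` parameter introduced by this commit selects the floating-point type used throughout the computation. A rough sketch of what this implies, assuming a NumPy backend (the helper below is hypothetical, not part of PyExaFMM's API):

```python
import numpy as np

# Hypothetical mapping from the config value to NumPy scalar types
PRECISION = {
    "single": np.float32,
    "double": np.float64,
}

def dtype_from_config(config):
    """Return the float type implied by config['precision']."""
    return PRECISION[config["precision"]]
```

Single precision halves the memory footprint of the cached operators and is typically faster, at the cost of accuracy in the computed potentials.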

PyExaFMM provides some simple test-data generation functions, which can be configured via the `data_type` and `npoints` parameters above. However, to use your own data, simply create an HDF5 file with the same name as `experiment` in your configuration file, with the following group hierarchy,

Expand All @@ -93,40 +87,34 @@ particle_data/
|_ targets/
```

where `sources` and `targets` are the coordinates of your source and target particles respectively, of shape `(nsources, 3)` and `(ntargets, 3)`, and `source_densities` is of shape `(nsources, 1)`.
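
For example, a minimal sketch of preparing such a file with `h5py`, using placeholder arrays (the file name `test.hdf5` assumes `"experiment": "test"` as in the configuration above):

```python
import h5py
import numpy as np

nsources, ntargets = 1000, 1000

# Placeholder particles in the unit cube, with unit densities
sources = np.random.rand(nsources, 3)
targets = np.random.rand(ntargets, 3)
source_densities = np.ones((nsources, 1))

with h5py.File("test.hdf5", "w") as f:
    group = f.create_group("particle_data")
    group.create_dataset("sources", data=sources)
    group.create_dataset("targets", data=targets)
    group.create_dataset("source_densities", data=source_densities)
```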

The CLI workflow is as follows,

```bash

# Generate test data (optional)
fmm generate-test-data -c config.json

# Run operator and tree pre-computations
fmm compute-operators -c config.json
```

Once this is done, you'll be left with a `.hdf5` database of precomputed parametrizations, with the same name as your specified `experiment` parameter from your `config.json`. If you've used your own data, the operator pre-computations will be written into the same HDF5 file.
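
You can inspect what has been cached with a few lines of `h5py`; the exact group layout depends on your configuration:

```python
import h5py

# Print the path of every group and dataset in the experiment database
with h5py.File("test.hdf5", "r") as f:
    f.visit(print)
```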

### Configure Threading Layer

We optimize PyExaFMM for an OpenMP backend as the Numba threading layer. To avoid oversubscription due to nested parallelism, created when calling NumPy/SciPy functions from within OpenMP threads, we restrict internal NumPy thread pools to contain at most a single thread. See [this discussion](https://github.com/numba/numba/issues/6637#issuecomment-760460620) for more information.

Optimum parameters (for Intel CPUs) are provided in the `.env` file, and set with,

```bash
# Set threading environment variables
source .env
```
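
To confirm that Numba picked up the OpenMP layer, run any parallel-compiled function and query the threading layer afterwards. A minimal check along these lines (the kernel below is just a stand-in):

```python
import numba
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def axpy(a, x, y):
    # Trivial parallel kernel, only here to force a threaded compilation
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        out[i] = a * x[i] + y[i]
    return out

axpy(2.0, np.ones(100), np.ones(100))
print(numba.threading_layer())  # expect 'omp' once .env has been sourced
```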

Now you are ready to start programming with PyExaFMM.


## Usage

The `Fmm` class acts as the API for PyExaFMM. Example usage patterns are provided below:

```python
from fmm import Fmm

e = Fmm()
# Run FMM algorithm
e.run()

# Access targets, sources, and computed potentials/potential gradients at
# a given leaf
leaf = e.leaves[42]

## Dictionary mapping leaf key to a pointer defining alignment
leaf_idx = e.key_to_leaf_index[leaf]

## Targets at leaf
leaf_targets = e.targets[
e.target_index_pointer[leaf_idx]:e.target_index_pointer[leaf_idx+1]
]

## Sources at leaf
leaf_sources = e.sources[
e.source_index_pointer[leaf_idx]:e.source_index_pointer[leaf_idx+1]
]

## Source densities at leaf
leaf_source_densities = e.source_densities[
e.source_index_pointer[leaf_idx]:e.source_index_pointer[leaf_idx+1]
]

## 4-vectors of target potentials/potential gradients aligned with 'leaf_targets'
leaf_potentials = e.target_potentials[
e.target_index_pointer[leaf_idx]:e.target_index_pointer[leaf_idx+1]
]

# Access multipole/local expansions at a given tree node
key = e.complete[42]
## Dictionary mapping a given key to a pointer defining alignment
key_idx = e.key_to_index[key]

## Multipole expansions defined by equivalent surface
multipole_index = key_idx*e.nequivalent_points
multipole_expansion = e.multipole_expansions[
multipole_index:multipole_index+e.nequivalent_points
]

## Local expansions defined by check surface
local_index = key_idx*e.ncheck_points
local_expansion = e.local_expansions[
local_index:local_index+e.ncheck_points
]

# Clear potentials, gradients, and expansions to re-run experiment
e.clear()
```
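
As a sanity check after `e.run()`, a handful of FMM potentials can be compared against direct summation. A sketch, assuming the Laplace single-layer kernel with the 1/(4πr) convention and that the first component of each 4-vector holds the potential (verify both conventions against your build):

```python
import numpy as np

def direct_laplace(targets, sources, densities):
    """O(N^2) reference evaluation of the Laplace potential."""
    diff = targets[:, None, :] - sources[None, :, :]
    r = np.sqrt(np.sum(diff * diff, axis=-1))
    # Guard against coincident source/target points
    inv_r = np.divide(1.0, r, out=np.zeros_like(r), where=r > 0)
    return (inv_r @ densities.ravel()) / (4.0 * np.pi)

reference = direct_laplace(e.targets[:10], e.sources, e.source_densities)
print(np.allclose(reference, e.target_potentials[:10, 0], rtol=1e-3))
```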


Expand All @@ -162,11 +189,13 @@ fmm [OPTIONS] COMMAND [ARGS]
| `compute-operators` | Run operator pre-computations | `-c <config_filename>` |
| `generate-test-data` | Generate `npoints` sources & targets | `-c <config_filename>` |

The option to specify a custom config filename with `-c` overrides the PyExaFMM default, which is to search for a file named `config.json` in your current working directory to parameterize pre-computations.


## References

[1] Ying, L., Biros, G., & Zorin, D. (2004). A kernel-independent adaptive fast multipole algorithm in two and three dimensions. Journal of Computational Physics, 196(2), 591-626.

[2] Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2), 217-288.

[3] Fong, W., & Darve, E. (2009). The black-box fast multipole method. Journal of Computational Physics, 228(23), 8712-8725.