ENH: Configurable Precision (#103)
* User specified precision parameter

* Add env setup

* Update readme

* More robust linalg tests

* Add tests for surface

* Make scripts precision agnostic

* Add more rigor to all tests
skailasa authored Jul 27, 2021
1 parent b8051e2 commit 4d20e9c
Showing 17 changed files with 1,077 additions and 684 deletions.
22 changes: 22 additions & 0 deletions .env
#!/bin/bash

# Numba Threading Layer Configuration

## (Intel CPUs Only) Sets lifetime of OpenMP threads to 0ms
## As the computation contains large serial (Python) components
export KMP_BLOCKTIME=0

## Limit number of threads created by BLAS/LAPACK functions
## called by NumPy
export OMP_NUM_THREADS=1

## Define 'places' at which threads are assigned
export OMP_PLACES=cores

## Makes thread assignment go successively through available
## places. In our case, through each core.
export OMP_PROC_BIND=close

## Select OpenMP as the threading layer for Numba; the uniformity
## of FMM operators makes it preferable to TBB
export NUMBA_THREADING_LAYER='omp'
117 changes: 73 additions & 44 deletions README.md
PyExaFMM

[![Anaconda-Server Badge](https://img.shields.io/conda/v/skailasa/pyexafmm.svg)](https://anaconda.org/skailasa/pyexafmm) [![Anaconda-Server Badge](https://anaconda.org/skailasa/pyexafmm/badges/latest_release_date.svg)](https://anaconda.org/skailasa/pyexafmm) [![Anaconda-Server Badge](https://anaconda.org/skailasa/pyexafmm/badges/platforms.svg)](https://anaconda.org/skailasa/pyexafmm)

PyExaFMM is an adaptive particle kernel-independent FMM based on [1], written in pure Python with some extensions, representing a compromise between portability, maintainability, and performance.

## Install

```bash
conda install --use-local pyexafmm
python setup.py develop
```

## Configuration

### Experimental Data

After installation, you must pre-compute and cache the FMM operators and octree for your dataset. Most of these are calculated using the techniques in [1]; notably, M2M and L2L matrices can be computed for a single parent node and its children, and scaled in a simple fashion for the kernels implemented by PyExaFMM. We sparsify the M2L operators using a technique based on randomized SVD compression [2] and transfer vectors [3].
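
To give a flavour of the compression step, below is a minimal sketch of the randomized SVD of [2] in NumPy; the function is illustrative, and not PyExaFMM's internal implementation:

```python
import numpy as np

def rsvd(matrix, target_rank, n_oversamples=10):
    """Rank-k approximation via randomized SVD (Halko et al. [2])."""
    # Sample the column space of `matrix` with a Gaussian test matrix
    omega = np.random.randn(matrix.shape[1], target_rank + n_oversamples)
    sample = matrix @ omega
    # Orthonormal basis for the sampled range
    q, _ = np.linalg.qr(sample)
    # Project into the subspace, where an exact SVD is cheap
    b = q.T @ matrix
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    # Lift back to the original space and truncate to the target rank
    u = q @ u_small
    return u[:, :target_rank], s[:target_rank], vt[:target_rank, :]
```

Storing only these truncated factors, rather than each dense M2L matrix, is what avoids holding and applying potentially very large dense matrices.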

This is done via a `config.json` file, which PyExaFMM looks for in your **current working directory**. It allows you to configure experimental parameters, as well as choose a kernel and a computational backend that optimize the operator methods of the FMM using different approaches.

```json
{
    "experiment": "test",
    "npoints": 1000000,
    "data_type": "random",
    "order_equivalent": 5,
    "order_check": 5,
    ...
    "alpha_outer": 2.9,
    "max_level": 10,
    "max_points": 150,
    "target_rank": 20,
    "precision": "single"
}
```

|Parameter | Description |
|-------------- |----------------------------------------------- |
| `experiment` | Experiment name, used to label HDF5 database. |
| `npoints` | Number of points to generate in test data using demo functions. |
| `data_type` | Type of test data to generate. |
| `order_equivalent` | Order of multipole expansions, same as discretization of equivalent surface. |
| `order_check` | Order of local expansions, same as discretization of check surface. |
| `kernel` | Kernel function to use, currently only supports laplace. |
| `backend` | Compute backend to use, currently only supports Numba. |
| `alpha_inner` | Relative size of inner surface's radius. |
| `alpha_outer` | Relative size of outer surface's radius. |
| `max_level` | Depth of octree to use in simulations. |
| `max_points` | Maximum number of points per leaf node. |
| `target_rank` | Target rank in low-rank compression of M2L matrix. |
| `precision` | Experimental precision, 'single' or 'double'. |
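
The `precision` parameter introduced by this commit selects the floating-point type used throughout the computation. A rough sketch of what this implies, assuming a NumPy backend (the helper below is hypothetical, not part of PyExaFMM's API):

```python
import numpy as np

# Hypothetical mapping from the config value to NumPy scalar types
PRECISION = {
    "single": np.float32,
    "double": np.float64,
}

def dtype_from_config(config):
    """Return the float type implied by config['precision']."""
    return PRECISION[config["precision"]]
```

Single precision halves the memory footprint of the cached operators and is typically faster, at the cost of accuracy in the computed potentials.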

PyExaFMM provides some simple test-data generation functions, which can be configured via the `data_type` and `npoints` parameters above. However, to use your own data, simply create an HDF5 file with the same name as `experiment` in your configuration file, with the following group hierarchy,

Expand All @@ -93,40 +87,34 @@ particle_data/
|_ targets/
```

where `sources` and `targets` are the coordinates of your source and target particles respectively, of shape `(nsources, 3)` and `(ntargets, 3)`, and `source_densities` is of shape `(nsources, 1)`.
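
For example, a minimal sketch of preparing such a file with `h5py`, using placeholder arrays (the file name `test.hdf5` assumes `"experiment": "test"` as in the configuration above):

```python
import h5py
import numpy as np

nsources, ntargets = 1000, 1000

# Placeholder particles in the unit cube, with unit densities
sources = np.random.rand(nsources, 3)
targets = np.random.rand(ntargets, 3)
source_densities = np.ones((nsources, 1))

with h5py.File("test.hdf5", "w") as f:
    group = f.create_group("particle_data")
    group.create_dataset("sources", data=sources)
    group.create_dataset("targets", data=targets)
    group.create_dataset("source_densities", data=source_densities)
```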

The CLI workflow is as follows,

```bash

# Generate test data (optional)
fmm generate-test-data -c config.json

# Run operator and tree pre-computations
fmm compute-operators -c config.json
```

Once this is done, you'll be left with a `.hdf5` database of precomputed parametrizations, with the same name as your specified `experiment` parameter from your `config.json`. If you've used your own data, the operator pre-computations will be written into the same HDF5 file.
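
You can inspect what has been cached with a few lines of `h5py`; the exact group layout depends on your configuration:

```python
import h5py

# Print the path of every group and dataset in the experiment database
with h5py.File("test.hdf5", "r") as f:
    f.visit(print)
```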

### Configure Threading Layer

We optimize PyExaFMM for an OpenMP backend as the Numba threading layer. To avoid oversubscription due to nested parallelism, created when calling NumPy/SciPy functions from within OpenMP threads, we restrict internal NumPy thread pools to contain at most a single thread. See [this discussion](https://github.com/numba/numba/issues/6637#issuecomment-760460620) for more information.

Optimum parameters (for Intel CPUs) are provided in the `.env` file, and set with,

```bash
# Set threading environment variables
source .env
```
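
To confirm that Numba picked up the OpenMP layer, run any parallel-compiled function and query the threading layer afterwards. A minimal check along these lines (the kernel below is just a stand-in):

```python
import numba
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def axpy(a, x, y):
    # Trivial parallel kernel, only here to force a threaded compilation
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        out[i] = a * x[i] + y[i]
    return out

axpy(2.0, np.ones(100), np.ones(100))
print(numba.threading_layer())  # expect 'omp' once .env has been sourced
```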

Now you are ready to start programming with PyExaFMM.


## Usage

The `Fmm` class acts as the API for PyExaFMM. Example usage patterns are provided below:

```python
from fmm import Fmm

e = Fmm()
# Run FMM algorithm
e.run()

# Access targets, sources, and computed potentials/potential gradients at
# a given leaf
leaf = e.leaves[42]

## Dictionary mapping leaf key to a pointer defining alignment
leaf_idx = e.key_to_leaf_index[leaf]

## Targets at leaf
leaf_targets = e.targets[
e.target_index_pointer[leaf_idx]:e.target_index_pointer[leaf_idx+1]
]

## Sources at leaf
leaf_sources = e.sources[
e.source_index_pointer[leaf_idx]:e.source_index_pointer[leaf_idx+1]
]

## Source densities at leaf
leaf_source_densities = e.source_densities[
e.source_index_pointer[leaf_idx]:e.source_index_pointer[leaf_idx+1]
]

## 4-vectors of target potentials/potential gradients aligned with 'leaf_targets'
leaf_potentials = e.target_potentials[
e.target_index_pointer[leaf_idx]:e.target_index_pointer[leaf_idx+1]
]

# Access multipole/local expansions at a given tree node
key = e.complete[42]
## Dictionary mapping a given key to a pointer defining alignment
key_idx = e.key_to_index[key]

## Multipole expansions defined by equivalent surface
multipole_index = key_idx*e.nequivalent_points
multipole_expansion = e.multipole_expansions[
multipole_index:multipole_index+e.nequivalent_points
]

## Local expansions defined by check surface
local_index = key_idx*e.ncheck_points
local_expansion = e.local_expansions[
local_index:local_index+e.ncheck_points
]

# Clear potentials, gradients, and expansions to re-run experiment
e.clear()
```
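
As a sanity check after `e.run()`, a handful of FMM potentials can be compared against direct summation. A sketch, assuming the Laplace single-layer kernel with the 1/(4πr) convention and that the first component of each 4-vector holds the potential (verify both conventions against your build):

```python
import numpy as np

def direct_laplace(targets, sources, densities):
    """O(N^2) reference evaluation of the Laplace potential."""
    diff = targets[:, None, :] - sources[None, :, :]
    r = np.sqrt(np.sum(diff * diff, axis=-1))
    # Guard against coincident source/target points
    inv_r = np.divide(1.0, r, out=np.zeros_like(r), where=r > 0)
    return (inv_r @ densities.ravel()) / (4.0 * np.pi)

reference = direct_laplace(e.targets[:10], e.sources, e.source_densities)
print(np.allclose(reference, e.target_potentials[:10, 0], rtol=1e-3))
```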


Expand All @@ -162,11 +189,13 @@ fmm [OPTIONS] COMMAND [ARGS]
| `compute-operators` | Run operator pre-computations | `-c <config_filename>` |
| `generate-test-data` | Generate `npoints` sources & targets | `-c <config_filename>` |

The option to specify a custom config filename with `-c` overrides the PyExaFMM default, which is to search for a file named `config.json` in your current working directory to parameterize pre-computations.


## References

[1] Ying, L., Biros, G., & Zorin, D. (2004). A kernel-independent adaptive fast multipole algorithm in two and three dimensions. Journal of Computational Physics, 196(2), 591-626.

[2] Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2), 217-288.

[3] Fong, W., & Darve, E. (2009). The black-box fast multipole method. Journal of Computational Physics, 228(23), 8712-8725.