HBP SGA1 KPIs
We propose two KPIs, based on the number of features implemented in Arbor and the quality of their implementation across key HPC architectures.
KPI | M12-M18 | M24 (target) |
---|---|---|
Feature progress | 26 | 41 (40) |
Feature portability | 80% | 86% (80%) |
**Feature progress** (at the end of M18: 26; target for M24: 40)

This value is the sum of the Status values of all features. The KPI increases as new features are (a) added, (b) ported to new systems, or (c) improved/optimized. Over time, the rate of increase can be used to measure progress in the project.
**Feature portability** (at the end of M18: 80%; target for M24: 80%)

This measures the proportion of features that have been implemented across all systems, capturing how "performance portable" the Arbor library is. It is calculated as the percentage of features that have high-quality implementations on all systems.
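As a sketch of the arithmetic behind the two KPIs, the following Python computes both from a list of overall feature Status values. The statuses in the example are illustrative placeholders, not Arbor's actual feature data; the Status scale itself is defined in the tables below.

```python
def feature_progress(statuses):
    # Feature progress KPI: the sum of the overall Status of each feature.
    return sum(statuses)

def feature_portability(statuses):
    # Feature portability KPI: the percentage of features with a
    # high-quality implementation on all systems (Status 3),
    # truncated to a whole percent.
    complete = sum(1 for s in statuses if s == 3)
    return int(100 * complete / len(statuses))

# Five hypothetical features with overall Status values 1-3.
statuses = [3, 3, 1, 3, 2]
print(feature_progress(statuses))     # -> 12
print(feature_portability(statuses))  # -> 60
```

Reading the feature table below with features 14 and 15 counted at Status 3 (the 6 KPI points attributed to them in the discussion), these formulas reproduce the reported M24 values of 41 and 86%.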
Each feature has a status that indicates its readiness on each target system:
Value | Meaning |
---|---|
0 | Not implemented. |
1 | Functional implementation. |
2 | Optimized implementation. |
- | Not applicable. |
A feature is marked not applicable on a system if it does not require a hardware-specific implementation there; the online documentation, for example, has no hardware-specific component on any system.
We assign an overall "Status" to each feature, describing the quality of its implementation, as described in the following table. Features with no hardware-specific component are assigned a status of 1 (functional, but requires performance optimization) or 3 (functional and optimized).
Status | Meaning |
---|---|
1 | Functional implementation on at least one platform. |
2 | Functional implementation on all platforms. |
3 | Optimized (if relevant) implementation on all platforms. |
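A literal reading of these rules can be sketched as follows; `overall_status` is a hypothetical helper for illustration, not part of Arbor. Features with no hardware-specific component (all platforms marked "-") are assessed directly as 1 or 3, as noted above, so the helper declines to score them.

```python
def overall_status(per_platform):
    """Derive a feature's overall Status from its per-platform readiness
    values (0, 1, 2, or "-" for not applicable), following a literal
    reading of the two tables above. Illustrative sketch only.
    """
    vals = [v for v in per_platform if v != "-"]
    if not vals:
        # No hardware-specific component: assessed directly as 1 or 3.
        raise ValueError("assess features with no hardware-specific "
                         "component directly")
    if all(v == 2 for v in vals):
        return 3  # optimized (where relevant) on all platforms
    if all(v >= 1 for v in vals):
        return 2  # functional on all platforms
    if any(v >= 1 for v in vals):
        return 1  # functional on at least one platform
    return 0      # not implemented anywhere

print(overall_status([2, 2, 2]))  # -> 3
print(overall_status([2, 0, 0]))  # -> 1
```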
The KPIs are presented for the following platforms, which are all based on the Cray XC40 architecture and installed at CSCS:
Platform | System and architecture |
---|---|
mc | Daint-mc: Cray XC40 with dual socket 18-core Intel Broadwell CPUs per node. |
gpu | Daint-gpu: Cray XC40 with a 12-core Intel Haswell CPU and 1 NVIDIA P100 GPU per node. |
knl | Tave: Cray XC40 with a 64-core KNL socket per node. |
Feature | mc | gpu | knl | Status | Description | PR | Date |
---|---|---|---|---|---|---|---|
1. Energy metering | 2 | 2 | 2 | 3 | Record and report energy consumption of simulations. | #222 | M13 |
2. Memory metering | 2 | 2 | 2 | 3 | Record and report memory consumption of simulations. Records both CPU and GPU memory consumption separately. | #220 | M13 |
3. Generic cell groups | 2 | 2 | 2 | 3 | Use type erasure for interface between a model and its cell groups, so that users and developers can easily extend Arbor with new cell types. | #259 | M14 |
4. Event delivery | 2 | 2 | 2 | 3 | Batch event delivery processing and modify integration step sizes per-cell to accommodate discontinuous state changes. | #261 #297 | M15 |
5. Online documentation | - | - | - | 3 | Online ReadTheDocs documentation generated automatically on updates to the repository. | #328 | M16 |
6. Target-specific kernels | 2 | 2 | 2 | 3 | Generate kernels/functions for user-defined mechanisms that are specifically optimized for the target system. This means CUDA kernels on GPU, and AVX2/AVX512 intrinsics on daint-mc and tave respectively. | #282 | M16 |
7. Load balancer | 2 | 2 | 2 | 3 | Provides a simple interface for implementing a load balancer, which produces a domain decomposition. Also extends cell groups to encapsulate an arbitrary set of cells, as described by a domain decomposition. | #244 #334 | M17 |
8. Separable compilation | 1 | 2 | 1 | 1 | Build back end optimized kernels with a different compiler from the one used for the rest of the library/front end. This reduces the number of compiler bugs and restrictions in the front end code. Currently only for CUDA on GPU. | #356 | M18 |
9. Continuous integration | 2 | 0 | 0 | 1 | Automated compilation and testing of pull requests to check for bugs/problems. Currently on Travis CI, which does not support CUDA or KNL. | #340 | M18 |
10. Sampling | 2 | 2 | 2 | 3 | Flexible method for sampling values (e.g. voltage at a specific location) at a user-defined set of time points. | #353 | M18 |
11. Optimized event delivery | 2 | 2 | 2 | 3 | Parallel method for delivering postsynaptic spike events to cell targets. | #369 | M20 |
12. Parameter setting | 2 | 2 | 2 | 3 | Specification of mechanism and model parameters for cells in model descriptions. | #377 | M20 |
13. Event Generators | 2 | 2 | 2 | 3 | Generic event generators that can be attached directly to cells. | #414 | M21 |
14. Generic Hardware Backends | 2 | 2 | 2 | 3 | Plugin interface for adding new hardware backends non-intrusively (significant refactoring). | #??? | M24 |
15. Runtime mechanism interface | 2 | 2 | 2 | 3 | Mechanisms can be added dynamically at runtime, instead of compile time (significant refactoring). | #??? | M24 |
This provides a snapshot of the status of the project at the end of M24 (i.e. the end of SGA1), based on features that were added or extended in M12-M24. This method and page will be used to monitor progress month by month throughout SGA2.
We estimated 15 progress-KPI points for this six-month period, fewer than the 26 achieved in the previous six months, because much of the period would be spent on refactoring work that would not add many new features. In hindsight this was quite a good estimate, with 16 KPI feature points added over the six months.
Much of the development effort was focused on refactoring and improving existing features. For example, the SIMD code generation, which is part of feature 6, was completely rewritten and improved. Such changes often have no effect on the KPIs as defined, despite requiring significant resources and improving Arbor, because they touch features that have already been implemented. Features 14 and 15 were a major undertaking, requiring four months of full-time work and significant refactoring, and the 6 KPI points for these two features do not fully reflect the value of the new interfaces that they implement. Two months of development time were also invested in the Python interface, which was at the prototype stage at the end of SGA1 but not yet ready for inclusion in the Arbor mainline, so it could not be counted towards our progress.
Support for feature 8 (separable compilation) was moved from 0 to 1 for mc and knl, because the new SIMD back end removes the need for target-specific compilers by making vectorized versions of all math kernels available to gcc and clang. This is not separable compilation in the strict sense, but it serves the same end: removing the need for the Intel compiler to fully optimize kernels.
All of these features were implemented on the host side, i.e. they did not require writing any GPU code directly. However, all of them are usable and optimized with the GPU back end, so we were able to improve our performance portability KPI to 86%.