HBP SGA1 KPIs
We propose two KPIs, based on the number of features implemented in Arbor and the quality of their implementation across key HPC architectures.
KPI | M12-M18 | M24 (target) |
---|---|---|
Feature progress | 26 | 41 (40) |
Feature portability | 80% | 86% (80%) |
**Feature progress** (at the end of M18: 26; target for M24: 40)

This value is the sum of the Status values of all features. The KPI increases as new features are (a) added, (b) ported to new systems, or (c) improved/optimized. Over time, the rate of increase can be used to measure progress in the project.
**Feature portability** (at the end of M18: 80%; target for M24: 80%)

This measures the proportion of features that have been implemented across all systems, capturing how "performance portable" the Arbor library is. It is calculated as the percentage of features that have high-quality implementations on all systems.
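As a sketch of the arithmetic behind the two KPIs, the following Python computes both from a list of overall feature Status values. The statuses in the example are illustrative placeholders, not Arbor's actual feature data; the Status scale itself is defined in the tables below.

```python
def feature_progress(statuses):
    # Feature progress KPI: the sum of the overall Status of each feature.
    return sum(statuses)

def feature_portability(statuses):
    # Feature portability KPI: the percentage of features with a
    # high-quality implementation on all systems (Status 3),
    # truncated to a whole percent.
    complete = sum(1 for s in statuses if s == 3)
    return int(100 * complete / len(statuses))

# Five hypothetical features with overall Status values 1-3.
statuses = [3, 3, 1, 3, 2]
print(feature_progress(statuses))     # -> 12
print(feature_portability(statuses))  # -> 60
```

Reading the feature table below with features 14 and 15 counted at Status 3 (the 6 KPI points attributed to them in the discussion), these formulas reproduce the reported M24 values of 41 and 86%.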
Each feature has a status that indicates its readiness on each target system:
Value | Meaning |
---|---|
0 | Not implemented. |
1 | Functional implementation. |
2 | Optimized implementation. |
- | Not applicable. |
A feature is marked not applicable on a system if it does not require a hardware-specific implementation there; the online documentation, for example, has no hardware-specific component on any system.
We assign an overall "Status" to each feature, describing the quality of its implementation, as described in the following table. Features with no hardware-specific component are assigned a status of 1 (functional, but requires performance optimization) or 3 (functional and optimized).
Status | Meaning |
---|---|
1 | Functional implementation on at least one platform. |
2 | Functional implementation on all platforms. |
3 | Optimized (if relevant) implementation on all platforms. |
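A literal reading of these rules can be sketched as follows; `overall_status` is a hypothetical helper for illustration, not part of Arbor. Features with no hardware-specific component (all platforms marked "-") are assessed directly as 1 or 3, as noted above, so the helper declines to score them.

```python
def overall_status(per_platform):
    """Derive a feature's overall Status from its per-platform readiness
    values (0, 1, 2, or "-" for not applicable), following a literal
    reading of the two tables above. Illustrative sketch only.
    """
    vals = [v for v in per_platform if v != "-"]
    if not vals:
        # No hardware-specific component: assessed directly as 1 or 3.
        raise ValueError("assess features with no hardware-specific "
                         "component directly")
    if all(v == 2 for v in vals):
        return 3  # optimized (where relevant) on all platforms
    if all(v >= 1 for v in vals):
        return 2  # functional on all platforms
    if any(v >= 1 for v in vals):
        return 1  # functional on at least one platform
    return 0      # not implemented anywhere

print(overall_status([2, 2, 2]))  # -> 3
print(overall_status([2, 0, 0]))  # -> 1
```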
The KPIs are presented for the following platforms, which are all based on the Cray XC40 architecture and installed at CSCS:
Platform | System and architecture |
---|---|
mc | Daint-mc: Cray XC40 with dual socket 18-core Intel Broadwell CPUs per node. |
gpu | Daint-gpu: Cray XC40 with a 12-core Intel Haswell CPU and 1 NVIDIA P100 GPU per node. |
knl | Tave: Cray XC40 with a 64-core KNL socket per node. |
Feature | mc | gpu | knl | Status | Description | PR | Date |
---|---|---|---|---|---|---|---|
1. Energy metering | 2 | 2 | 2 | 3 | Record and report energy consumption of simulations. | #222 | M13 |
2. Memory metering | 2 | 2 | 2 | 3 | Record and report memory consumption of simulations. Records both CPU and GPU memory consumption separately. | #220 | M13 |
3. Generic cell groups | 2 | 2 | 2 | 3 | Use type erasure for interface between a model and its cell groups, so that users and developers can easily extend Arbor with new cell types. | #259 | M14 |
4. Event delivery | 2 | 2 | 2 | 3 | Batch event delivery processing and modify integration step sizes per-cell to accommodate discontinuous state changes. | #261 #297 | M15 |
5. Online documentation | - | - | - | 3 | Online ReadTheDocs documentation generated automatically on updates to the repository. | #328 | M16 |
6. Target-specific kernels | 2 | 2 | 2 | 3 | Generate kernels/functions for user-defined mechanisms that are specifically optimized for the target system. This means CUDA kernels on GPU, and AVX2/AVX512 intrinsics on daint-mc and tave respectively. | #282 | M16 |
7. Load balancer | 2 | 2 | 2 | 3 | Provides a simple interface for implementing a load balancer, which produces a domain decomposition. Also extends cell groups to encapsulate an arbitrary set of cells, as described by a domain decomposition. | #244 #334 | M17 |
8. Separable compilation | 1 | 2 | 1 | 1 | Build back end optimized kernels with a different compiler from the one used for the rest of the library/front end. This reduces the number of compiler bugs and restrictions in the front end code. Currently only for CUDA on GPU. | #356 | M18 |
9. Continuous integration | 2 | 0 | 0 | 1 | Automated compilation and testing of pull requests to check for bugs/problems. Currently on Travis CI, which does not support CUDA or KNL. | #340 | M18 |
10. Sampling | 2 | 2 | 2 | 3 | Flexible method for sampling values (e.g. voltage at a specific location) at a user-defined set of time points. | #353 | M18 |
11. Optimized event delivery | 2 | 2 | 2 | 3 | Parallel method for delivering postsynaptic spike events to cell targets. | #369 | M20 |
12. Parameter setting | 2 | 2 | 2 | 3 | Specification of mechanism and model parameters for cells in model descriptions. | #377 | M20 |
13. Event Generators | 2 | 2 | 2 | 3 | Generic event generators that can be attached directly to cells. | #414 | M21 |
14. Generic Hardware Backends | 2 | 2 | 2 | 3 | Plugin interface for adding new hardware backends non-intrusively (significant refactoring). | #??? | M24 |
15. Runtime mechanism interface | 2 | 2 | 2 | 3 | Mechanisms can be added dynamically at runtime, instead of compile time (significant refactoring). | #??? | M24 |
This provides a snapshot of the status of the project at the end of M24 (i.e. the end of SGA1), based on features that were added or extended in M12-M24. This method and page will be used to monitor progress month by month throughout SGA2.
We estimated 15 progress-KPI points for this six-month period, fewer than the 26 achieved in the previous six months, because much of the period would be spent on refactoring work that would not add many new features. In hindsight this was quite a good estimate, with 16 KPI feature points added over the six months.
Much of the development effort was focused on refactoring and improving existing features. For example, the SIMD code generation, which is part of feature 6, was completely rewritten and improved. Such changes often have no effect on the KPIs as defined, despite requiring significant resources and improving Arbor, because they touch features that have already been implemented. Features 14 and 15 were a major undertaking, requiring four months of full-time work and significant refactoring, and the 6 KPI points for these two features do not fully reflect the value of the new interfaces that they implement. Two months of development time were also invested in the Python interface, which was at the prototype stage at the end of SGA1 but not yet ready for inclusion in the Arbor mainline, so it could not be counted towards our progress.
Support for feature 8 (separable compilation) was moved from 0 to 1 for mc and knl, because the new SIMD back end removes the need for target-specific compilers by making vectorized versions of all math kernels available to gcc and clang. This is not separable compilation in the strict sense, but it serves the same end: removing the need for the Intel compiler to fully optimize kernels.
All of these features were implemented on the host side, i.e. they did not require writing any GPU code directly. However, all of them are usable and optimized with the GPU back end, so we were able to improve our performance portability KPI to 86%.