Commit

added technotes and MVM compute note
oguyon committed May 17, 2024
1 parent 2c8aa26 commit 91d6f67
Showing 5 changed files with 99 additions and 0 deletions.
34 changes: 34 additions & 0 deletions _data/sidebars/technotes_sidebar.yml
@@ -0,0 +1,34 @@
entries:
- title: Tech Notes
  product: cacao - Tech Notes
  folders:

  - title: Overview
    output: web, pdf
    folderitems:

    - title: About
      url: /cacao_technotes_about.html
      output: web

  - title: Compute Hardware
    output: web, pdf
    folderitems:

    - title: Requirements
      url: /cacao_comphardw_requirements.html
      output: web

  - title: Real-time OS
    output: web, pdf
    folderitems:

    - title: Linux Kernel
      url: /cacao_RTlinux.html
      output: web

4 changes: 4 additions & 0 deletions _data/topnav.yml
@@ -20,6 +20,10 @@ topnav_dropdowns:
folderitems:
- title: cacao by example
url: /cacao_examples.html
- title: Tech Notes
folderitems:
- title: Tech Notes
url: /cacao_technotes_about.html
- title: Contributing
folderitems:
- title: Writing Documentation
Binary file not shown.
42 changes: 42 additions & 0 deletions pages/cacao/cacao_comphardw_requirements.md
@@ -0,0 +1,42 @@
---
title: Compute Hardware Requirements
keywords:
last_updated: May 17, 2024
tags: [CPU, GPU]
summary: "Hardware Requirements: compute bandwidth needed to close the AO loop."
sidebar: technotes_sidebar
permalink: cacao_comphardw_requirements.html
folder: cacao
---


## 1. Matrix-Vector Multiply (MVM) and GPU/CPU Specs

The most compute-heavy operation in closing the AO loop is often the matrix-vector multiply (MVM) that converts the input WFS pixel values to output wavefront modes. This MVM must complete within a fraction of the AO loop period, typically well under 1 ms.


The MVM is usually memory-bandwidth limited, so when choosing the compute hardware (for example, a GPU), the device's memory bandwidth is the most relevant parameter. This is described in [this MVM technical note](https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html). In that note's notation, N=1 (a matrix-vector multiply is a special case of a matrix-matrix multiply), K is the number of WFS elements, and M is the number of reconstructed modes or, for zonal control, the number of DM actuators.

Taking, for example, a large system with 87k input pixels and 33k output modes:

```
M=33k
K=87k
N=1
Assuming FP16 input, FP32 accumulation
For each MVM:
Compute load : 5.7 GFLOP (2 x M x K, counting multiply and add separately)
memory load : 5.7 GB (M x K matrix elements, 2 bytes each)
Arithmetic intensity ~ 1 (one FLOP per byte)
```

As of 2024, current GPUs have a memory bandwidth of approximately 2 TB/s (terabytes, not terabits) and a compute throughput of about 200 TFLOPS, which is roughly 100 FLOP available per byte of memory traffic. The MVM needs only about 1 FLOP per byte, so it is memory-bandwidth limited, not compute limited.

In this example, reading the 5.7 GB matrix at 2 TB/s takes about 2.9 ms per MVM, which caps the AO frame rate at about 350 Hz.
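
The estimate above can be reproduced for other system sizes with a short script. This is a minimal sketch, not part of cacao: the names `MvmEstimate` and `estimate_mvm` are hypothetical, and it assumes that the matrix is streamed from device memory once per MVM, that traffic from the input and output vectors is negligible, and that each multiply-accumulate counts as 2 FLOP, as in the linked note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MvmEstimate:
    flop: float         # floating-point operations per MVM
    bytes_read: float   # bytes of matrix data read per MVM
    t_memory_s: float   # duration if memory bandwidth is the limit
    t_compute_s: float  # duration if arithmetic throughput is the limit

    @property
    def intensity(self) -> float:
        """Arithmetic intensity in FLOP per byte."""
        return self.flop / self.bytes_read

    @property
    def memory_bound(self) -> bool:
        """True when memory traffic, not arithmetic, sets the duration."""
        return self.t_memory_s >= self.t_compute_s

    @property
    def max_rate_hz(self) -> float:
        """Upper bound on the loop rate set by the slower of the two limits."""
        return 1.0 / max(self.t_memory_s, self.t_compute_s)


def estimate_mvm(m, k, bytes_per_element, mem_bw_bytes_per_s, peak_flop_per_s):
    """Roofline-style estimate for an (m x k) matrix times a k-vector."""
    flop = 2.0 * m * k  # one multiply and one add per matrix element
    bytes_read = float(m) * k * bytes_per_element
    return MvmEstimate(
        flop=flop,
        bytes_read=bytes_read,
        t_memory_s=bytes_read / mem_bw_bytes_per_s,
        t_compute_s=flop / peak_flop_per_s,
    )


# Example system from this note: FP16 matrix (2 bytes per element),
# 2 TB/s memory bandwidth, 200 TFLOPS peak compute.
est = estimate_mvm(m=33_000, k=87_000, bytes_per_element=2,
                   mem_bw_bytes_per_s=2e12, peak_flop_per_s=200e12)
print(f"compute load : {est.flop / 1e9:.2f} GFLOP")
print(f"memory load  : {est.bytes_read / 1e9:.2f} GB")
print(f"intensity    : {est.intensity:.2f} FLOP/byte")
print(f"memory bound : {est.memory_bound}")
print(f"MVM duration : {max(est.t_memory_s, est.t_compute_s) * 1e3:.2f} ms")
print(f"max rate     : {est.max_rate_hz:.0f} Hz")
```

Changing `bytes_per_element` shows how the matrix storage format scales the estimate: the bandwidth-limited duration is proportional to the number of bytes per matrix element.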

{% include links.html %}
19 changes: 19 additions & 0 deletions pages/cacao/cacao_technotes_about.md
@@ -0,0 +1,19 @@
---
title: cacao
keywords:
last_updated: May 17, 2024
tags: [getting_started]
summary: "cacao Tech Notes"
sidebar: technotes_sidebar
permalink: cacao_technotes_about.html
folder: cacao
---


Tech notes relevant to Adaptive Optics and cacao.





{% include links.html %}
