Fixing up naming and versions for a few more examples
hunhoffe committed Dec 10, 2024
1 parent bbcbbbd commit cb1d1fc
Showing 39 changed files with 111 additions and 650 deletions.
8 changes: 4 additions & 4 deletions programming_examples/basic/dma_transpose/README.md
@@ -33,24 +33,24 @@ The implicit copy is performed using the `ObjectFifo.forward()` function that sp
The `object_fifo_link` operation used explicitly by `dma_transpose.py` and `dma_transpose_alt.py` is described in more depth in [Section-2b](../../../programming_guide/section-2/section-2b/README.md/#object-fifo-link) of the programming guide.
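The effect of such a stride-based DMA transpose can be illustrated in plain Python (a sketch of the access pattern only, not the IRON API; `transpose_access` is a hypothetical helper name):

```python
# Sketch: reading a row-major M x N matrix with a transposed access pattern,
# as a shim DMA does when it is given column-major sizes/strides.
def transpose_access(flat, M, N):
    # flat holds M*N elements in row-major order; emit them column by
    # column, which is what a stride-based DMA transpose produces.
    return [flat[r * N + c] for c in range(N) for r in range(M)]

data = list(range(6))                # 2x3 row-major: [[0,1,2],[3,4,5]]
print(transpose_access(data, 2, 3))  # -> [0, 3, 1, 4, 2, 5]
```

The generated access-map visualization (`make generate_access_map`) plots exactly this reordering over the input buffer.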

To compile and run the design `dma_transpose_iron.py` for NPU:
```bash
```shell
make env use_iron=1
make run
```

To compile and run the design `dma_transpose.py` for NPU:
```bash
```shell
make
make run
```

To compile and run the design `dma_transpose_alt.py` for NPU:
```bash
```shell
make env use_alt=1
make run
```

To generate a data visualization of the transpose (like that above), run:
```bash
```shell
make generate_access_map
```
@@ -20,5 +20,5 @@ The current design only works for scalar `int16`.

The performance sweep results against `whole_array` can be found [here](https://gist.github.com/Yu-Zhewen/da3fed9feb278b973f35fb78c2d3a484); no gain was observed.

The original implementation of the design is found at [matmul.py](./matmul.py). An alternative version of the design, featuring different runtime operations,
is found at [matmul_alt.py](./matmul_alt.py).
The original implementation of the design is found at [cascade.py](./cascade.py). An alternative version of the design, featuring different runtime operations,
is found at [cascade_alt.py](./cascade_alt.py).
@@ -12,7 +12,7 @@ subdir=matrix_vector
targetname=matrix_vector

# Currently does not accept reconfiguring size via these variables; must change
# in source at matmul.py as well as here
# in source at <targetname>.py as well as here
M=288
K=288
N=1
@@ -14,8 +14,8 @@ In this design, one or multiple AI Engine compute cores (spread across hardware

> This design relies on the same basic concepts as the [whole-array matrix-matrix multiplication design](../whole_array/README.md), and it is structured very similarly to that design. Please refer to the in-depth explanation of that design along with the below outlined differences for a better understanding of this design.
The original implementation of the design is found at [matmul.py](./matmul.py). An alternative version of the design, featuring different runtime operations,
is found at [matmul_alt.py](./matmul_alt.py). A version written in a higher-level form of IRON is found at [matmul_iron.py](./matmul_iron.py).
The original implementation of the design is found at [matrix_vector.py](./matrix_vector.py). An alternative version of the design, featuring different runtime operations,
is found at [matrix_vector_alt.py](./matrix_vector_alt.py). A version written in a higher-level form of IRON is found at [matrix_vector_iron.py](./matrix_vector_iron.py).
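For reference, the computation itself can be sketched as a scalar matrix-vector product in plain Python (illustration only; the actual kernel is vectorized and the work is spread across cores):

```python
def matvec(A, x, M, K):
    # A is a flat row-major M x K matrix, x a length-K vector;
    # each output element is the dot product of one row of A with x.
    return [sum(A[m * K + k] * x[k] for k in range(K)) for m in range(M)]

A = [1, 2,
     3, 4]                 # 2x2 row-major
x = [5, 6]
print(matvec(A, x, 2, 2))  # -> [17, 39]
```

With N=1 in the Makefile, the design is exactly this: an M x K matrix against a length-K vector.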

## Differences from the [Whole-Array Matrix-Matrix Multiplication Design](../whole_array/README.md)

@@ -28,22 +28,22 @@ is found at [matmul_alt.py](./matmul_alt.py). A version written in a higher-leve
You need C++23 for `bfloat16_t` support. It can be found in g++-13: https://lindevs.com/install-g-on-ubuntu

To compile and run the original design:
```shell
make
make matrix_vector.exe
make run
```

To compile and run the alternative design:
```
```shell
make env use_alt=1
make env use_alt=1 matrixVectorMultiplication.exe
make env use_alt=1 matrix_vector.exe
make env use_alt=1 run
```

To compile and run the higher-level IRON design:
```
```shell
make env use_iron=1
make env use_iron=1 matrixVectorMultiplication.exe
make env use_iron=1 matrix_vector.exe
make env use_iron=1 run
```
@@ -26,7 +26,7 @@ target_suffix=${M}x${K}x${N}_${m}x${k}x${n}
use_alt?=0

ifeq (${use_alt}, 1)
aie_py_src=matmul_alt.py
aie_py_src=${targetname}_alt.py
endif

include ${srcdir}/../makefile-common
@@ -19,35 +19,35 @@ In this design, a single AI Engine compute core performs a matrix-matrix-multipl
* This design supports tracing; See [below](#tracing).
* Only a single core performs computations. As such, we only need a single ObjectFIFO for each of the transfers between the levels (shim &rightarrow; memory, memory &rightarrow; compute, and back). These ObjectFIFOs are named `inA`, `inB`, `outC` and `memA`, `memB` and `memC`, respectively.
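Conceptually, the core consumes an `m`x`k` tile of A and a `k`x`n` tile of B at a time and accumulates partial products into C, which can be sketched in plain Python (a scalar reference, not the AIE kernel; assumes M, K, N are divisible by m, k, n):

```python
def matmul_tiled(A, B, M, K, N, m, k, n):
    # A: flat row-major M x K, B: flat row-major K x N. Partial results
    # for each m x n output tile are accumulated across the K dimension,
    # mirroring how sub-tiles stream through the memA/memB/memC ObjectFIFOs.
    C = [0] * (M * N)
    for i0 in range(0, M, m):
        for j0 in range(0, N, n):
            for p0 in range(0, K, k):          # one k-slice per FIFO acquire
                for i in range(i0, i0 + m):
                    for j in range(j0, j0 + n):
                        for p in range(p0, p0 + k):
                            C[i * N + j] += A[i * K + p] * B[p * N + j]
    return C

print(matmul_tiled([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2, 1, 1, 1))  # -> [19, 22, 43, 50]
```

The loop nest over `i0`, `j0`, `p0` corresponds to the tile schedule; only the innermost three loops run on the compute core.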

## Notes on the `matmul_alt.py` Implementation
## Notes on the `single_core_alt.py` Implementation

As in the whole-array design, the [`matmul.py`](./matmul.py) file describes the data movement of the design. This single core example also comes with an alternative implementation, which can be found in [`matmul_alt.py`](./matmul_alt.py). If you specify `use_alt=1` as an environment variable at compile time, this alternative implementation will be used in place of `matmul.py`.
As in the whole-array design, the [`single_core.py`](./single_core.py) file describes the data movement of the design. This single core example also comes with an alternative implementation, which can be found in [`single_core_alt.py`](./single_core_alt.py). If you specify `use_alt=1` as an environment variable at compile time, this alternative implementation will be used in place of `single_core.py`.

Functionally, `matmul.py` and `matmul_alt.py` are intended to be identical. However, `matmul_alt.py` is implemented using a new syntax for runtime buffer descriptor configuration on the shim. Specifically, `matmul_alt.py` uses the `aiex.dma_configure_task_for`, `aiex.dma_start_task` and `aiex.dma_await_task` operations instead of `aiex.dma_memcpy_nd`.
Functionally, `single_core.py` and `single_core_alt.py` are intended to be identical. However, `single_core_alt.py` is implemented using a new syntax for runtime buffer descriptor configuration on the shim. Specifically, `single_core_alt.py` uses the `aiex.dma_configure_task_for`, `aiex.dma_start_task` and `aiex.dma_await_task` operations instead of `aiex.dma_memcpy_nd`.

## Notes on the `matmul_iron.py` Implementation
## Notes on the `single_core_iron.py` Implementation

There is an implementation of this design found in [`matmul_iron.py`](./matmul_iron.py) using a higher-level version of IRON. If you specify `use_iron=1` as an environment variable at compile time, this alternative implementation will be used in place of `matmul.py`.
There is an implementation of this design found in [`single_core_iron.py`](./single_core_iron.py) using a higher-level version of IRON. If you specify `use_iron=1` as an environment variable at compile time, this alternative implementation will be used in place of `single_core.py`.

Functionally, this design is intended to be identical to the other two. However, `matmul_iron.py` currently does not support tracing.
Functionally, this design is intended to be identical to the other two. However, `single_core_iron.py` currently does not support tracing.

## Building and Running the Design

You need C++23 for `bfloat16_t` support. It can be found in g++-13: https://lindevs.com/install-g-on-ubuntu

To compile the design:
```
```shell
make
make matrixMultiplication.exe
make single_core.exe
```

To run the design:
```
```shell
make run
```

## Tracing

To get tracing output, set `enable_tracing=True` in `matmul.py` and `ENABLE_TRACING=true` in `test.cpp`.
To get tracing output, set `enable_tracing=True` in `single_core.py` and `ENABLE_TRACING=true` in `test.cpp`. Tracing is also supported in `single_core_alt.py`.

By default, traces will be written out to `trace.txt`; another output file can be specified using the `--trace` (or `-t`) flag to the host code.
14 changes: 3 additions & 11 deletions programming_examples/basic/passthrough_kernel/Makefile
@@ -13,25 +13,17 @@ srcdir := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))
include ${srcdir}/../../makefile-common

device = npu
targetname = passThroughKernel
targetname = passthrough_kernel
VPATH := ${srcdir}/../../../aie_kernels/generic
data_size = 4096
trace_size = 8192
PASSTHROUGH_SIZE = ${data_size}

aie_py_src=aie2.py
aie_py_src=${targetname}.py
use_alt?=0
use_iron?=0

ifeq (${use_alt}, 1)
aie_py_src=aie2_alt.py
ifeq (${use_iron}, 1)
$(error Cannot specify both alternative design and IRON)
endif
endif

ifeq (${use_iron}, 1)
aie_py_src=aie2_iron.py
aie_py_src=${targetname}_alt.py
endif

.PHONY: all template clean
33 changes: 18 additions & 15 deletions programming_examples/basic/passthrough_kernel/README.md
@@ -14,7 +14,9 @@ This IRON design flow example, called "Passthrough Kernel", demonstrates a simpl

## Source Files Overview

1. `aie2.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).
1. `passthrough_kernel.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).

1. `passthrough_kernel_alt.py`: A Python script that defines the AIE array structural design using an alternative IRON syntax that yields MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).

1. `passThrough.cc`: A C++ implementation of vectorized memcpy operations for AIE cores. Found [here](../../../aie_kernels/generic/passThrough.cc).

@@ -28,15 +30,15 @@ This IRON design flow example, called "Passthrough Kernel", demonstrates a simpl

This simple example effectively passes data through a single compute tile in the NPU's AIE array. The design is described as shown in the figure to the right. The overall design flow is as follows:
1. An object FIFO called "of_in" connects a Shim Tile to a Compute Tile, and another called "of_out" connects the Compute Tile back to the Shim Tile.
1. The runtime data movement is expressed to read `4096` uint8_t data from host memory to the compute tile and write the `4096` data back to host memory.
1. The runtime data movement is expressed to read `4096` `uint8_t` data from host memory to the compute tile and write the `4096` data back to host memory.
1. The compute tile acquires this input data in "object" sized (`1024`) blocks from "of_in" and copies them to another output "object" it has acquired from "of_out". Note that a vectorized kernel running on the Compute Tile's AIE core copies the data from the input "object" to the output "object".
1. After the vectorized copy is performed, the Compute Tile releases the "objects", allowing the DMAs (abstracted by the object FIFO) to transfer the data back to host memory and copy additional blocks into the Compute Tile, "of_out" and "of_in" respectively.

It is important to note that the Shim Tile and Compute Tile DMAs move data concurrently, and the Compute Tile's AIE Core also processes data concurrently with the data movement. This is made possible by expressing depth `2` in declaring, for example, `object_fifo("in", ShimTile, ComputeTile2, 2, line_ty)` to denote ping-pong buffers.
It is important to note that the Shim Tile and Compute Tile DMAs move data concurrently, and the Compute Tile's AIE Core also processes data concurrently with the data movement. This is made possible by expressing depth `2` when declaring the ObjectFifo, for example, `ObjectFifo(line_ty, name="in", default_depth=2)`, to denote ping-pong buffers. If `default_depth` is not specified, it defaults to `2` precisely to enable this pattern.
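The ping-pong behavior enabled by a depth of `2` can be sketched in plain Python (a conceptual model only, not the IRON runtime; `pingpong_passthrough` is a hypothetical name):

```python
from collections import deque

def pingpong_passthrough(blocks):
    # With a depth-2 FIFO, the producer can fill one buffer while the
    # consumer drains the other, so neither side stalls on a single slot.
    fifo = deque(maxlen=2)
    out = []
    for block in blocks:
        if len(fifo) == fifo.maxlen:   # consumer drains the older buffer
            out.append(fifo.popleft())
        fifo.append(block)             # producer fills the free buffer
    out.extend(fifo)                   # drain whatever remains at the end
    return out

print(pingpong_passthrough([b"a", b"b", b"c"]))  # -> [b'a', b'b', b'c']
```

With a depth of `1`, the producer would have to wait for the consumer after every block; the second slot is what lets the DMAs and the core overlap.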

## Design Component Details

### AIE Array Structural Design
### AIE Array Structural Alternative Design

This design performs a memcpy operation on a vector of input data. The AIE design is described in a Python module as follows:

@@ -66,34 +68,35 @@ This design performs a memcpy operation on a vector of input data. The AIE desig

1. **Vectorized Copying:** The `passThrough_aie()` function processes multiple data elements simultaneously, taking advantage of AIE vector datapath capabilities to load, copy and store data elements.

1. **C-style Wrapper Functions:** `passThroughLine()` and `passThroughTile()` are two C-style wrapper functions to call the templated `passThrough_aie()` vectorized memcpy implementation from the AIE design implemented in `aie2.py`. The `passThroughLine()` and `passThroughTile()` functions are compiled for `uint8_t`, `int16_t`, or `int32_t` determined by the value the `BIT_WIDTH` variable defines.
1. **C-style Wrapper Functions:** `passThroughLine()` and `passThroughTile()` are two C-style wrapper functions to call the templated `passThrough_aie()` vectorized memcpy implementation from the AIE design implemented in `passthrough_kernel.py`. The `passThroughLine()` and `passThroughTile()` functions are compiled for `uint8_t`, `int16_t`, or `int32_t`, determined by the value of the `BIT_WIDTH` variable.

## Usage

### C++ Testbench
### Compilation

To compile the design:

```
```shell
make
```

To compile the alternative design:
```shell
env use_alt=1 make
```

### C++ Testbench

To complete compiling the C++ testbench and run the design:

```
```shell
make run
```

### Python Testbench

To compile the design:

```
make
```

To run the design:

```
```shell
make run_py
```
102 changes: 0 additions & 102 deletions programming_examples/basic/passthrough_kernel/aie2.py

This file was deleted.

@@ -1,4 +1,4 @@
# passthrough_kernel/aie2_iron.py -*- Python -*-
# passthrough_kernel/passthrough_kernel.py -*- Python -*-
#
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
@@ -1,4 +1,4 @@
# passthrough_kernel/aie2_alt.py -*- Python -*-
# passthrough_kernel/passthrough_kernel_alt.py -*- Python -*-
#
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.

This file was deleted.

