Fixing up naming and versions for a few more examples
hunhoffe committed Dec 10, 2024
1 parent bbcbbbd commit cb1d1fc
Showing 39 changed files with 111 additions and 650 deletions.
8 changes: 4 additions & 4 deletions programming_examples/basic/dma_transpose/README.md
@@ -33,24 +33,24 @@ The implicit copy is performed using the `ObjectFifo.forward()` function that sp
The `object_fifo_link` operation used explicitly by `dma_transpose.py` and `dma_transpose_alt.py` is described in more depth in [Section-2b](../../../programming_guide/section-2/section-2b/README.md/#object-fifo-link) of the programming guide.
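The effect of such a stride-based DMA transpose can be illustrated in plain Python (a sketch of the access pattern only, not the IRON API; `transpose_access` is a hypothetical helper name):

```python
# Sketch: reading a row-major M x N matrix with a transposed access pattern,
# as a shim DMA does when it is given column-major sizes/strides.
def transpose_access(flat, M, N):
    # flat holds M*N elements in row-major order; emit them column by
    # column, which is what a stride-based DMA transpose produces.
    return [flat[r * N + c] for c in range(N) for r in range(M)]

data = list(range(6))                # 2x3 row-major: [[0,1,2],[3,4,5]]
print(transpose_access(data, 2, 3))  # -> [0, 3, 1, 4, 2, 5]
```

The generated access-map visualization (`make generate_access_map`) plots exactly this reordering over the input buffer.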

To compile and run the design `dma_transpose_iron.py` for NPU:
```bash
```shell
make env use_iron=1
make run
```

To compile and run the design `dma_transpose.py` for NPU:
```bash
```shell
make
make run
```

To compile and run the design `dma_transpose_alt.py` for NPU:
```bash
```shell
make env use_alt=1
make run
```

To generate a data visualization of the transpose (like that above), run:
```bash
```shell
make generate_access_map
```
@@ -20,5 +20,5 @@ The current design only works for scalar `int16`.

The performance sweep results against `whole_array` can be found [here](https://gist.github.com/Yu-Zhewen/da3fed9feb278b973f35fb78c2d3a484); no gain was observed.

The original implementation of the design is found at [matmul.py](./matmul.py). An alternative version of the design, featuring different runtime operations,
is found at [matmul_alt.py](./matmul_alt.py).
The original implementation of the design is found at [cascade.py](./cascade.py). An alternative version of the design, featuring different runtime operations,
is found at [cascade_alt.py](./cascade_alt.py).
@@ -12,7 +12,7 @@ subdir=matrix_vector
targetname=matrix_vector

# Currently does not accept reconfiguring size via these variables; must change
# in source at matmul.py as well as here
# in source at <targetname>.py as well as here
M=288
K=288
N=1
@@ -14,8 +14,8 @@ In this design, one or multiple AI Engine compute cores (spread across hardware

> This design relies on the same basic concepts as the [whole-array matrix-matrix multiplication design](../whole_array/README.md), and it is structured very similarly to that design. Please refer to the in-depth explanation of that design along with the below outlined differences for a better understanding of this design.
The original implementation of the design is found at [matmul.py](./matmul.py). An alternative version of the design, featuring different runtime operations,
is found at [matmul_alt.py](./matmul_alt.py). A version written in a higher-level form of IRON is found at [matmul_iron.py](./matmul_iron.py).
The original implementation of the design is found at [matrix_vector.py](./matrix_vector.py). An alternative version of the design, featuring different runtime operations,
is found at [matrix_vector_alt.py](./matrix_vector_alt.py). A version written in a higher-level form of IRON is found at [matrix_vector_iron.py](./matrix_vector_iron.py).
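For reference, the computation itself can be sketched as a scalar matrix-vector product in plain Python (illustration only; the actual kernel is vectorized and the work is spread across cores):

```python
def matvec(A, x, M, K):
    # A is a flat row-major M x K matrix, x a length-K vector;
    # each output element is the dot product of one row of A with x.
    return [sum(A[m * K + k] * x[k] for k in range(K)) for m in range(M)]

A = [1, 2,
     3, 4]                 # 2x2 row-major
x = [5, 6]
print(matvec(A, x, 2, 2))  # -> [17, 39]
```

With N=1 in the Makefile, the design is exactly this: an M x K matrix against a length-K vector.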

## Differences from the [Whole-Array Matrix-Matrix Multiplication Design](../whole_array/README.md)

@@ -28,22 +28,22 @@ is found at [matmul_alt.py](./matmul_alt.py). A version written in a higher-leve
You need C++23 for `bfloat16_t` support. It can be found in g++-13: https://lindevs.com/install-g-on-ubuntu

To compile and run the original design:
```shell
make
make matrix_vector.exe
make run
```

To compile and run the alternative design:
```
```shell
make env use_alt=1
make env use_alt=1 matrixVectorMultiplication.exe
make env use_alt=1 matrix_vector.exe
make env use_alt=1 run
```

To compile and run the higher-level IRON design:
```
```shell
make env use_iron=1
make env use_iron=1 matrixVectorMultiplication.exe
make env use_iron=1 matrix_vector.exe
make env use_iron=1 run
```
@@ -26,7 +26,7 @@ target_suffix=${M}x${K}x${N}_${m}x${k}x${n}
use_alt?=0

ifeq (${use_alt}, 1)
aie_py_src=matmul_alt.py
aie_py_src=${targetname}_alt.py
endif

include ${srcdir}/../makefile-common
@@ -19,35 +19,35 @@ In this design, a single AI Engine compute core performs a matrix-matrix-multipl
* This design supports tracing; See [below](#tracing).
* Only a single core performs computations. As such, we only need a single ObjectFIFO for each of the transfers between the levels (shim &rightarrow; memory, memory &rightarrow; compute, and back). These ObjectFIFOs are named `inA`, `inB`, `outC` and `memA`, `memB` and `memC`, respectively.
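Conceptually, the core consumes an `m`x`k` tile of A and a `k`x`n` tile of B at a time and accumulates partial products into C, which can be sketched in plain Python (a scalar reference, not the AIE kernel; assumes M, K, N are divisible by m, k, n):

```python
def matmul_tiled(A, B, M, K, N, m, k, n):
    # A: flat row-major M x K, B: flat row-major K x N. Partial results
    # for each m x n output tile are accumulated across the K dimension,
    # mirroring how sub-tiles stream through the memA/memB/memC ObjectFIFOs.
    C = [0] * (M * N)
    for i0 in range(0, M, m):
        for j0 in range(0, N, n):
            for p0 in range(0, K, k):          # one k-slice per FIFO acquire
                for i in range(i0, i0 + m):
                    for j in range(j0, j0 + n):
                        for p in range(p0, p0 + k):
                            C[i * N + j] += A[i * K + p] * B[p * N + j]
    return C

print(matmul_tiled([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2, 1, 1, 1))  # -> [19, 22, 43, 50]
```

The loop nest over `i0`, `j0`, `p0` corresponds to the tile schedule; only the innermost three loops run on the compute core.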

## Notes on the `matmul_alt.py` Implementation
## Notes on the `single_core_alt.py` Implementation

As in the whole-array design, the [`matmul.py`](./matmul.py) file describes the data movement of the design. This single core example also comes with an alternative implementation, which can be found in [`matmul_alt.py`](./matmul_alt.py). If you specify `use_alt=1` as an environment variable at compile time, this alternative implementation will be used in place of `matmul.py`.
As in the whole-array design, the [`single_core.py`](./single_core.py) file describes the data movement of the design. This single core example also comes with an alternative implementation, which can be found in [`single_core_alt.py`](./single_core_alt.py). If you specify `use_alt=1` as an environment variable at compile time, this alternative implementation will be used in place of `single_core.py`.

Functionally, `matmul.py` and `matmul_alt.py` are intended to be identical. However, `matmul_alt.py` is implemented using a new syntax for runtime buffer descriptor configuration on the shim. Specifically, `matmul_alt.py` uses the `aiex.dma_configure_task_for`, `aiex.dma_start_task` and `aiex.dma_await_task` operations instead of `aiex.dma_memcpy_nd`.
Functionally, `single_core.py` and `single_core_alt.py` are intended to be identical. However, `single_core_alt.py` is implemented using a new syntax for runtime buffer descriptor configuration on the shim. Specifically, `single_core_alt.py` uses the `aiex.dma_configure_task_for`, `aiex.dma_start_task` and `aiex.dma_await_task` operations instead of `aiex.dma_memcpy_nd`.

## Notes on the `matmul_iron.py` Implementation
## Notes on the `single_core_iron.py` Implementation

There is an implementation of this design found in [`matmul_iron.py`](./matmul_iron.py) using a higher-level version of IRON. If you specify `use_iron=1` as an environment variable at compile time, this alternative implementation will be used in place of `matmul.py`.
There is an implementation of this design found in [`single_core_iron.py`](./single_core_iron.py) using a higher-level version of IRON. If you specify `use_iron=1` as an environment variable at compile time, this alternative implementation will be used in place of `single_core.py`.

Functionally, this design is intended to be identical to the other two. However, `matmul_iron.py` currently does not support tracing.
Functionally, this design is intended to be identical to the other two. However, `single_core_iron.py` currently does not support tracing.

## Building and Running the Design

You need C++23 for `bfloat16_t` support. It can be found in g++-13: https://lindevs.com/install-g-on-ubuntu

To compile the design:
```
```shell
make
make matrixMultiplication.exe
make single_core.exe
```

To run the design:
```
```shell
make run
```

## Tracing

To get tracing output, set `enable_tracing=True` in `matmul.py` and `ENABLE_TRACING=true` in `test.cpp`.
To get tracing output, set `enable_tracing=True` in `single_core.py` and `ENABLE_TRACING=true` in `test.cpp`. Tracing is also supported in `single_core_alt.py`.

By default, traces will be written out to `trace.txt`; another output file can be specified using the `--trace` (or `-t`) flag to the host code.
14 changes: 3 additions & 11 deletions programming_examples/basic/passthrough_kernel/Makefile
@@ -13,25 +13,17 @@ srcdir := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))
include ${srcdir}/../../makefile-common

device = npu
targetname = passThroughKernel
targetname = passthrough_kernel
VPATH := ${srcdir}/../../../aie_kernels/generic
data_size = 4096
trace_size = 8192
PASSTHROUGH_SIZE = ${data_size}

aie_py_src=aie2.py
aie_py_src=${targetname}.py
use_alt?=0
use_iron?=0

ifeq (${use_alt}, 1)
aie_py_src=aie2_alt.py
ifeq (${use_iron}, 1)
$(error Cannot specify both alternative design and IRON)
endif
endif

ifeq (${use_iron}, 1)
aie_py_src=aie2_iron.py
aie_py_src=${targetname}_alt.py
endif

.PHONY: all template clean
33 changes: 18 additions & 15 deletions programming_examples/basic/passthrough_kernel/README.md
@@ -14,7 +14,9 @@ This IRON design flow example, called "Passthrough Kernel", demonstrates a simpl

## Source Files Overview

1. `aie2.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).
1. `passthrough_kernel.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).

1. `passthrough_kernel_alt.py`: A Python script that defines the AIE array structural design using an alternative IRON syntax that yields MLIR-AIE operations. The file generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).

1. `passThrough.cc`: A C++ implementation of vectorized memcpy operations for AIE cores. Found [here](../../../aie_kernels/generic/passThrough.cc).

@@ -28,15 +30,15 @@ This IRON design flow example, called "Passthrough Kernel", demonstrates a simpl

This simple example effectively passes data through a single compute tile in the NPU's AIE array. The design is described as shown in the figure to the right. The overall design flow is as follows:
1. An object FIFO called "of_in" connects a Shim Tile to a Compute Tile, and another called "of_out" connects the Compute Tile back to the Shim Tile.
1. The runtime data movement is expressed to read `4096` uint8_t data from host memory to the compute tile and write the `4096` data back to host memory.
1. The runtime data movement is expressed to read `4096` `uint8_t` data from host memory to the compute tile and write the `4096` data back to host memory.
1. The compute tile acquires this input data in "object" sized (`1024`) blocks from "of_in" and copies them to another output "object" it has acquired from "of_out". Note that a vectorized kernel running on the Compute Tile's AIE core copies the data from the input "object" to the output "object".
1. After the vectorized copy is performed, the Compute Tile releases the "objects", allowing the DMAs (abstracted by the object FIFO) to transfer the data back to host memory and copy additional blocks into the Compute Tile, "of_out" and "of_in" respectively.

It is important to note that the Shim Tile and Compute Tile DMAs move data concurrently, and the Compute Tile's AIE Core also processes data concurrently with the data movement. This is made possible by expressing depth `2` in declaring, for example, `object_fifo("in", ShimTile, ComputeTile2, 2, line_ty)` to denote ping-pong buffers.
It is important to note that the Shim Tile and Compute Tile DMAs move data concurrently, and the Compute Tile's AIE Core also processes data concurrently with the data movement. This is made possible by expressing depth `2` when declaring the ObjectFifo, for example, `ObjectFifo(line_ty, name="in", default_depth=2)`, to denote ping-pong buffers. If `default_depth` is not specified, it defaults to `2` precisely to enable this pattern.
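The ping-pong behavior enabled by a depth of `2` can be sketched in plain Python (a conceptual model only, not the IRON runtime; `pingpong_passthrough` is a hypothetical name):

```python
from collections import deque

def pingpong_passthrough(blocks):
    # With a depth-2 FIFO, the producer can fill one buffer while the
    # consumer drains the other, so neither side stalls on a single slot.
    fifo = deque(maxlen=2)
    out = []
    for block in blocks:
        if len(fifo) == fifo.maxlen:   # consumer drains the older buffer
            out.append(fifo.popleft())
        fifo.append(block)             # producer fills the free buffer
    out.extend(fifo)                   # drain whatever remains at the end
    return out

print(pingpong_passthrough([b"a", b"b", b"c"]))  # -> [b'a', b'b', b'c']
```

With a depth of `1`, the producer would have to wait for the consumer after every block; the second slot is what lets the DMAs and the core overlap.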

## Design Component Details

### AIE Array Structural Design
### AIE Array Structural Alternative Design

This design performs a memcpy operation on a vector of input data. The AIE design is described in a Python module as follows:

@@ -66,34 +68,35 @@ This design performs a memcpy operation on a vector of input data. The AIE desig

1. **Vectorized Copying:** The `passThrough_aie()` function processes multiple data elements simultaneously, taking advantage of AIE vector datapath capabilities to load, copy and store data elements.

1. **C-style Wrapper Functions:** `passThroughLine()` and `passThroughTile()` are two C-style wrapper functions to call the templated `passThrough_aie()` vectorized memcpy implementation from the AIE design implemented in `aie2.py`. The `passThroughLine()` and `passThroughTile()` functions are compiled for `uint8_t`, `int16_t`, or `int32_t` determined by the value the `BIT_WIDTH` variable defines.
1. **C-style Wrapper Functions:** `passThroughLine()` and `passThroughTile()` are two C-style wrapper functions to call the templated `passThrough_aie()` vectorized memcpy implementation from the AIE design implemented in `passthrough_kernel.py`. The `passThroughLine()` and `passThroughTile()` functions are compiled for `uint8_t`, `int16_t`, or `int32_t`, determined by the value of the `BIT_WIDTH` variable.

## Usage

### C++ Testbench
### Compilation

To compile the design:

```
```shell
make
```

To compile the alternative design:
```shell
env use_alt=1 make
```

### C++ Testbench

To complete compiling the C++ testbench and run the design:

```
```shell
make run
```

### Python Testbench

To compile the design:

```
make
```

To run the design:

```
```shell
make run_py
```
102 changes: 0 additions & 102 deletions programming_examples/basic/passthrough_kernel/aie2.py

This file was deleted.

@@ -1,4 +1,4 @@
# passthrough_kernel/aie2_iron.py -*- Python -*-
# passthrough_kernel/passthrough_kernel.py -*- Python -*-
#
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
@@ -1,4 +1,4 @@
# passthrough_kernel/aie2_alt.py -*- Python -*-
# passthrough_kernel/passthrough_kernel_alt.py -*- Python -*-
#
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.

This file was deleted.

