Change vec_scal_add examples to vec_scal_mul and cleaned up README references (#1400)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
jackl-xilinx and github-actions[bot] authored Apr 24, 2024
1 parent dba94a4 commit 32127bf
Showing 29 changed files with 379 additions and 1,685 deletions.
@@ -100,4 +100,4 @@ clean: clean_trace

.PHONY: clean_trace
clean_trace:
rm -rf tmpTrace parse*.json
rm -rf tmpTrace parse*.json trace.txt
6 changes: 6 additions & 0 deletions programming_guide/quick_reference.md
@@ -49,6 +49,12 @@
| `print(ctx.module)` | Converts our ctx-wrapped structural code to MLIR and prints it to stdout |
| `ctx.module.operation.verify()` | Runs additional structural verification on the Python-bound source code and returns the result to stdout |

## Common AIE API functions for Kernel Programming
| Function Signature | Definition | Parameters | Return Type | Example |
|---------------------|------------|------------|-------------|---------|
| `aie::vector<T, vec_factor> my_vector` | Declare vector type | `T`: data type <br> `vec_factor`: vector width | n/a | `aie::vector<int16_t, 32> my_vector;` |
| `aie::load_v<vec_factor>(pA1);` | Vector load | `vec_factor`: vector width | `aie::vector` | `aie::vector<int16_t, 32> my_vector = aie::load_v<32>(pA1);` |

## Helpful AI Engine Architecture References and Tables
* [AIE2 - Table of supported data types and vector sizes (AIE API)](https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_api/aie_api/doc/group__group__basic__types.html)

2 changes: 1 addition & 1 deletion programming_guide/section-1/Makefile
@@ -6,7 +6,7 @@
#
##===----------------------------------------------------------------------===##

include ../../tutorials/makefile-common
include ../../programming_examples/makefile-common

build/aie.mlir: aie2.py
mkdir -p ${@D}
6 changes: 5 additions & 1 deletion programming_guide/section-3/Makefile
@@ -12,11 +12,15 @@ all: build/final.xclbin build/insts.txt

targetname = vectorScalar

build/aie.mlir: aie2.py
mkdir -p ${@D}
python3 $< > $@

build/scale.o: vector_scalar_mul.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/final.xclbin: aie.mlir build/kernel1.o build/kernel2.o build/kernel3.o
build/final.xclbin: build/aie.mlir build/scale.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--aie-generate-npu --npu-insts-name=insts.txt $(<:%=../%)
1 change: 0 additions & 1 deletion programming_guide/section-3/README.md
@@ -149,7 +149,6 @@ To compile the design and C++ testbench:
```sh
make
make build/vectorScalar.exe
```

To run the design:
14 changes: 3 additions & 11 deletions programming_guide/section-3/test.cpp
@@ -34,13 +34,10 @@ int main(int argc, const char *argv[]) {

test_utils::parse_options(argc, argv, desc, vm);
int verbosity = vm["verbosity"].as<int>();
int trace_size = vm["trace_sz"].as<int>();

constexpr bool VERIFY = true;
constexpr bool ENABLE_TRACING = false;
// constexpr int TRACE_SIZE = 8192;
constexpr int IN_SIZE = 4096;
constexpr int OUT_SIZE = ENABLE_TRACING ? IN_SIZE + trace_size / 4 : IN_SIZE;
constexpr int OUT_SIZE = IN_SIZE;

// Load instruction sequence
std::vector<uint32_t> instr_v =
Expand All @@ -64,7 +61,7 @@ int main(int argc, const char *argv[]) {
XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(2));
auto bo_inFactor = xrt::bo(device, 1 * sizeof(DATATYPE),
XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(3));
auto bo_outC = xrt::bo(device, OUT_SIZE * sizeof(DATATYPE) + trace_size,
auto bo_outC = xrt::bo(device, OUT_SIZE * sizeof(DATATYPE),
XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(4));

if (verbosity >= 1)
Expand All @@ -85,7 +82,7 @@ int main(int argc, const char *argv[]) {

// Zero out buffer bo_outC
DATATYPE *bufOut = bo_outC.map<DATATYPE *>();
memset(bufOut, 0, OUT_SIZE * sizeof(DATATYPE) + trace_size);
memset(bufOut, 0, OUT_SIZE * sizeof(DATATYPE));

// sync host to device memories
bo_instr.sync(XCL_BO_SYNC_BO_TO_DEVICE);
@@ -120,11 +117,6 @@ int main(int argc, const char *argv[]) {
}
}

if (trace_size > 0) {
test_utils::write_out_trace(((char *)bufOut) + (IN_SIZE * sizeof(DATATYPE)),
trace_size, vm["trace_file"].as<std::string>());
}

// Print Pass/Fail result of our test
if (!errors) {
std::cout << std::endl << "PASS!" << std::endl << std::endl;
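
The verification step whose closing braces appear in the `test.cpp` hunk above is not shown in this diff. As a rough sketch of what a host-side check for a vector-scalar multiply could look like, assuming the kernel computes `out[i] = in[i] * factor`: the helper name `count_errors` and the `int32_t` element type are illustrative assumptions, not taken from `test.cpp`.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of test.cpp): counts the elements of `out`
// that differ from the expected vector-scalar product in[i] * factor.
// Assumes values are small enough that the product does not overflow.
int count_errors(const std::vector<int32_t> &in, int32_t factor,
                 const std::vector<int32_t> &out) {
  assert(in.size() == out.size());
  int errors = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    int32_t ref = in[i] * factor; // expected result computed on the host
    if (out[i] != ref)
      ++errors;
  }
  return errors;
}
```

A result of zero would correspond to the `PASS!` branch of the test; any nonzero count would be reported as a failure.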
8 changes: 1 addition & 7 deletions programming_guide/section-3/test.py
@@ -15,7 +15,6 @@
from aie.extras.dialects.ext import memref, arith

import aie.utils.test as test_utils
import aie.utils.trace as trace_utils


def main(opts):
Expand All @@ -41,7 +40,7 @@ def main(opts):
INOUT1_SIZE = INOUT1_VOLUME * INOUT1_DATATYPE().itemsize
INOUT2_SIZE = INOUT2_VOLUME * INOUT2_DATATYPE().itemsize

OUT_SIZE = INOUT2_SIZE + int(opts.trace_size)
OUT_SIZE = INOUT2_SIZE

# ------------------------------------------------------
# Get device, load the xclbin & kernel and register them
@@ -99,11 +98,6 @@ def main(opts):
e = np.equal(output_buffer, ref)
errors = errors + np.size(e) - np.count_nonzero(e)

# Write trace values if trace_size > 0
if opts.trace_size > 0:
trace_buffer = entire_buffer[INOUT2_VOLUME:]
trace_utils.write_out_trace(trace_buffer, str(opts.trace_file))

# ------------------------------------------------------
# Print verification and timing results
# ------------------------------------------------------
70 changes: 0 additions & 70 deletions programming_guide/section-4/CMakeLists.txt

This file was deleted.

74 changes: 0 additions & 74 deletions programming_guide/section-4/aie2.py

This file was deleted.

10 changes: 7 additions & 3 deletions programming_guide/section-4/section-4a/Makefile
@@ -16,10 +16,14 @@ build/aie.mlir: aie2.py
mkdir -p ${@D}
python3 $< > $@

build/final.xclbin: build/aie.mlir
build/scale.o: vector_scalar_mul.cc
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --aie-generate-npu --no-compile-host \
--xclbin-name=${@F} --npu-insts-name=insts.txt ${<F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/final.xclbin: build/aie.mlir build/scale.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--aie-generate-npu --npu-insts-name=insts.txt $(<:%=../%)

${targetname}.exe: test.cpp
rm -rf _build
5 changes: 1 addition & 4 deletions programming_guide/section-4/section-4a/README.md
@@ -24,7 +24,7 @@ Adding the application timer is as simple as noting a start and stop time surrou

```c++
auto start = std::chrono::high_resolution_clock::now();
auto run = kernel(bo_instr, instr_v.size(), bo_inout0, bo_inout1, bo_inout2);
auto run = kernel(bo_instr, instr_v.size(), bo_inA, bo_inFactor, bo_outC);
run.wait();
auto stop = std::chrono::high_resolution_clock::now();

@@ -78,9 +78,6 @@ We can then compute and print the actual average, minimum and maximum run times.

1. Let's set our iterations to 10 and run again with `make run` which recompiles our host code for `test.cpp`. What reported Avg NPU time do you see this time? <img src="../../../mlir_tutorials/images/answer1.jpg" title="Answer can be anywhere from 430-480us but is likely different than before" height=25>

1. Let's change our design and increase the loop size of our kernel by a factor of 10. This involves changing the outer loop from 8 to 80. What reported times do you see now? <img src="../../../mlir_tutorials/images/answer1.jpg" title="? us" height=25>
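
The README text between the two hunks above is elided, so the exact accumulation code is not visible here. A minimal sketch of how average, minimum and maximum run times could be gathered over several iterations, using the same `std::chrono` start/stop pattern as the snippet: `TimingStats` and `time_iterations` are hypothetical names, and the `work` callable stands in for the `kernel(...)` launch followed by `run.wait()`.

```c++
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper type (not part of test.cpp): per-run wall-clock summary.
struct TimingStats {
  float avg_us;
  float min_us;
  float max_us;
};

// Calls `work` n_iterations times and times each call with a start/stop pair
// of high_resolution_clock readings, tracking the total, minimum and maximum.
template <typename Work>
TimingStats time_iterations(Work &&work, unsigned n_iterations) {
  assert(n_iterations > 0);
  float total_us = 0.0f;
  float min_us = std::numeric_limits<float>::max();
  float max_us = 0.0f;
  for (unsigned i = 0; i < n_iterations; ++i) {
    auto start = std::chrono::high_resolution_clock::now();
    work();
    auto stop = std::chrono::high_resolution_clock::now();
    float run_us = static_cast<float>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
            .count());
    total_us += run_us;
    min_us = std::min(min_us, run_us);
    max_us = std::max(max_us, run_us);
  }
  return {total_us / n_iterations, min_us, max_us};
}
```

In the host code this might be called with a lambda that wraps `kernel(bo_instr, instr_v.size(), bo_inA, bo_inFactor, bo_outC)` and `run.wait()`, with the iteration count set to 10 as in the exercise.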


-----
[[Up]](../../section-4) [[Next]](../section-4b)
