Skip to content

Commit

Permalink
Pjr ex docs (#1282)
Browse files Browse the repository at this point in the history
Co-authored-by: pjr <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
  • Loading branch information
3 people authored and fifield committed Apr 17, 2024
1 parent bb013b2 commit 7d4b1e6
Show file tree
Hide file tree
Showing 2 changed files with 43 additions and 8 deletions.
36 changes: 32 additions & 4 deletions programming_examples/basic/vector_exp/README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,41 @@
<!---//===- README.md --------------------------*- Markdown -*-===//
<!---//===- README.md -----------------------------------------*- Markdown -*-===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2022, Advanced Micro Devices, Inc.
// Copyright (C) 2024, Advanced Micro Devices, Inc.
//
//===----------------------------------------------------------------------===//-->

# <ins>Eltwise Exp</ins>
# Eltwise exp

This example shows how the look up table capability of the AIE can be used to perform approximations to well known functions like e^x.
This design uses 4 cores, and each core operates on `1024 bfloat16` numbers. Each core contains a lookup table approximation of the e^x function, which is then used to perform the e^x operation.
e^x is typically used in machine learning applications with relatively small numbers, typically around 0..1, and also will return infinity for input values larger than 89, so a small look up table approximation method is often accurate enough compared to a more exact approximation like Taylor series expansion.

## Source Files Overview

1. `aie2.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using `aiecc.py` to produce design binaries (ie. XCLBIN and inst.txt for the NPU in Ryzen AI).

1. `bf16_exp.cc`: A C++ implementation of vectorized table lookup operations for AIE cores. The lookup operation `getExpBf16` operates on vectors of size `16` loading the vectorized accumulator registers with the look up table results. It is then necessary to copy the accumulator register to a regular vector register, before storing back into memory. The source can be found [here](../../../aie_kernels/aie2/bf16_exp.cc).

1. `test.cpp`: This C++ code is a testbench for the design example. The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After executing, the script verifies the memcpy results and optionally outputs trace data.


## Usage

### C++ Testbench

To compile the design and C++ testbench:

```
make
```

To run the design:

```
make run
```

A simple element wise exponent function, using the look up table capabilities of the AI Engine
15 changes: 11 additions & 4 deletions programming_examples/basic/vector_exp/test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,12 @@ int verify(int CSize, std::vector<T> A, std::vector<T> C, int verbosity) {
int errors = 0;
for (uint32_t i = 0; i < CSize; i++) {
std::bfloat16_t ref = exp(A[i]);
// Let's check if they are inf or nan, and if so just pass because
// comparisions will then fail, even for matches
if (isinf(ref) || isinf(C[i]))
break;
if (isnan(ref) || isnan(C[i]))
break;
if (!test_utils::nearly_equal(ref, C[i], 0.0078125)) {
std::cout << "Error in output " << C[i] << " != " << ref << std::endl;
errors++;
Expand Down Expand Up @@ -152,10 +158,11 @@ int main(int argc, const char *argv[]) {
// Initialize Inout buffer 0
INOUT0_DATATYPE *bufInOut0 = bo_inout0.map<INOUT0_DATATYPE *>();
std::vector<INOUT0_DATATYPE> AVec(INOUT0_VOLUME);
for (int i = 0; i < INOUT0_VOLUME; i++)
AVec[i] = test_utils::random_bfloat16_t((std::bfloat16_t)2.0,
(std::bfloat16_t)-1.0);

for (int i = 0; i < INOUT0_VOLUME; i++) {
std::uint16_t u16 = (std::uint16_t)i;
std::bfloat16_t bf16 = *(std::bfloat16_t *)&u16;
AVec[i] = bf16;
}
memcpy(bufInOut0, AVec.data(), (AVec.size() * sizeof(INOUT0_DATATYPE)));

// Sync buffers to update input buffer values
Expand Down

0 comments on commit 7d4b1e6

Please sign in to comment.