diff --git a/tt_metal/Glossary.md b/tt_metal/Glossary.md
new file mode 100644
index 00000000000..59f1070f313
--- /dev/null
+++ b/tt_metal/Glossary.md
@@ -0,0 +1,110 @@
+# Glossary of Terms
+## **Tenstorrent Devices:** ##
+- **Blackhole -** Tenstorrent's next-generation AI compute solution. 140 Tensix Cores.
+- **Galaxy -** Tenstorrent's high-density, rack-mounted AI compute solution, intended for corporations.
+- **Grayskull -** First-generation Tenstorrent chip and entry-level DevKit. 120 Tensix Cores.
+- **LoudBox -** Tenstorrent's desktop workstation equipped with Tenstorrent hardware and software.
+- **Wormhole -** Second-generation Tenstorrent chip. 80 Tensix Cores.
+## **Tenstorrent Software:** ##
+- **Metal Trace -** Performance optimization feature that removes the overhead of building and dispatching model operations. Metal Trace records dispatch commands to DRAM and replays them when the trace is executed.
+- **Runtime -** Bottom of the Metalium software stack. Controls kernel execution and collects results from kernel executions.
+- **Tracy -** Tenstorrent's profiling tool.
+- **TT-Buda -** A compute framework used to develop, run, and analyze ML workloads on Tenstorrent hardware.
+- **TT-Fabric -** Tenstorrent's fabric layer for routing data between chips.
+- **TT-Forge -** Tenstorrent’s graph compiler.
+- **TT-Metalium -** Tenstorrent’s low-level programming model.
+- **TT-MLIR -** Tenstorrent’s MLIR-based compiler.
+- **TT-NN -** Tenstorrent’s library of optimized operators.
+- **Watcher -** Tenstorrent monitoring tool.
+## **Acronyms:** ##
+- **ANN -** Artificial Neural Network
+- **BFS Algorithm -** Breadth-First Search - A graph-search algorithm that explores nodes level by level until it finds one satisfying the search criteria.
+- **BMM -** Batch Matrix Multiplication
+- **CCL -** Collective Communication Library - The CCL provides a collection of pre-written multi-chip operations.
+- **CI -** Continuous Integration
+- **CNN -** Convolutional Neural Network - A deep learning model suited for processing spatial data such as images.
+- **CQ -** Command Queue
+- **DMA -** Direct Memory Access
+- **DNN -** Deep Neural Network
+- **DP -** Data Parallel
+- **DRAM -** Dynamic Random Access Memory - Off-chip memory. Provides larger storage for the system.
+- **EDM -** ERISC Data Mover - Reusable data-movement component used to send data over Ethernet links. It improves Ethernet link utilization and encapsulates common design concerns when implementing multi-chip collective operations.
+- **ELF -** Executable and Linkable Format - Standard file format for executable files.
+- **FDE -** Fetch-Decode-Execute cycle.
+- **FPF -** Floating Point Format
+- **FVC -** Fabric Virtual Channel
+- **GDB -** GNU Debugger tool.
+- **GDDR -** Graphics Double Data Rate - Memory type commonly used in GPUs.
+- **GH -** GitHub
+- **ISA -** Instruction Set Architecture - An abstract model of a computer that defines how software controls the CPU.
+- **ISS -** Instruction Set Simulator
+- **KWARG -** Keyword Argument.
+- **LLK -** Low-Level Kernels - Kernels that perform math operations efficiently on Tensix cores using Tenstorrent hardware.
+- **LLM -** Large Language Model
+- **LPDDR4 -** Low-Power Double Data Rate 4 - A type of DRAM used as off-chip memory. Provides larger storage for the system.
+- **LUT -** Look-Up Table - A table mapping input values to precomputed output values, used for addressing.
+- **MIMD -** Multiple Instruction, Multiple Data - Allows multiple processors to operate simultaneously and asynchronously.
+- **MLIR -** Multi-Level Intermediate Representation
+- **MLP -** Multilayer Perceptron - A feedforward artificial neural network.
+- **MM (MatMul) -** Matrix Multiplication - Multiplication of two matrices with compatible dimensions.
+- **nm -** Nanometer
+- **NMC -** Near Memory Computing - Computing paradigm that moves data processing closer to where the data is stored.
+- **NLP -** Natural Language Processing
+- **NoC -** Network on Chip
+- **NPU -** Neural Processing Unit
+- **NUMA -** Non-Uniform Memory Access
+- **PCC -** Pearson Correlation Coefficient - A measure of the linear correlation between two variables.
+- **Q, K, V -** Query (Q), Key (K), and Value (V) - In transformer architectures, these values are used to help the model focus on relevant parts of the input sequence.
+- **SFPU -** Tensix SIMD engine, used for various activation operations such as exponent, square root, softmax, and topK.
+- **SIMD -** Single Instruction, Multiple Data - Allows processors to execute the same instruction on multiple data points simultaneously.
+- **SoC -** System on Chip
+- **SRAM -** Static Random Access Memory - On-chip memory. Each Tensix core has 1 MB of SRAM (L1) on Grayskull and 1.5 MB on Wormhole. It can be used as a scratchpad accessible by all RISC-V processors and engines within the core.
+- **TM -** Tensor Manipulation
+- **TP -** Tensor Parallel
+- **ViT -** Vision Transformer - A transformer model used for vision processing tasks.
+## **Terminology** ##
+- **Convolution -** Operation that produces a new function by sliding one function over another and summing their pointwise products at each offset.
+- **DPRINT -** Debug-printing facility that lets device kernels print formatted output to the host.
+- **dtype -** Data Type.
+- **Huge Page -** Memory management technique that maps memory with larger-than-default page sizes.
+- **Kernel Types -** Kernels move and process the data held in tensors.
+  - **Bare Metal C/C++ Kernels -** Kernels written in C or C++ for direct execution. Generally used to accelerate computation and processing performance.
+  - **User Kernel Types -** Kernels run in user mode when operating within applications.
+    - **Compute Kernels -** Kernels used for processing tasks or operations. A compute kernel automatically generates the following kernels:
+      - **Unpack Kernels -** Unpack kernels prepare data for operations to be performed by the math kernel.
+      - **Math Kernels -** Kernels used for matrix multiplication and other mathematical tasks or operations.
+      - **Pack Kernels -** Pack kernels wait for the math kernel to finish and prepare data to be moved to the next part of the system.
+    - **Data Movement Kernels -** Kernels running on the first and fifth RISC-V processors of a Tensix core, responsible for moving data between NoCs, memory buffers, and the compute kernel.
+      - **Reader Kernel -** Kernel that reads data from a DRAM or SRAM buffer.
+      - **Writer Kernel -** Kernel that writes data to a DRAM or SRAM buffer.
+      - **Ethernet Data Movement Kernels -** Kernels responsible for moving data between cores over Ethernet links.
+    - **Dispatch Kernels -** Kernels that determine where data is dispatched depending on the data’s priority and dispatch key.
+    - **Low-Level Kernels -** Kernels that perform math operations efficiently on Tensix cores using Tenstorrent hardware.
+- **Mantissa -** The part of a floating-point number that represents its significant digits.
+- **Memory Types -** Memory can be interleaved or sharded; sharding is further categorized into height, width, and block sharding.
+  - **Interleaved Memory -** L1 interleaved layout distributes data sequentially across all L1 memory banks.
+  - **Sharded Memory -** L1 or DRAM sharded layout places tensors according to user-provided shard specifications.
+    - **Height Sharding -** Memory split across dimension 0.
+    - **Width Sharding -** Memory split across dimension 1.
+    - **Block Sharding -** Memory split across both dimension 0 and dimension 1.
+- **Memory Barrier -** Synchronization mechanism that requires preceding memory operations to complete before subsequent operations proceed.
+- **Mesh -** 2D arrangement of interconnected chips.
+- **Mixed Precision -** Optimization technique that mixes floating-point formats within a model to enhance performance and reduce memory usage. TT-Metalium supports BFLOAT8/4/2_B data formats.
+- **Multicast -** A write to multiple cores in a single operation. Tenstorrent hardware natively supports NoC multicast.
+- **PyTorch -** Optimized tensor library for deep learning.
+- **RISC-V -** Processors ("Baby RISC-Vs") that run C/C++ kernels and dispatch instructions to the compute and data movement engines.
+- **Semaphore -** A variable or abstract data type used for access control. All semaphores use atomic operations.
+- **Tensix Core -** Processor cores that make up Tensix processors. Tensix cores contain the following components:
+  - **Matrix Engine -** Performs matrix multiplication, elementwise, and dot product operations on small matrices (tiles) of shape 32x32 or similar.
+  - **Tile Math Engine -** Handles standard algebraic operations.
+  - **Vector Math Engine -** Handles vectorized kernels such as Top-k and Sort, and special functions like GELU, Exp, and Sqrt.
+- **Tensor -** Data structure that can hold data in multiple dimensions. Tensors can be interleaved or sharded, have different data types, and can be stored in L1 or DRAM. Tensors can be organized in a row-major or tiled layout:
+  - **Row-Major Tensor Layout -** Contiguously stored data.
+  - **Tiled Tensor Layout -** Data organized into tile-sized faces.
+- **Tensor Shape -** Logical representation of user data.
+- **TFLOPS -** Teraflops - One trillion floating-point operations per second.
+- **Topology -** Physical organization of a device's interconnect.
+  - **Linear Topology -** Devices or components connected in a chain, one after another.
+  - **Mesh Topology -** Network configuration in which devices or components are arranged in a grid and connected to their neighbors.
+  - **Torus Topology -** Switchless interconnect configuration in which the edges of a mesh wrap around, often used for parallel systems.
+- **Unicast -** A write to a single destination core in one operation, in contrast to multicast.
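
The height, width, and block sharding entries above describe how a 2D tensor is split across memory banks. A minimal Python sketch (illustrative only; the helper names are invented and this is not the TT-Metalium API) makes the three splits concrete:

```python
# Hypothetical illustration of the three sharding schemes defined in
# this glossary. Each function splits a 2D tensor (list of rows) into
# shards, one shard per memory bank.

def height_shard(tensor, n):
    """Height sharding: split rows (dimension 0) into n contiguous shards."""
    step = len(tensor) // n
    return [tensor[i * step:(i + 1) * step] for i in range(n)]

def width_shard(tensor, n):
    """Width sharding: split columns (dimension 1) into n contiguous shards."""
    step = len(tensor[0]) // n
    return [[row[i * step:(i + 1) * step] for row in tensor] for i in range(n)]

def block_shard(tensor, n_rows, n_cols):
    """Block sharding: split along both dimensions, giving n_rows * n_cols blocks."""
    return [width_shard(h, n_cols) for h in height_shard(tensor, n_rows)]

# A 4x4 tensor with entries 0..15, split two ways:
t = [[r * 4 + c for c in range(4)] for r in range(4)]
assert height_shard(t, 2)[0] == [[0, 1, 2, 3], [4, 5, 6, 7]]
assert width_shard(t, 2)[1] == [[2, 3], [6, 7], [10, 11], [14, 15]]
```

Real shard specifications also carry details this sketch omits, such as which cores hold which shards and the tile alignment of each shard.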