Create Glossary.md #15278
# Glossary of Terms
## **Tenstorrent Devices:** ##
- **Blackhole -** Tenstorrent's next generation AI compute solution. 140 Tensix Cores.
- **Galaxy -** Tenstorrent's high density rack-mounted AI compute solution, intended for corporations.
- **Grayskull -** First generation Tenstorrent chip and entry level DevKit. 120 Tensix Cores.
- **LoudBox -** Tenstorrent's desktop workstation equipped with Tenstorrent hardware and software.
- **Wormhole -** Second generation Tenstorrent chip. 80 Tensix Cores.
## **Tenstorrent Software:** ##
- **Metal Trace -** Performance optimization feature that removes the overhead of building and dispatching model operations. Metal Trace records dispatch commands to DRAM and replays them when executing the trace.
- **Runtime -** Bottom of the Metalium software stack. Controls kernel execution and collects results from kernel executions.
- **Tracy -** Tenstorrent's profiling tool.
- **TT-Buda -** A compute framework used to develop, run, and analyze ML workloads on Tenstorrent hardware.
- **TT-Fabric -** Tenstorrent's inter-chip networking fabric, used to route data between devices in multi-chip systems.
- **TT-Forge -** Tenstorrent’s graph compiler.
- **TT-Metalium -** Tenstorrent’s low-level programming model.
- **TT-MLIR -** Tenstorrent’s MLIR-based compiler.
- **TT-NN -** Tenstorrent’s library of optimized operators.
- **Watcher -** Tenstorrent monitoring tool.
## **Acronyms:** ##
- **ANN -** Artificial Neural Network
- **BFS Algorithm -** Breadth-First Search - Graph search algorithm that explores nodes level by level from a starting node until it finds one satisfying the search criteria.
- **BMM -** Batch Matrix Multiplication
- **CCL -** Collective Communication Library - The CCL provides a collection of pre-written multi-chip operations.
- **CI -** Continuous Integration
- **CQ -** Command Queue
- **CNN -** Convolutional Neural Network - A deep learning model suited for processing images and other spatial data.
- **DMA -** Direct Memory Access
- **DNN -** Deep Neural Network
- **DP -** Data Parallel
- **DRAM -** Dynamic Random Access Memory - Off-chip memory. Provides larger, off-chip storage for the system.
- **EDM -** ERISC Data Mover - Reusable data movement component used to send data over ethernet links. It improves ethernet link utilization and abstracts link-level concerns when implementing multi-chip collective operations.
- **ELF -** Executable and Linkable Format - Standard file format for executable files.
- **FDE -** Fetch-Decode-Execute cycle.
- **FPF -** Floating Point Format
- **FVC -** Fabric Virtual Channel
- **GDB -** GNU Debugger tool.
- **GDDR -** Graphics Double Data Rate - Memory type used for GPUs.
- **GH -** GitHub
- **ISA -** Instruction-Set Architecture - An abstract computer model that determines how the CPU is controlled by the software.
- **ISS -** Instruction Set Simulator
- **KWARG -** Keyword Argument.
- **LLK -** Low Level Kernels - Perform math operations on Tensix cores using Tenstorrent hardware efficiently.
- **LLM -** Large Language Model
- **LPDDR4 -** Low-Power Double Data Rate 4 - A type of DRAM used as off-chip memory. Provides larger, off-chip storage for the system.
- **LUT -** Look Up Table - Precomputed table mapping input values to output values, replacing computation with a simple lookup.
- **MLP -** Multilayer Perceptron - A modern feedforward artificial neural network.
- **MLIR -** Multi-Level Intermediate Representation
- **MM - MatMul - Matrix Multiplication -** Multiplication of two matrices with compatible dimensions.
- **MIMD -** Multiple Instruction Multiple Data - Allows multiple processors to function simultaneously and asynchronously.
- **NM -** Nanometers
- **NMC -** Near Memory Computing - Computing paradigm that moves data processing closer to where the data is stored.
- **NLP -** Natural Language Processing
- **NoC -** Network on Chip
- **NPU -** Neural Processing Unit
- **NUMA -** Non-uniform Memory Access
- **PCC -** Pearson Correlation Coefficient - A measure of the linear correlation between two variables.
- **Q, K, V values -** Q = query, K = key, V = value. In transformer architectures, these values help the model focus on relevant parts of the input sequence.
- **SFPU -** Tensix SIMD engine, used for various activation operations like exponents, square roots, softmax, topK, etc.
- **SIMD -** Single Instruction Multiple Data - Allows processors to execute the same instruction on multiple data points simultaneously.
- **SoC -** System on Chip
- **SRAM -** Static Random Access Memory - On-chip memory. 1 MB of SRAM (L1) on Grayskull and 1.5 MB of SRAM (L1) on Wormhole. Can be used as a scratch pad accessible by all RISC-V processors and engines within the core.
- **TM -** Tensor Manipulation
- **TP -** Tensor Parallel
- **ViT -** Vision Transformer - A transformer model used for vision processing tasks.
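As a concrete illustration of the BFS entry above, here is a minimal Python sketch (purely illustrative, not part of any Tenstorrent API) that searches a graph level by level for a node satisfying a predicate:

```python
from collections import deque

def bfs(graph, start, goal):
    """Return the first node satisfying `goal`, exploring level by level."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if goal(node):
            return node
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return None  # no node satisfied the predicate

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
print(bfs(graph, "A", lambda n: n == "E"))  # finds "E" via A -> C -> E
```

Because BFS visits all nodes at depth *d* before any node at depth *d + 1*, the first match it returns is always one of the closest matches to the start node.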
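The PCC entry above can be made concrete with a short, self-contained Python sketch (illustrative only) that computes the Pearson correlation coefficient from its definition:

```python
import math

def pcc(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pcc([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 - perfectly linearly correlated
```

A PCC of 1.0 indicates a perfect positive linear relationship, -1.0 a perfect negative one, and 0 no linear correlation.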
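The Q, K, V entry above corresponds to scaled dot-product attention. This toy pure-Python sketch (illustrative, using tiny hand-picked matrices) shows how query/key similarity weights the value rows:

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """softmax(Q @ K^T / sqrt(d)) @ V for row-lists of equal width d."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

q = [[1.0, 0.0]]                      # query matches the first key best
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, k, v)              # output is biased toward the first V row
```

Because the query is most similar to the first key, the output mixes the value rows with more weight on the first one; the weights always sum to 1, so the output stays a convex combination of the V rows.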
## **Terminology** ##
- **Convolution -** Operation that combines two functions by summing the products of one with a shifted copy of the other; the core operation of convolutional layers.
- **DPRINT -** Kernel debug-print mechanism that lets code running on the device print values for debugging.
- **dtype -** Data Type.
- **Huge Page -** Memory management technique used to request larger memory blocks.
- **Kernel Types -** Kernels read data from and write data to tensors. Kernel types include:
  - **Bare Metal C/C++ Kernels -** Kernels written in C or C++ for execution. Generally used to accelerate computation and processing performance.
  - **User Kernel Types -** Kernels that run in user mode within applications.
  - **Compute Kernels -** Kernels used for processing tasks or operations. Compute kernels automatically generate the following kernel types:
    - **Unpack Kernels -** Prepare data for operations to be performed by the math kernel.
    - **Math Kernels -** Kernels used for matrix multiplication and other mathematical tasks or operations.
    - **Pack Kernels -** Wait for the math kernel to finish and prepare data to be moved to the next part of the system.
  - **Data Movement Kernels -** Run on the first and fifth RISC-Vs of a Tensix core and move data between NoCs, memory buffers, and the compute kernel.
    - **Reader Kernel -** Kernel receiving data from a DRAM or SRAM buffer.
    - **Writer Kernel -** Kernel sending data to a DRAM or SRAM buffer.
  - **Ethernet Data Movement Kernels -** Kernels responsible for moving data between cores.
  - **Dispatch Kernels -** Kernels that determine where data is dispatched depending on the data’s priority and dispatch key.
  - **Low-Level Kernels -** Perform math operations on Tensix cores using Tenstorrent hardware efficiently.
- **Mantissa -** The part of a floating-point number that represents its significant digits.
- **Memory Types -** Memory can be interleaved or sharded; sharding can be further categorized into height, width, and block sharding.
  - **Interleaved Memory -** L1 interleaved memory distributes data across all L1 memory banks sequentially.
  - **Sharded Memory -** L1 or DRAM sharded memory lays out tensors based on user-provided shard specifications.
    - **Height Sharding -** Memory split across dimension 0.
    - **Width Sharding -** Memory split across dimension 1.
    - **Block Sharding -** Memory split across dimension 0 and dimension 1.
- **Memory Barrier -** Synchronization mechanism that requires in-flight memory operations to complete before subsequent operations proceed.
- **Mesh -** 2D chip architecture.
- **Mixed Precision -** Optimization technique using lower-precision floating-point types in a model to enhance performance and reduce memory usage. TT-Metalium supports BFLOAT8/4/2_B data formats.
- **Multicast -** Write to multiple cores in the same operation. Tenstorrent hardware natively supports NoC multicasts.
- **PyTorch -** Optimized tensor library.
- **RISC-V -** Processors (Baby RISCVs) that run C/C++ kernels and dispatch instructions to compute and data movement engines.
- **Semaphore -** A variable or abstract data type used for access control. All semaphores use atomic operations.
- **Tensix Core -** Processor cores that make up Tensix processors. Tensix cores contain the following components:
  - **Matrix Engine -** Performs matrix multiplication, elementwise, and dot product operations on small matrices (tiles) of shape 32x32 or similar.
  - **Tile Math Engine -** Handles standard algebraic operations.
  - **Vector Math Engine -** Handles vectorized kernels such as Top-k, Sort, and special functions like GELU, Exp, and Sqrt.
- **Tensor -** Data structure that can hold data in multiple dimensions. Tensors can be interleaved or sharded, have different data types, and can be stored in L1 or DRAM. Tensors can be organized in a row-major or tiled layout:
  - **Row-Major Tensor Layout -** Contiguously stored data.
  - **Tiled Tensor Layout -** Data organized into tile-sized faces.
- **Tensor Shape -** Logical representation of user data.
- **TFLOPS -** Teraflops
- **Topology -** Physical organization of how devices or components are interconnected.
  - **Linear Topology -** Devices connected in a chain, each linked to at most two neighbors.
  - **Mesh Topology -** Devices arranged in a grid, each connected to its immediate neighbors.
  - **Torus Topology -** A mesh with wrap-around links connecting opposite edges; a switchless interconnect often used in parallel systems.
- **Unicast -** Write to a single destination core in one operation (contrast with multicast).
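The convolution entry above can be illustrated with a short Python sketch (purely illustrative) of 1-D discrete convolution, where each output element sums products of the signal with a flipped, shifted kernel:

```python
def conv1d(signal, kernel):
    """1-D 'valid' convolution: slide the flipped kernel over the signal."""
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n - k + 1):  # only positions where the kernel fully overlaps
        out.append(sum(signal[i + j] * kernel[k - 1 - j] for j in range(k)))
    return out

print(conv1d([1, 2, 3, 4], [1, 1]))  # [3, 5, 7] - sliding pairwise sums
```

Convolutional layers apply the same idea in 2-D, sliding a small learned kernel over an image to produce a feature map.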
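The height/width/block sharding entries above can be sketched in plain Python (illustrative only, not the TT-NN API), splitting a 2-D tensor along dimension 0, dimension 1, or both:

```python
def height_shard(t, n):
    """Split a 2-D list into n shards along dimension 0 (rows)."""
    step = len(t) // n
    return [t[i * step:(i + 1) * step] for i in range(n)]

def width_shard(t, n):
    """Split a 2-D list into n shards along dimension 1 (columns)."""
    step = len(t[0]) // n
    return [[row[i * step:(i + 1) * step] for row in t] for i in range(n)]

def block_shard(t, n_rows, n_cols):
    """Split along both dimensions, yielding a grid of blocks."""
    return [width_shard(h, n_cols) for h in height_shard(t, n_rows)]

t = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 tensor
h = height_shard(t, 2)    # two 2x4 shards
w = width_shard(t, 2)     # two 4x2 shards
b = block_shard(t, 2, 2)  # 2x2 grid of 2x2 blocks
```

In practice each shard would be placed in a different L1 bank or on a different core, so that cores operate on their local slice of the tensor in parallel.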