tinygrad has four pieces
- frontend (Tensor -> LazyBuffer)
  - See tensor.py, function.py, multi.py, and lazy.py
  - The user interacts with the Tensor class
  - This outputs LazyBuffers, which form the simple compute graph
- scheduler (LazyBuffer -> ScheduleItem)
  - See engine/schedule.py
  - When a Tensor is realized, the scheduler runs so that its LazyBuffers get computed
  - It takes in LazyBuffers and groups them into kernels as appropriate
  - It returns a list of ScheduleItems plus all the Variables used in the graph
- lowering (TODO: lots of work to clean this up still)
  - See codegen/ (ScheduleItem.ast -> UOps)
  - ScheduleItems have an ast that's compiled into actual GPU code
  - Many optimization choices can be made here; this stage contains a beam search
  - renderer/compiler (UOps -> machine code)
    - UOps are tinygrad's IR, similar to LLVM IR
    - Here we convert them either to a high-level language or directly to machine code
  - engine/realize.py (ScheduleItem -> ExecItem)
- runtime
  - See runtime/
  - The runtime actually interacts with the GPUs
  - It manages Buffers, Programs, and Queues
  - Sadly, METAL and GPU (OpenCL) don't have a compiler that can be pulled out from the device itself
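
The flow through the pieces above can be sketched with a toy model. This is not tinygrad code: ToyTensor, ToyLazy, toy_schedule, and toy_lower are hypothetical names made up for illustration, and the toy scheduler only orders nodes rather than grouping them into kernels. It only shows the shape of the idea: the frontend records a graph, and nothing is computed until realize schedules, lowers, and runs it.

```python
from dataclasses import dataclass
import operator

# Toy stand-ins only: these names are hypothetical and are not tinygrad's classes.
@dataclass(frozen=True, eq=False)
class ToyLazy:
    op: str            # "LOAD", "ADD", or "MUL"
    srcs: tuple = ()   # source nodes this node depends on
    data: tuple = ()   # only set for LOAD leaves

class ToyTensor:
    """Frontend: operators record graph nodes instead of computing."""
    def __init__(self, lazy): self.lazy = lazy
    @staticmethod
    def of(*values): return ToyTensor(ToyLazy("LOAD", data=tuple(values)))
    def __add__(self, other): return ToyTensor(ToyLazy("ADD", (self.lazy, other.lazy)))
    def __mul__(self, other): return ToyTensor(ToyLazy("MUL", (self.lazy, other.lazy)))
    def realize(self):
        buffers = {}                           # "runtime": maps node id -> computed values
        for node in toy_schedule(self.lazy):   # scheduler
            toy_lower(node)(buffers)           # lowering, then execution
        return list(buffers[id(self.lazy)])

def toy_schedule(root):
    """Scheduler: order the graph so every node comes after its sources."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen: return
        seen.add(id(node))
        for src in node.srcs: visit(src)
        order.append(node)
    visit(root)
    return order

def toy_lower(node):
    """Lowering: turn one scheduled node into a callable 'program'."""
    if node.op == "LOAD":
        def program(buffers): buffers[id(node)] = node.data
    else:
        fn = {"ADD": operator.add, "MUL": operator.mul}[node.op]
        def program(buffers):
            a, b = (buffers[id(s)] for s in node.srcs)
            buffers[id(node)] = tuple(fn(x, y) for x, y in zip(a, b))
    return program

a, b = ToyTensor.of(1, 2, 3), ToyTensor.of(4, 5, 6)
c = (a + b) * a        # nothing is computed yet, only graph nodes are created
result = c.realize()   # schedule, lower, and run
```

In this toy, a is used twice but scheduled once, which hints at why the scheduler works on the graph rather than on individual operations.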