Week 3: Training optimizations, profiling DL code
Blog post about reduced precision FP formats
NVIDIA blog posts about mixed precision training with Tensor Cores, Tensor Core performance tips, TF32 Tensor Cores
Presentations about Tensor Cores: one, two, three
Tensor Core Requirements and Mixed Precision Training sections of the NVIDIA DL performance guide
Automatic Mixed Precision in PyTorch
TF32 section of PyTorch CUDA docs
AMP, FP16 and BF16 in DeepSpeed
PyTorch Performance Tuning Guide
Latency Numbers Every Programmer Should Know
Pillow Performance benchmarks
Faster Image Processing tips from fastai docs
Rapid Data Pre-Processing with NVIDIA DALI
General-purpose Python profilers: built-ins (cProfile and profile), pyinstrument, memory_profiler, py-spy, Scalene
DLProf user guide
How to profile with DLProf
Profiling and Optimizing Deep Neural Networks with DLProf and PyProf
NVIDIA presentations on profiling DL networks, profiling for DL and mixed precision
Profiling Deep Learning Workloads
PyTorch Profiler and PyTorch Profiler with TensorBoard tutorial
torch.utils.bottleneck quick guide
PyTorch Autograd profiler tutorial
Nsight Systems and Nsight Compute user guides
Video tutorial about speeding up and profiling neural networks
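As a quick starting point before reaching for the GPU-specific tools above, here is a minimal sketch of profiling a single call with the built-in cProfile module. The function `slow_preprocess` and the helper `profile_call` are hypothetical names made up for this illustration; only `cProfile`, `pstats` and `io` come from the standard library.

```python
import cProfile
import io
import pstats


def slow_preprocess(n):
    # Hypothetical stand-in for a CPU-bound preprocessing step.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    # Collect the report into a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_preprocess, 100_000)
print(report)
```

The report only covers Python-level CPU time, so it is best suited to spotting slow data loading or preprocessing; CUDA kernels run asynchronously and need the PyTorch Profiler or Nsight tools listed above.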