Skip to content

Commit

Permalink
Merge pull request #616 from harvard-edge/611-update-ai-frameworks-ch…
Browse files Browse the repository at this point in the history
…apter

611 Update AI Frameworks chapter
  • Loading branch information
profvjreddi authored Jan 17, 2025
2 parents 1a86675 + 24630d8 commit 6253bac
Show file tree
Hide file tree
Showing 15 changed files with 1,338 additions and 684 deletions.
2 changes: 1 addition & 1 deletion contents/core/benchmarking/benchmarking.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,7 @@ The development life cycle of a machine learning model involves two critical pha

From a systems perspective, training machine learning models is resource-intensive, especially when working with large models. These models often contain billions or even trillions of trainable parameters and require enormous amounts of data, often on the scale of many terabytes. For example, [OpenAI's GPT-3](https://arxiv.org/abs/2005.14165) [@brown2020language] has 175 billion parameters, was trained on 45 TB of compressed plaintext data, and required 3,640 petaflop-days of compute for pretraining. ML training benchmarks evaluate the systems and resources required to manage the computational load of training such models.

Efficient data storage and delivery during training also play a major role in the training process. For instance, in a machine learning model that predicts bounding boxes around objects in an image, thousands of images may be required. However, loading an entire image dataset into memory is typically infeasible, so practitioners rely on data loaders (as disucssed in @sec-frameworks-data-loaders) from ML frameworks. Successful model training depends on timely and efficient data delivery, making it essential to benchmark tools like data loaders, data pipelines, preprocessing speed, and storage retrieval times to understand their impact on training performance.
Efficient data storage and delivery during training also play a major role in the training process. For instance, in a machine learning model that predicts bounding boxes around objects in an image, thousands of images may be required. However, loading an entire image dataset into memory is typically infeasible, so practitioners rely on data loaders from ML frameworks. Successful model training depends on timely and efficient data delivery, making it essential to benchmark tools like data loaders, data pipelines, preprocessing speed, and storage retrieval times to understand their impact on training performance.

Hardware selection is another key factor in training machine learning systems, as it can significantly impact training time. Training benchmarks evaluate CPU, GPU, memory, and network utilization during the training phase to guide system optimizations. Understanding how resources are used is essential: Are GPUs being fully leveraged? Is there unnecessary memory overhead? Benchmarks can uncover bottlenecks or inefficiencies in resource utilization, leading to cost savings and performance improvements.

Expand Down
480 changes: 286 additions & 194 deletions contents/core/frameworks/frameworks.bib

Large diffs are not rendered by default.

1,540 changes: 1,051 additions & 489 deletions contents/core/frameworks/frameworks.qmd

Large diffs are not rendered by default.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added contents/core/frameworks/images/png/core_ops.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file removed contents/core/frameworks/images/png/image1.png
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 6253bac

Please sign in to comment.