
Updated formatting for ml-frameworks #28

Merged
merged 24 commits into from
Oct 24, 2023

Conversation

DivyaAmirtharaj
Contributor

No description provided.

Co-authored-by: Henry Bae <[email protected]>
Co-authored-by: Sophia Cho <[email protected]>
Co-authored-by: Matthew Stewart <[email protected]>
Co-authored-by: Vijay Janapa Reddi <[email protected]>
Co-authored-by: Emeka Ezike <[email protected]>
@profvjreddi
Contributor

Great job ML frameworks team on getting to the first draft stage! 👏 👏 👏

@@ -1,119 +1,1785 @@
# AI Frameworks
Contributor

Leaving Comment to Test Co-Authorship

Co-authored-by: Henry Bae <[email protected]>
Co-authored-by: Sophia Cho <[email protected]>
Co-authored-by: Matthew Stewart <[email protected]>
Co-authored-by: Vijay Janapa Reddi <[email protected]>
Co-authored-by: Emeka Ezike <[email protected]>
@sophiacho1
Contributor

Testing

@profvjreddi
Contributor

Great job on pulling this draft together @DivyaAmirtharaj @sophiacho1 @BaeHenryS and Emeka.

Let the feedback begin! 😄

AI techniques. A few decades ago, building and training machine learning
models required extensive low-level coding and infrastructure. Machine
learning frameworks have evolved considerably over the past decade to
meet the expanding needs of practitioners and rapid advances in deep
Contributor

This paragraph could be reorganized to be clearer and shorter. For example, these two sentences are very similar and could be condensed: "Machine learning frameworks have evolved significantly over time to meet the diverse needs of machine learning practitioners and advancements in AI techniques." and "Machine learning frameworks have evolved considerably over the past decade to meet the expanding needs of practitioners and rapid advances in deep learning techniques"

frameworks.qmd Outdated
in parallel GPU computing unlocked the potential for far deeper neural
networks.

The first ML frameworks, Theano (2007) and Caffe (2014), were developed
Contributor

Are these citations properly linked to corresponding BibTeX entries?

Contributor

I believe @DivyaAmirtharaj @sophiacho1 @BaeHenryS and Emeka are still working on those.

Chainer, and CNTK.

Frameworks like Theano and Caffe used static computational graphs which
required rigidly defining the full model architecture upfront. Static
Contributor

It might help to add brief explanations of what a computational graph is, what you would be declaring upfront, whether "on-the-fly" means "during training" or "during inference" or just "during runtime," etc.

graphs require upfront declaration and limit flexibility. Dynamic graphs
construct on-the-fly for more iterative development. But around 2016,
frameworks began adopting dynamic graphs like PyTorch and TensorFlow 2.0
which can construct graphs on-the-fly. This provides greater flexibility
Contributor

@alex-oesterling alex-oesterling Oct 14, 2023

"Dynamic graphs
construct on-the-fly for more iterative development. But around 2016,
frameworks began adopting dynamic graphs like PyTorch and TensorFlow 2.0
which can construct graphs on-the-fly."

These sentences are difficult to read, I would suggest combining them + some tweaks: "But around 2016, frameworks such as Pytorch and TF 2.0 began adopting dynamic graphs which are constructed on-the-fly for more iterative development."

Contributor

Thanks for providing this suggestion on how to rework things. That makes things so much easier to integrate.
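To make the static-versus-dynamic distinction discussed above concrete, here is a plain-Python sketch (illustrative only; the `Node` class and its API are our own invention, not any framework's real interface). A static graph is declared in full before any data flows through it, while a dynamic ("define-by-run") graph is simply built by ordinary code as it executes.

```python
# Static style: declare the whole graph first, then execute it with data.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        if self.op == "input":
            return feed[self.inputs[0]]
        vals = [n.run(feed) for n in self.inputs]
        return vals[0] + vals[1] if self.op == "add" else vals[0] * vals[1]

x = Node("input", "x")
y = Node("input", "y")
graph = Node("add", Node("mul", x, x), y)        # z = x*x + y, declared upfront
static_result = graph.run({"x": 3.0, "y": 1.0})  # executed later

# Dynamic ("define-by-run") style: the graph is just ordinary code,
# built implicitly as each operation executes.
def dynamic(x, y):
    return x * x + y  # operations run immediately, on the fly

dynamic_result = dynamic(3.0, 1.0)
```

The dynamic style is what makes iterative development easier: control flow, loops, and debugging work like normal Python, at the cost of fewer whole-graph optimization opportunities.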

processing. Prefetching, on the other hand, involves preloading
subsequent batches, ensuring that the model never idles waiting for
data.
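A minimal sketch of what prefetching does under the hood (the `prefetch` helper here is hypothetical, not a real framework API): a background thread keeps a small buffer of upcoming batches filled so the training loop never idles waiting for data.

```python
import threading
import queue

def batches(data, batch_size):
    """Yield successive batches from an in-memory dataset."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def prefetch(iterator, buffer_size=2):
    """Load upcoming batches on a background thread so the consumer
    (the training loop) never idles waiting for data."""
    q = queue.Queue(maxsize=buffer_size)
    DONE = object()  # sentinel marking the end of the stream

    def producer():
        for item in iterator:
            q.put(item)  # blocks when the buffer is full
        q.put(DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is DONE:
            break
        yield item

seen = list(prefetch(batches(list(range(10)), batch_size=4)))
```

Real data loaders (e.g., PyTorch's `DataLoader`) use worker processes rather than a single thread, but the buffering idea is the same.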


This data loaders section is great, but I think it can be consolidated a bit as some parts are a bit repetitive and it was split up slightly un-intuitively. From what I can tell the main points are that dataloaders (1) read the data from disk or memory, (2) allow for parallel computation on GPU via batching, (3) allow for shuffling (and I think augmentation should be mentioned in this part even though it is the next section), and (4) cacheing for efficiency. I think that points 1 and 4 can be consolidated, as well as 2 and 3 to simply say that dataloaders handle the fetching and cacheing of the inputs to the model, and can be used to batch, shuffle, and augment data before sending it through the model.

- Precision-recall curves - Assess classification tradeoffs.

Tools like TensorBoard (TensorFlow) and TensorWatch (PyTorch) enable
real-time metrics and visualization during training.
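As a rough sketch of how such a precision-recall curve is computed (illustrative pure Python; tools like TensorBoard or scikit-learn's `precision_recall_curve` do this for you): sweep a decision threshold over the model's scores and record the precision/recall tradeoff at each step.

```python
def precision_recall_points(scores, labels):
    """Sweep a decision threshold over the scores and record the
    (precision, recall) pair at each step."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))          # true positives
        fp = sum(p and not y for p, y in zip(predicted, labels))      # false positives
        fn = sum((not p) and y for p, y in zip(predicted, labels))    # false negatives
        points.append((tp / (tp + fp), tp / (tp + fn)))
    return points

pts = precision_recall_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```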


Weights and Biases may also be a good reference for this! W&B link

Contributor

That's a great suggestion, we should definitely add that.

@harvard-edge harvard-edge deleted a comment from sophiacho1 Oct 15, 2023
Contributor

@jared-ni jared-ni left a comment

Awesome text!

critical areas: loss functions, optimization algorithms, and
regularization techniques.

Loss Functions are useful to quantify the difference between the
Contributor

It could be great to illustrate this with the image of a common mathematical loss function and explain the different components. It would strengthen this paragraph and make it more informative (since we probably shouldn't assume too much prior knowledge in a potential all-in-one textbook, though the topic may seem simple to people in the field).
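In the spirit of this suggestion, a loss function can be illustrated in a few lines. Mean squared error is shown here as one common example (the function name is ours, purely illustrative):

```python
def mse_loss(predictions, targets):
    """Mean squared error: the average of (prediction - target)^2.
    Squaring the residual penalizes large errors more heavily, and
    taking the mean makes the loss independent of batch size."""
    residuals = [p - t for p, t in zip(predictions, targets)]
    return sum(r * r for r in residuals) / len(residuals)

loss = mse_loss([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
```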

Contributor

Got it. thank you!

frameworks.qmd Outdated
frameworks come equipped with efficient implementations of several
optimization algorithms, many of which are variants of gradient descent
algorithms with stochastic methods and adaptive learning rates. More
information can be found in the AI training section.
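A minimal sketch of the basic gradient descent update that these optimizers build on (illustrative; `sgd_step` is our own helper, not a framework API):

```python
def sgd_step(params, grads, lr=0.1):
    """Vanilla gradient descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(100):
    w = sgd_step(w, [2 * (w[0] - 3.0)], lr=0.1)
# w[0] converges toward the minimum at 3.0
```

Stochastic variants compute the gradient on a random mini-batch rather than the full dataset, and adaptive methods such as Adam scale the learning rate per parameter.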
Contributor

It helps to illustrate the loss function and algorithms like gradient descent stochastic methods or learning rates with some equations. Again, I think this part is assuming too much prior knowledge, and those who are not familiar with the buzz words might not understand this paragraph.

Contributor

Good point.

There will be an AI training chapter coming in before this chapter; we are working on it (just not within the class context), and our hope is that it will set up the necessary background that then flows into this chapter. Good feedback, though. Thank you.

data (see Overfitting). To counteract this, regularization methods are
employed to penalize model complexity and encourage it to learn simpler
patterns. Dropout for instance randomly sets a fraction of input units
to 0 at each update during training, which helps prevent overfitting.
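A sketch of "inverted" dropout as frameworks commonly implement it (illustrative pure Python; the function is our own): survivors are scaled by 1/(1-rate) at training time so that the expected activation is unchanged, and inference is a plain pass-through.

```python
import random

def dropout(inputs, rate, training=True, rng=random):
    """Randomly zero a fraction `rate` of the inputs during training and
    scale the survivors by 1 / (1 - rate) so the expected activation is
    unchanged. At inference time, pass the inputs through untouched."""
    if not training or rate == 0.0:
        return list(inputs)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng)
```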
Contributor

Since the paragraph explains overfitting, wouldn't it also be helpful to touch on underfitting, and in what situations either case may be encountered? I think it would be informative to add examples with potential graphs to illustrate this point.

graph is the first step. It represents all the mathematical operations
and data flow within the model. We discussed this earlier.

During training, the focus is on executing the computational graph.
Contributor

Computational graph might be familiar to people who are exposed to AI/ML, but it doesn't mean much to people who aren't experienced in the field. Since this textbook covers this topic, I think a graphical representation of a computational graph would be good to include here!

Contributor

Discussed this earlier. Thanks!

be trained with many different datasets (which, in our example, would be
the set of images that are on personal devices), without the need to
transfer a large amount of potentially sensitive data. However,
federated learning also comes with a series of challenges.
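A minimal sketch of the aggregation step at the heart of federated learning (a FedAvg-style weighted average; illustrative, with hypothetical function names): each client trains locally, only its model weights leave the device, and the server averages them weighted by local dataset size.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average the clients' model weights,
    weighted by how many local examples each client trained on."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients with different amounts of local data.
global_model = federated_average(
    client_weights=[[1.0, 0.0], [3.0, 2.0]],
    client_sizes=[10, 30],
)
```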
Contributor

Great paragraph on Federated Learning! I think a graph to illustrate how federated learning works might also be cool.

Frameworks specifically built for specialized hardware like CMSIS-NN on
Cortex-M processors can further maximize performance, but sacrifice
portability. Integrated frameworks from processor vendors tailor the
Contributor

Great summary of the tradeoffs when using these frameworks!

how they have adapted models and tools specifically for embedded and
edge deployment. We will compare programming models, supported hardware,
optimization capabilities, and more to fully understand how frameworks
enable scalable machine learning from the cloud to the edge.
Contributor

Overall the introduction is very clear providing a comprehensive overview of the ML frameworks, utilities and key features + it sets a clear scope for the chapter, which i find very useful in general.

The only thing I feel (perhaps is too personal) it lacks is a specific mention of the challenges of running ML models on resource-constrained devices, which would be crucial for a TinyML-focused book. For example, in the first paragraph you introduce the TF, torch, and "specialized frameworks for
embedded devices". Why don't you just mention Lite/Lite Micro and more here?
You guys might also want to add a sentence about how the constraints of TinyML (power, memory, compute) necessitate different framework capabilities or optimizations. (After this, I know you mention it with TF Lite, Lite Micro, etc.)

Contributor

This is one of the interesting things that I've been discovering about our notes here. And that is that when you start thinking about ML systems, be it tiny or otherwise, you still need to know the same set of fundamentals before you specialize in the domain of interest, such as tiny.

various philosophies around graph execution, declarative versus
imperative APIs, and more. Declarative defines what the program should
do while imperative focuses on how it should do it step-by-step. For
instance, TensorFlow uses graph execution and declarative-style modeling
Contributor

It would be helpful to provide more explanation on the difference between declarative and imperative APIs. Consider giving an example of the actual function of a declarative API vs an imperative API instead of just citing TensorFlow and Pytorch.
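A rough plain-Python analogy of the what-versus-how distinction (not framework code; both functions are our own): the imperative version spells out each step and runs it immediately, while the declarative version states what the result should be and leaves the iteration details to the underlying machinery.

```python
# Imperative: say *how*, step by step -- each statement runs immediately.
def normalize_imperative(xs):
    total = 0.0
    for x in xs:          # explicit accumulation loop
        total += x
    out = []
    for x in xs:          # explicit per-element division
        out.append(x / total)
    return out

# Declarative: say *what* the result should be; the iteration order and
# accumulation strategy are left to the underlying machinery.
def normalize_declarative(xs):
    return [x / sum(xs) for x in xs]
```

A declarative framework can inspect the whole description and optimize it before running anything, which is the appeal of graph execution.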

frameworks.qmd Outdated
stopping. These resources manage the complex aspects of performance,
enabling practitioners to zero in on model development and training. As
a result, developers experience both speed and ease when utilizing the
capabilities of neural networks.
Contributor

It would be helpful to have a diagram of batch processing here so the reader can visualize how batch processing works. For example: https://media.geeksforgeeks.org/wp-content/uploads/20221020142629/Batchoperatingsystem1-660x330.png

with it) are not supported with the TPUs. It also cannot support custom
operations from the machine learning frameworks, and the network
design must closely align to the hardware capabilities.

Contributor

It would be nice to add a more clear comparison between GPUs and TPUs here. What are the tradeoffs of using GPUs vs. TPUs and how does this change depending on use case?

easier to use, but are not as customizable as low-level frameworks (i.e.
users of low-level frameworks can define custom layers, loss functions,
optimization algorithms, etc.). Examples of high-level frameworks
include TensorFlow/Keras and PyTorch. Examples of low-level ML
Contributor

Does Pytorch not fall under "low-level frameworks" according to the above definition? It is possible to define custom layers, loss functions, optimizing algorithms in pytorch

There are many different techniques to compensate for this, such as
adding a proximal term to achieve a balance between the local and global
model, and adding a frozen [[global hypersphere
classifier]{.underline}](https://arxiv.org/abs/2207.09413).
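To illustrate the proximal-term idea mentioned here (sketched in the style of FedProx; this is an illustration of the general technique, not the method from the linked paper, and the function names are ours):

```python
def proximal_loss(local_loss, local_weights, global_weights, mu):
    """FedProx-style objective: the client's local loss plus a proximal
    term (mu / 2) * ||w_local - w_global||^2 that keeps each client's
    update from drifting too far from the shared global model."""
    drift = sum((wl - wg) ** 2 for wl, wg in zip(local_weights, global_weights))
    return local_loss + 0.5 * mu * drift

val = proximal_loss(1.0, [2.0, 0.0], [1.0, 0.0], mu=0.1)
```

The coefficient `mu` tunes the balance: larger values pull local models toward the global one, smaller values let clients fit their own data more freely.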
Contributor

@sjohri20 sjohri20 Oct 17, 2023

The following points can be added for challenges faced in federated learning.

Labeled Data is Not Always Naturally Available:

  • In many real-world scenarios, data collected on devices might not come with appropriate labels.
  • Additionally, users, being a source of data, can be unreliable, meaning that even if data is labeled, there's no guarantee of its accuracy or relevance.

Non-IID Data (covered already, but can be made clearer):

  • Each user's data is unique, leading to significant variance in the kind of data generated by different users.
  • The unbalanced nature of data, where some users produce more data than others can affect the global model's performance.

Massively Distributed:

  • The number of mobile device owners can be significantly larger than the average number of training samples on each device. This leads to a significant communication overhead.

Limited Communication:

  • Mobile networks, which are typically used for such communication, can be unstable. This instability can lead to delayed or failed transmission of model updates, affecting the overall training process.

Heterogeneous Device/Silo Resources:

  • Devices participating in FL can have varying computational powers and memory capacities, making it challenging to design algorithms that are efficient across all devices.

Privacy, Security, and Trust:

  • Malicious clients or servers can attempt to reverse-engineer model updates to infer information about local data.
  • With malicious clients or servers in the system, there's a risk of model poisoning or other types of adversarial attacks.

Contributor

That's excellent feedback. Thanks for providing that.

Contributor

Got it. Thanks!

@mpstewart1
Contributor

@BaeHenryS @DivyaAmirtharaj @sophiacho1, and Emeka, great job on the frameworks chapter—the content is great! There are still several items that I need your group to resolve before I can merge the pull request:

  • Please provide links to all of the frameworks you mention, such as PyTorch, TensorFlow, CNTK, etc., when you first mention them.
  • There are some citations missing, notably one for ResNet and GPT-3 towards the start of the chapter. For references, please provide a full BibTeX reference and not just a link to the resource (i.e., include authors, year of publication, if it is a website, the date the resource was accessed, and so on).
  • Please proofread the chapter; I still see some missing links, such as items that say “We will discuss these concepts and details later on in Section XXX.”
  • Run ‘quarto render’ in your local repo to make sure that the figures match the column width of the book chapter, and also make sure that these figures (and also tables) have captions and are explicitly referred to in the text at least once.
  • If you use an acronym, please define in the chapter at the first opportunity. I see several occasions when TF is used but never defined—for us it is obvious from context but perhaps not for a non-expert.
  • Images should not be copied directly into a folder, as some of these images are copyrighted. Instead, please link to the original website image directly within the chapter. Because of this, there is no need to provide an explicit source below the image. Also, please remove lines such as “Source: https://medium.com/mlait/tensors-representation-of-data-in-neural-networks-bbe8a711b93b” and link to these in the figure caption as bibtex references.
  • Where code is referenced, please enclose these within double backticks (``) to make it easier for readers to see and interpret.
  • Please ensure that all headings are labeled, including subheadings—there are some occasions such as “Data Augmentation” and “Data Loaders” that are not referenced as headings.
  • Section 7.9 appears to be completely missing, so I am not sure if this is supposed to be there or not.
  • There are some erroneous heading labels for sections 7.5, 7.7, 7.8, and 7.10 (e.g., 7.10.0.1, 7.5.4.0.2) that need to be fixed.
  • The table in section 7.4.2.2 appears to be missing content.

Once these items have been resolved, I will rebase and merge the pull request.

@mpstewart1 mpstewart1 requested review from mpstewart1 and removed request for mpstewart1 October 18, 2023 17:57
@mpstewart1 mpstewart1 self-assigned this Oct 18, 2023
@mpstewart1 mpstewart1 added new new course content improvement Improve existing content labels Oct 18, 2023
@mpstewart1 mpstewart1 added the website updates to the web presentation label Oct 18, 2023
@profvjreddi profvjreddi merged commit df08976 into harvard-edge:main Oct 24, 2023
1 check passed
@uchendui uchendui added the cs249r label Nov 7, 2023