# Lecture 24 — Large Language Models

## Roadmap

We will try the OpenAI website and some Hugging Face demos to see how powerful
current AI is, and then we will work through two Hugging Face tutorials to see
how model performance can be optimized.

## Let's try them

Activity: let's see whether GPT-3.5 or GPT-4 can fix Rust code. Check out some
Rust questions here: <https://practice.course.rs>

Activity: let's try image generation, say "please draw me a cute kitten", using
DALL·E and also Stable Diffusion:
<https://huggingface.co/spaces/stabilityai/stable-diffusion>

Activity: we can also check whether GPT-3.5 or GPT-4 can decipher passwords if
we give it the rainbow table that we created in L23.

## Hugging Face Hub

The Hub works as a central place (like GitHub, but for AI) where anyone can
explore, experiment, collaborate, and build technology with machine learning.
In addition to hosting code like GitHub, it also imposes some conventions on
formats, interfaces, etc., so that people working on different things can
collaborate conveniently. It currently has four big categories: Repositories,
Models, Datasets, and Spaces.
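
The Hub can also be browsed programmatically. Below is a minimal sketch (my own,
not part of the lecture repo), assuming the `huggingface_hub` package is
installed; the search term is arbitrary.

```python
# A minimal sketch: query the Hub's Models category from Python.
from huggingface_hub import HfApi

api = HfApi()
# List a few models whose names or tags match "translation".
for model in api.list_models(search="translation", limit=5):
    print(model.id)
```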

### Natural Language Processing (NLP)

A simple NLP task is translation. For example, given a sequence of words as the
input, you would like to get a sequence of words in a different language as the
output. Before Transformers, recurrent models processed a sequence one token at
a time, so the work could not be parallelized well. The Transformer architecture
lets the whole sequence be processed in parallel (notably during training),
which maps much better onto GPUs.
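
To make the task concrete, here is a minimal translation sketch, assuming the
`transformers` package is installed; "t5-small" is just one small, publicly
available model that supports translation.

```python
# A minimal sketch: English-to-French translation with a Hugging Face pipeline.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("The kitten sleeps on the laptop.")
print(result[0]["translation_text"])
```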

There are three main groups of operations, grouped by compute intensity (a toy
example follows the list).

- Tensor Contractions: batched matrix-matrix multiplications. Most
  compute-intensive;
- Statistical Normalizations: softmax and layer normalization. Less
  compute-intensive;
- Element-wise Operators: remaining operators such as activations. Least
  compute-intensive.

And two main targets:

- how to make the model give answers or predictions (inference) efficiently;
- how to train the model efficiently.

Let's try some techniques (the code in `live-coding/L24` was tested on
ecetesla0, but it takes time to run, so maybe just refer to the lecture notes);
a short sketch combining the two techniques appears after the list:

(See more info on backpropagation at
<https://www.3blue1brown.com/topics/neural-networks>)

- Batch size choice ([Original
tutorial](<https://huggingface.co/docs/transformers/v4.38.2/en/model_memory_anatomy>))
- Gradient Accumulation ([Original
tutorial](<https://huggingface.co/docs/transformers/main/training#prepare-a-dataset>))
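
For a rough idea of what those two techniques look like in code, here is a
minimal sketch (not the `live-coding/L24` code itself): the model and dataset
(`bert-base-cased`, `yelp_review_full`) follow the linked tutorial, while the
specific batch-size numbers are arbitrary.

```python
# A minimal sketch: batch size choice + gradient accumulation with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

# Small training subset so the example finishes in a reasonable time.
train = (load_dataset("yelp_review_full")["train"]
         .shuffle(seed=42).select(range(1000)).map(tokenize, batched=True))

args = TrainingArguments(
    output_dir="l24-demo",
    per_device_train_batch_size=8,   # batch size choice: powers of 2 map well onto GPU tile sizes
    gradient_accumulation_steps=4,   # accumulate gradients over 4 steps -> effective batch size 32
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=train).train()
```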

For broader ideas:
<https://huggingface.co/docs/transformers/perf_train_gpu_one#batch-size-choice>

## Other

Worth trying: [NLP From Scratch: Translation with a Sequence to Sequence Network
and
Attention](<https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>)

Also see:
<https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html#case-studies>
