# Lecture 24 — Large Language Models

## Roadmap

We will try the OpenAI website and some Hugging Face demos to see how powerful
current AI is, and then we will work through two Hugging Face tutorials to see
how model performance can be optimized.

## Let's try them

Activity: let's see whether GPT-3.5 or GPT-4 can fix Rust code. Check out some
Rust questions here: <https://practice.course.rs>

Activity: let's try image generation, say "please draw me a cute kitten", using
DALL·E and also Stable Diffusion:
<https://huggingface.co/spaces/stabilityai/stable-diffusion>

Activity: we can also see whether GPT-3.5 or GPT-4 can crack passwords if we
give it the rainbow table that we created in L23.

## Hugging Face Hub

It works as a central place (like GitHub, but for AI) where anyone can explore,
experiment, collaborate, and build technology with machine learning. In addition
to hosting code like GitHub does, it also imposes some restrictions on format,
interface, etc., so that people working on different things can collaborate
conveniently. Currently it has four big categories: Repositories, Models,
Datasets, and Spaces.

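For example, you can pull a model straight off the Hub and run it in a couple
of lines. A minimal sketch; the model id below is just one public example, and
any compatible Hub model would work:

```python
from transformers import pipeline

# Download a model from the Hub (cached locally) and run it in one call.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("The Hugging Face Hub makes sharing models easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```
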
### Natural Language Processing (NLP)

A simple NLP task is translation: given a sequence of words as input, you would
like a sequence of words in a different language as output. Before Transformers,
the words in the output sequence could only be generated one at a time, so the
process could not be parallelized well. The Transformer architecture allows them
to be processed in parallel.

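As a concrete example, translation is exposed directly as a pipeline task. A
minimal sketch, assuming Helsinki-NLP/opus-mt-en-fr, one public
English-to-French model on the Hub:

```python
from transformers import pipeline

# Sequence-to-sequence translation: English in, French out.
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
print(translator("Machine learning models can translate text quickly."))
```
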
There are three main groups of operations, listed below by compute intensity
(a toy sketch after the list labels each group):

- Tensor Contractions: batched matrix-matrix multiplications. Most
  compute-intensive.
- Statistical Normalizations: softmax and layer normalization. Less
  compute-intensive.
- Element-wise Operators: remaining operators such as activations. Least
  compute-intensive.

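A toy PyTorch sketch with made-up shapes, showing where each group appears in a
single attention step:

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch=8, sequence length=128, head dimension=64.
q = torch.randn(8, 128, 64)
k = torch.randn(8, 128, 64)
v = torch.randn(8, 128, 64)

scores = q @ k.transpose(-2, -1)                 # tensor contraction (most compute-intensive)
weights = F.softmax(scores / 64 ** 0.5, dim=-1)  # statistical normalization
context = weights @ v                            # tensor contraction again
out = F.gelu(context)                            # element-wise operator (an activation,
                                                 # as in the MLP block)
print(out.shape)                                 # torch.Size([8, 128, 64])
```
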
And two optimization targets:

- how to train the model efficiently;
- how to make the model give answers or predictions (inference) efficiently.

Let's try some techniques (the code in `live-coding/L24` was tested on
ecetesla0, but it takes time to run, so you may just refer to the lecture
notes):

(See more info on backpropagation at
<https://www.3blue1brown.com/topics/neural-networks>)

- Batch size choice ([Original
  tutorial](<https://huggingface.co/docs/transformers/v4.38.2/en/model_memory_anatomy>))
- Gradient accumulation ([Original
  tutorial](<https://huggingface.co/docs/transformers/main/training#prepare-a-dataset>));
  both knobs are sketched right after this list

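A minimal sketch of the two knobs via `TrainingArguments`; the output directory
and sizes here are made up. With a per-device batch of 4 and 8 accumulation
steps, only 4 samples' activations sit in GPU memory at once, yet weights
update as if the batch size were 32:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="l24-demo",          # made-up output directory
    per_device_train_batch_size=4,  # batch size choice
    gradient_accumulation_steps=8,  # gradient accumulation: effective batch = 4 * 8
    fp16=True,                      # mixed precision also reduces memory use
)
```
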
For broader ideas:
<https://huggingface.co/docs/transformers/perf_train_gpu_one#batch-size-choice>

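Under the hood, gradient accumulation just sums gradients across several
micro-batches before a single optimizer step. A self-contained toy sketch with
random data and made-up sizes:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
accumulation_steps = 4

optimizer.zero_grad()
for step in range(16):
    x, y = torch.randn(8, 10), torch.randn(8, 1)   # one toy micro-batch
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()                                # gradients add up in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                           # effective batch = 8 * 4 = 32
        optimizer.zero_grad()
```
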
## Other

Worth trying: [NLP From Scratch: Translation with a Sequence to Sequence
Network and
Attention](<https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>)

Also see
<https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html#case-studies>