Merge branch 'main' into fix-module-2

huggingface · Dec 22, 2024 · f19444f
2 parents df2dc68 + 67f13c6
commit f19444f

Showing 12 changed files with 649 additions and 8 deletions.
diff --git a/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb b/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb
@@ -301,6 +301,13 @@
     "    train_dataset=dataset,\n",
     "    # Tokenizer for processing inputs\n",
     "    processing_class=tokenizer,\n",
+    "    # DPO beta parameter that controls how far the trained model may deviate from the reference model\n",
+    "    # Higher values keep the model closer to the reference; lower values (like 0.1) allow more deviation\n",
+    "    #beta=0.1,\n",
+    "    # Maximum length of the input prompt in tokens\n",
+    "    #max_prompt_length=1024,\n",
+    "    # Maximum combined length of prompt + response in tokens\n",
+    "    #max_length=1536,\n",
     ")"
    ]
   },
@@ -357,7 +364,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.10"
+   "version": "3.12.7"
   },
   "widgets": {
    "application/vnd.jupyter.widget-state+json": {

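To illustrate the `beta` comment added in the DPO notebook hunk above, here is a minimal sketch of the per-example DPO loss in plain Python. It is not TRL's implementation; the function name `dpo_loss` and the log-probability values are hypothetical and chosen only for illustration. It assumes the standard DPO formulation, where the loss is the negative log-sigmoid of `beta` times the difference between the chosen and rejected log-ratios against the reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)). Hypothetical helper."""
    # How much more (in log space) the policy likes each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(x)) is equal to log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Made-up sequence log-probabilities, for illustration only
logps = dict(
    policy_chosen_logp=-12.0,
    policy_rejected_logp=-15.0,
    ref_chosen_logp=-13.0,
    ref_rejected_logp=-14.0,
)

for beta in (0.1, 0.5):
    print(f"beta={beta}: loss={dpo_loss(**logps, beta=beta):.4f}")
```

Because `beta` scales the margin before the sigmoid, a larger `beta` makes the loss react more sharply to any given deviation from the reference model, which is why it acts as a regularization strength rather than a learning rate.
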
diff --git a/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb b/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb
@@ -9,7 +9,7 @@
     "This notebook will guide you through the process of fine-tuning a language model using Odds Ratio Preference Optimization (ORPO). We will use the SmolLM2-135M model which has **not** been through SFT training, so it is not compatible with DPO. This means, you cannot use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).\n",
     "\n",
     "<div style='background-color: lightblue; padding: 10px; border-radius: 5px; margin-bottom: 20px; color:black'>\n",
-    "     <h2 style='margin: 0;color:blue'>Exercise: Aligning SmolLM2 with DPOTrainer</h2>\n",
+    "     <h2 style='margin: 0;color:blue'>Exercise: Aligning SmolLM2 with ORPOTrainer</h2>\n",
     "     <p>Take a dataset from the Hugging Face hub and align a model on it. </p> \n",
     "     <p><b>Difficulty Levels</b></p>\n",
     "     <p>🐢 Use the `trl-lib/ultrafeedback_binarized` dataset</p>\n",
@@ -271,7 +271,7 @@
     "model, tokenizer = setup_chat_format(model, tokenizer)\n",
     "\n",
     "# Set our name for the finetune to be saved &/ uploaded to\n",
-    "finetune_name = \"SmolLM2-FT-DPO\"\n",
+    "finetune_name = \"SmolLM2-FT-ORPO\"\n",
     "finetune_tags = [\"smol-course\", \"module_1\"]"
    ]
   },

diff --git a/2_preference_alignment/orpo.md b/2_preference_alignment/orpo.md
@@ -76,7 +76,7 @@ Key parameters to consider:
 
 ## Next Steps
 
-⏩ Try the [ORPO Tutorial](./notebooks/orpo_tutorial.ipynb) to implement this unified approach to preference alignment.
+⏩ Try the [ORPO Tutorial](./notebooks/orpo_finetuning_example.ipynb) to implement this unified approach to preference alignment.
 
 ## Resources
 - [ORPO Paper](https://arxiv.org/abs/2403.07691)

diff --git a/5_vision_language_models/README.md b/5_vision_language_models/README.md
@@ -24,8 +24,8 @@ For detailed guidance on fine-tuning VLMs, visit the [VLM Fine-Tuning](./vlm_fin
 
 | Title | Description | Exercise | Link | Colab |
 |-------|-------------|----------|------|-------|
-| VLM Usage | Learn how to load and use a pre-trained VLM for various tasks | 🐢 Process an image<br>🐕 Process multiple images with batch handling <br>🦁 Process a full video| [Notebook](./notebooks/vlm_usage_sample.ipynb) | <a target="_blank" href="https://colab.research.google.com/github/user/project/vlm_usage_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
-| VLM Fine-Tuning | Learn how to fine-tune a pre-trained VLM for task-specific datasets | 🐢 Use a basic dataset for fine-tuning<br>🐕 Try a new dataset<br>🦁 Experiment with alternative fine-tuning methods | [Notebook](./notebooks/vlm_finetune_sample.ipynb)| <a target="_blank" href="https://colab.research.google.com/github/user/project/vlm_finetune_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | 
+| VLM Usage | Learn how to load and use a pre-trained VLM for various tasks | 🐢 Process an image<br>🐕 Process multiple images with batch handling <br>🦁 Process a full video| [Notebook](./notebooks/vlm_usage_sample.ipynb) | <a target="_blank" href="https://colab.research.google.com/github/huggingface/smol-course/blob/main/5_vision_language_models/notebooks/vlm_usage_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
+| VLM Fine-Tuning | Learn how to fine-tune a pre-trained VLM for task-specific datasets | 🐢 Use a basic dataset for fine-tuning<br>🐕 Try a new dataset<br>🦁 Experiment with alternative fine-tuning methods | [Notebook](./notebooks/vlm_sft_sample.ipynb)| <a target="_blank" href="https://colab.research.google.com/github/huggingface/smol-course/blob/main/5_vision_language_models/notebooks/vlm_sft_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | 
 
 
 ## References  

diff --git a/6_synthetic_datasets/README.md b/6_synthetic_datasets/README.md
@@ -1,3 +1,39 @@
 # Synthetic Datasets
 
-I'm still working on this section...
+Synthetic data is artificially generated data that mimics real-world usage. It helps overcome data limitations by expanding or enhancing datasets. Although synthetic data was already used for some use cases, large language models have made synthetic datasets more popular for pre-training, post-training, and the evaluation of language models.
+
+We'll use [`distilabel`](https://distilabel.argilla.io/latest/), a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers. For a deeper dive into the package and best practices, check out the [documentation](https://distilabel.argilla.io/latest/).
+
+## Module Overview
+
+Synthetic data for language models can be grouped into three categories: instructions, preferences and critiques. We will focus on the first two, which cover the generation of datasets for instruction tuning and preference alignment. In both, we will also touch on the third category, which focuses on improving existing data with model critiques and rewrites.
+
+![Synthetic Data Taxonomies](./images/taxonomy-synthetic-data.png)
+
+## Contents
+
+### 1. [Instruction Datasets](./instruction_datasets.md)
+
+Learn how to generate instruction datasets for instruction tuning. We will explore creating instruction tuning datasets through basic prompting and through more refined prompting techniques from papers. Instruction tuning datasets with seed data for in-context learning can be created through methods like SelfInstruct and Magpie. Additionally, we will explore instruction evolution through EvolInstruct. [Start learning](./instruction_datasets.md).
+
+### 2. [Preference Datasets](./preference_datasets.md)
+
+Learn how to generate preference datasets for preference alignment. We will build on the methods and techniques introduced in section 1 by generating additional responses. Next, we will learn how to improve such responses with the EvolQuality prompt. Finally, we will explore how to evaluate responses with the UltraFeedback prompt, which produces a score and a critique, allowing us to create preference pairs. [Start learning](./preference_datasets.md).
+
+### Exercise Notebooks
+
+| Title | Description | Exercise | Link | Colab |
+|-------|-------------|----------|------|-------|
+| Instruction Dataset | Generate a dataset for instruction tuning | 🐢 Generate an instruction tuning dataset <br> 🐕 Generate a dataset for instruction tuning with seed data <br> 🦁 Generate a dataset for instruction tuning with seed data and with instruction evolution | [Link](./notebooks/instruction_sft_dataset.ipynb) | [Colab](https://githubtocolab.com/huggingface/smol-course/tree/main/6_synthetic_datasets/notebooks/instruction_sft_dataset.ipynb) |
+| Preference Dataset | Generate a dataset for preference alignment | 🐢 Generate a preference alignment dataset <br> 🐕 Generate a preference alignment dataset with response evolution <br> 🦁 Generate a preference alignment dataset with response evolution and critiques  | [Link](./notebooks/preference_alignment_dataset.ipynb) | [Colab](https://githubtocolab.com/huggingface/smol-course/tree/main/6_synthetic_datasets/notebooks/preference_alignment_dataset.ipynb) |
+
+## Resources
+
+- [Distilabel Documentation](https://distilabel.argilla.io/latest/)
+- [Synthetic Data Generator UI app](https://huggingface.co/blog/synthetic-data-generator)
+- [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
+- [Self-instruct](https://arxiv.org/abs/2212.10560)
+- [Evol-Instruct](https://arxiv.org/abs/2304.12244)
+- [Magpie](https://arxiv.org/abs/2406.08464)
+- [UltraFeedback](https://arxiv.org/abs/2310.01377)
+- [Deita](https://arxiv.org/abs/2312.15685)
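
The README added above says that scored responses allow preference pairs to be created. As a rough sketch of that last step only, the snippet below picks the highest- and lowest-scored response for one instruction. It does not use `distilabel`; the function name `to_preference_pair`, the example texts and the scores are all hypothetical, and the `prompt`/`chosen`/`rejected` keys are assumed as the target layout for a preference dataset.

```python
def to_preference_pair(instruction, responses, scores):
    """Build one preference pair from scored responses (hypothetical helper).

    Returns None when all scores are tied, since no preference can be inferred.
    """
    if len(responses) != len(scores) or len(responses) < 2:
        raise ValueError("need at least two responses, each with a score")

    # Sort by score only, so ties never fall back to comparing response text
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0])
    low_score, rejected = ranked[0]
    high_score, chosen = ranked[-1]

    if high_score == low_score:
        return None
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}


# Made-up example data, for illustration only
pair = to_preference_pair(
    "Explain what synthetic data is.",
    ["It is fake data.", "It is artificially generated data that mimics real-world usage."],
    [2.0, 4.5],
)
print(pair)
```

Dropping tied examples is one possible design choice; a real pipeline might instead keep them and break ties with the critique text.
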
diff --git a/6_synthetic_datasets/images/pipeline.png b/6_synthetic_datasets/images/pipeline.png
diff --git a/6_synthetic_datasets/images/taxonomy-synthetic-data.png b/6_synthetic_datasets/images/taxonomy-synthetic-data.png