
Commit

Merge branch 'main' into fix-module-2
Knight7561 authored Dec 22, 2024
2 parents df2dc68 + 67f13c6 commit f19444f
Showing 12 changed files with 649 additions and 8 deletions.
@@ -301,6 +301,13 @@
" train_dataset=dataset,\n",
" # Tokenizer for processing inputs\n",
" processing_class=tokenizer,\n",
" # DPO-specific temperature parameter that controls the strength of the preference model\n",
" # Lower values (like 0.1) make the model more conservative in following preferences\n",
" #beta=0.1,\n",
" # Maximum length of the input prompt in tokens\n",
" #max_prompt_length=1024,\n",
" # Maximum combined length of prompt + response in tokens\n",
" #max_length=1536,\n",
")"
]
},
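A minimal sketch of how these commented-out arguments could instead be set explicitly through `DPOConfig` in recent versions of trl. This is illustrative only: the output path is a placeholder, and `model`, `dataset`, and `tokenizer` are assumed to have been created earlier in the notebook.

```python
from trl import DPOConfig, DPOTrainer

# Hedged sketch: values mirror the commented-out defaults above, not required settings.
training_args = DPOConfig(
    output_dir="smollm2-dpo",  # placeholder output directory
    beta=0.1,                  # preference strength; lower values follow preferences more conservatively
    max_prompt_length=1024,    # maximum prompt length in tokens
    max_length=1536,           # maximum combined prompt + response length in tokens
)

trainer = DPOTrainer(
    model=model,                 # assumes the model loaded earlier in the notebook
    args=training_args,
    train_dataset=dataset,       # assumes a preference dataset with chosen/rejected pairs
    processing_class=tokenizer,  # tokenizer for processing inputs
)
```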
@@ -357,7 +364,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.12.7"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
@@ -9,7 +9,7 @@
"This notebook will guide you through the process of fine-tuning a language model using Odds Ratio Preference Optimization (ORPO). We will use the SmolLM2-135M model which has **not** been through SFT training, so it is not compatible with DPO. This means, you cannot use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).\n",
"\n",
"<div style='background-color: lightblue; padding: 10px; border-radius: 5px; margin-bottom: 20px; color:black'>\n",
" <h2 style='margin: 0;color:blue'>Exercise: Aligning SmolLM2 with DPOTrainer</h2>\n",
" <h2 style='margin: 0;color:blue'>Exercise: Aligning SmolLM2 with ORPOTrainer</h2>\n",
" <p>Take a dataset from the Hugging Face hub and align a model on it. </p> \n",
" <p><b>Difficulty Levels</b></p>\n",
" <p>🐒 Use the `trl-lib/ultrafeedback_binarized` dataset</p>\n",
@@ -271,7 +271,7 @@
"model, tokenizer = setup_chat_format(model, tokenizer)\n",
"\n",
"# Set our name for the finetune to be saved &/ uploaded to\n",
"finetune_name = \"SmolLM2-FT-DPO\"\n",
"finetune_name = \"SmolLM2-FT-ORPO\"\n",
"finetune_tags = [\"smol-course\", \"module_1\"]"
]
},
2 changes: 1 addition & 1 deletion 2_preference_alignment/orpo.md
@@ -76,7 +76,7 @@ Key parameters to consider:

## Next Steps

⏩ Try the [ORPO Tutorial](./notebooks/orpo_tutorial.ipynb) to implement this unified approach to preference alignment.
⏩ Try the [ORPO Tutorial](./notebooks/orpo_finetuning_example.ipynb) to implement this unified approach to preference alignment.

## Resources
- [ORPO Paper](https://arxiv.org/abs/2403.07691)
4 changes: 2 additions & 2 deletions 5_vision_language_models/README.md
@@ -24,8 +24,8 @@ For detailed guidance on fine-tuning VLMs, visit the [VLM Fine-Tuning](./vlm_fin

| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| VLM Usage | Learn how to load and use a pre-trained VLM for various tasks | 🐒 Process an image<br>🐕 Process multiple images with batch handling <br>🦁 Process a full video| [Notebook](./notebooks/vlm_usage_sample.ipynb) | <a target="_blank" href="https://colab.research.google.com/github/user/project/vlm_usage_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| VLM Fine-Tuning | Learn how to fine-tune a pre-trained VLM for task-specific datasets | 🐒 Use a basic dataset for fine-tuning<br>🐕 Try a new dataset<br>🦁 Experiment with alternative fine-tuning methods | [Notebook](./notebooks/vlm_finetune_sample.ipynb)| <a target="_blank" href="https://colab.research.google.com/github/user/project/vlm_finetune_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| VLM Usage | Learn how to load and use a pre-trained VLM for various tasks | 🐒 Process an image<br>🐕 Process multiple images with batch handling <br>🦁 Process a full video| [Notebook](./notebooks/vlm_usage_sample.ipynb) | <a target="_blank" href="https://colab.research.google.com/github/huggingface/smol-course/blob/main/5_vision_language_models/notebooks/vlm_usage_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| VLM Fine-Tuning | Learn how to fine-tune a pre-trained VLM for task-specific datasets | 🐒 Use a basic dataset for fine-tuning<br>🐕 Try a new dataset<br>🦁 Experiment with alternative fine-tuning methods | [Notebook](./notebooks/vlm_sft_sample.ipynb)| <a target="_blank" href="https://colab.research.google.com/github/huggingface/smol-course/blob/main/5_vision_language_models/notebooks/vlm_sft_sample.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |


## References
38 changes: 37 additions & 1 deletion 6_synthetic_datasets/README.md
@@ -1,3 +1,39 @@
# Synthetic Datasets

I'm still working on this section...
Synthetic data is artificially generated data that mimics real-world usage. It allows you to overcome data limitations by expanding or enhancing datasets. Even though synthetic data was already used for some use cases, large language models have made synthetic datasets more popular for pre- and post-training, as well as for the evaluation of language models.

We'll use [`distilabel`](https://distilabel.argilla.io/latest/), a framework for synthetic data and AI feedback, designed for engineers who need fast, reliable, and scalable pipelines based on verified research papers. For a deeper dive into the package and best practices, check out the [documentation](https://distilabel.argilla.io/latest/).
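To give a feel for the framework, here is a minimal sketch of a distilabel pipeline. The model id and seed prompt are illustrative assumptions, and the imports follow the distilabel 1.x API; see the documentation linked above for the authoritative reference.

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="toy-synthetic-data") as pipeline:
    # Seed the pipeline with a hand-written instruction
    load = LoadDataFromDicts(
        data=[{"instruction": "Explain what synthetic data is in one paragraph."}]
    )
    # Generate a response with a hosted model (placeholder model id)
    generate = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct")
    )
    load >> generate

distiset = pipeline.run()  # returns a Distiset with the generated rows
```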

## Module Overview

Synthetic data for language models can be grouped into three categories: instructions, preferences, and critiques. We will focus on the first two, which cover the generation of datasets for instruction tuning and preference alignment. In both, we will also touch on aspects of the third category, which focuses on improving existing data with model critiques and rewrites.

![Synthetic Data Taxonomies](./images/taxonomy-synthetic-data.png)

## Contents

### 1. [Instruction Datasets](./instruction_datasets.md)

Learn how to generate instruction datasets for instruction tuning. We will explore creating instruction tuning datasets through basic prompting and through more refined prompting techniques from the literature. Instruction tuning datasets with seed data for in-context learning can be created with methods like SelfInstruct and Magpie. Additionally, we will explore instruction evolution through EvolInstruct. [Start learning](./instruction_datasets.md).
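As a hedged sketch, SelfInstruct-style generation from seed data with distilabel might look like this; the seed description, model id, and parameters are illustrative assumptions.

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import SelfInstruct

with Pipeline(name="self-instruct-sketch") as pipeline:
    # Seed "application" descriptions that SelfInstruct expands into new instructions
    seeds = LoadDataFromDicts(
        data=[{"input": "An assistant that answers questions about machine learning."}]
    )
    self_instruct = SelfInstruct(
        llm=InferenceEndpointsLLM(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct"),
        num_instructions=5,  # number of new instructions to generate per seed
    )
    seeds >> self_instruct

distiset = pipeline.run()  # generated instructions land in an "instructions" column
```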

### 2. [Preference Datasets](./preference_datasets.md)

Learn how to generate preference datasets for preference alignment. We will build on the methods and techniques introduced in section 1 by generating additional responses. Next, we will learn how to improve such responses with the EvolQuality prompt. Finally, we will explore how to evaluate responses with the UltraFeedback prompt, which produces a score and critique, allowing us to create preference pairs. [Start learning](./preference_datasets.md).
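As a hedged sketch, scoring two candidate responses with the UltraFeedback task in distilabel might look like this; the prompt, responses, and model id are illustrative assumptions.

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import UltraFeedback

with Pipeline(name="preference-scoring-sketch") as pipeline:
    # One prompt with two candidate responses to be judged
    data = LoadDataFromDicts(
        data=[
            {
                "instruction": "What is synthetic data?",
                "generations": [
                    "Synthetic data is artificially generated data that mimics real-world usage.",
                    "Data.",
                ],
            }
        ]
    )
    judge = UltraFeedback(
        llm=InferenceEndpointsLLM(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct"),
        aspect="overall-rating",  # score each generation and explain the rating
    )
    data >> judge

distiset = pipeline.run()  # adds "ratings" and "rationales" for building preference pairs
```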

### Exercise Notebooks

| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| Instruction Dataset | Generate a dataset for instruction tuning | 🐒 Generate an instruction tuning dataset <br> 🐕 Generate a dataset for instruction tuning with seed data <br> 🦁 Generate a dataset for instruction tuning with seed data and with instruction evolution | [Link](./notebooks/instruction_sft_dataset.ipynb) | [Colab](https://githubtocolab.com/huggingface/smol-course/tree/main/6_synthetic_datasets/notebooks/instruction_sft_dataset.ipynb) |
| Preference Dataset | Generate a dataset for preference alignment | 🐒 Generate a preference alignment dataset <br> 🐕 Generate a preference alignment dataset with response evolution <br> 🦁 Generate a preference alignment dataset with response evolution and critiques | [Link](./notebooks/preference_alignment_dataset.ipynb) | [Colab](https://githubtocolab.com/huggingface/smol-course/tree/main/6_synthetic_datasets/notebooks/preference_alignment_dataset.ipynb) |

## Resources

- [Distilabel Documentation](https://distilabel.argilla.io/latest/)
- [Synthetic Data Generator UI app](https://huggingface.co/blog/synthetic-data-generator)
- [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
- [Self-instruct](https://arxiv.org/abs/2212.10560)
- [Evol-Instruct](https://arxiv.org/abs/2304.12244)
- [Magpie](https://arxiv.org/abs/2406.08464)
- [UltraFeedback](https://arxiv.org/abs/2310.01377)
- [Deita](https://arxiv.org/abs/2312.15685)
Binary file added 6_synthetic_datasets/images/pipeline.png
