Merge branch 'main' into adithyare/dpo_data_refac
arendu authored Nov 15, 2024
2 parents 8ef4bfd + f82bbf4 commit b2951e8
Showing 13 changed files with 74 additions and 2 deletions.
5 changes: 5 additions & 0 deletions docs/user-guide/cai.rst
@@ -62,6 +62,11 @@ This section is a step-by-step tutorial that walks you through how to run a full

7. Run inference.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

.. image:: ../assets/cai_flow.png

Step 1: Download models and datasets
2 changes: 2 additions & 0 deletions docs/user-guide/dpo.rst
@@ -7,6 +7,8 @@ Model Alignment by DPO, RPO, and IPO

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

The NeMo Framework supports efficient model alignment via the NeMo-Aligner codebase.

2 changes: 2 additions & 0 deletions docs/user-guide/draftp.rst
@@ -8,6 +8,8 @@ Fine-Tuning Stable Diffusion with DRaFT+
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we provide a step-by-step guide to fine-tuning a Stable Diffusion model with NVIDIA's DRaFT+ algorithm.
DRaFT+ enhances the `DRaFT <https://arxiv.org/pdf/2309.17400.pdf>`__ algorithm by mitigating mode collapse and improving diversity through regularization.
For more technical details on the DRaFT+ algorithm, check out our technical blog.
6 changes: 6 additions & 0 deletions docs/user-guide/knowledge-distillation.rst
@@ -9,6 +9,12 @@ There are two primary benefits of knowledge distillation compared to standard su

There are many variants of knowledge distillation. NeMo Aligner supports training the student model to match the top-K logits of the teacher model. In this tutorial, we will go through fine-tuning a 2B student using a fine-tuned Nemotron 8B chat model.
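
As a rough illustration of what matching the top-K logits involves, here is a minimal PyTorch-style sketch; the helper name, shapes, and hyperparameters are assumptions for illustration, not the NeMo-Aligner implementation:

    import torch
    import torch.nn.functional as F

    def topk_distillation_loss(student_logits, teacher_logits, k=100, temperature=1.0):
        # Hypothetical helper: logits have shape (batch, seq_len, vocab_size).
        # Teacher's K highest-scoring vocabulary entries at each position.
        topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)
        # Student scores gathered at exactly those vocabulary indices.
        student_topk = student_logits.gather(-1, topk_idx)
        # Cross-entropy between the two distributions restricted to the top-K support.
        teacher_probs = F.softmax(topk_vals / temperature, dim=-1)
        student_logprobs = F.log_softmax(student_topk / temperature, dim=-1)
        return -(teacher_probs * student_logprobs).sum(dim=-1).mean()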

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.


Obtain the fine-tuned teacher and pre-trained student models
############################################################
To start, we must first download both the pre-trained student and fine-tuned teacher models
3 changes: 2 additions & 1 deletion docs/user-guide/modelalignment.rsts
@@ -29,6 +29,7 @@ To use a pre-built container, run the following code:
Please use the latest tag in the form yy.mm.(patch).

.. note::
Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to ``this document <https://docs.nvidia.com/nemo-framework/user-guide//latest/generaltips.html#working-with-hugging-face-models>``__.
- Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to `this document <https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html#working-with-hugging-face-models>`__.
- If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.


2 changes: 2 additions & 0 deletions docs/user-guide/rlhf.rst
@@ -8,6 +8,8 @@ Model Alignment by RLHF
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa or Mistral, and our scripts will function consistently across them.

RLHF is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will use it to start the RLHF process. We will use the `PPO <https://arxiv.org/abs/1707.06347>`__ algorithm for reinforcement learning on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.
2 changes: 2 additions & 0 deletions docs/user-guide/rs.rst
@@ -8,6 +8,8 @@ Model Alignment by Rejection Sampling
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa and Mistral, with our scripts functioning consistently across different models.

Rejection Sampling is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will also need to train a reward model as in the :ref:`PPO guide <ppo>`. We will use the rejection sampling algorithm on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.
2 changes: 2 additions & 0 deletions docs/user-guide/sft.rst
@@ -71,6 +71,8 @@ Model Alignment by Supervised Fine-Tuning (SFT)

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Fine-Tune with a Prompt-Response Dataset
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2 changes: 2 additions & 0 deletions docs/user-guide/spin.rst
@@ -11,6 +11,8 @@ For details on the SPIN algorithm, refer to the paper: `https://arxiv.org/abs/24

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Obtain a Pretrained Model
#########################
2 changes: 2 additions & 0 deletions docs/user-guide/steerlm.rst
@@ -47,6 +47,8 @@ This section is a step-by-step tutorial that walks you through how to run a full

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Download the Llama 2 LLM Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion examples/nlp/gpt/conf/gpt_dpo.yaml
@@ -115,7 +115,7 @@ model:
data_impl: jsonl
splits_string: null
seq_length: ${model.encoder_seq_length}
pad_length_to_multiple_of: null # Use if sequence_parallel is enabled to ensure seq_length is divisible by the ...
pad_length_to_multiple_of: null # If using sequence_parallel, ensure divisible by tensor_model_parallel_size
skip_warmup: True
num_workers: 0
reset_position_ids: False # Reset position ids after end-of-document token
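
For reference, a minimal sketch of the padding rule described by the pad_length_to_multiple_of comment above (the exact semantics and helper name are assumed for illustration, not NeMo-Aligner code):

    import math

    def pad_to_multiple(seq_len: int, multiple: int) -> int:
        # Round seq_len up to the next multiple, e.g. so sequence parallelism
        # sees a length divisible by tensor_model_parallel_size.
        return math.ceil(seq_len / multiple) * multiple

    # With pad_length_to_multiple_of: 8, a 4093-token sequence would be padded to 4096.
    assert pad_to_multiple(4093, 8) == 4096
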
1 change: 1 addition & 0 deletions nemo_aligner/data/nlp/datasets.py
@@ -348,6 +348,7 @@ def encode(self, text, append_eod=False):

return text_ids, len(text_ids)


def _convert_messages(self, input_list): # TODO: (@adithyare) this method should live elsewhere..
output_dict = {
'system': '',
45 changes: 45 additions & 0 deletions tests/functional/run_all.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
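
# Runs every functional test case in tests/functional/test_cases, printing a colored
# PASSED/FAILED status and the elapsed wall-clock time for each script.
# Assumed invocation (from the repository root): bash tests/functional/run_all.sh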

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR/test_cases" || exit 1

set -u

# Define ANSI color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m' # No Color

# Every file in test_cases that is not a .log artifact is treated as a runnable test script.
for script in $(ls | grep -v '\.log$'); do
echo -n "[Running] $script..."

start_time=$(date +%s.%N)
output=$(bash "$script" 2>&1)
exit_code=$?
end_time=$(date +%s.%N)
elapsed=$(echo "$end_time $start_time" | awk '{print $1 - $2}')

if [[ $exit_code -eq 0 ]]; then
echo -e "${GREEN}PASSED${NC} (Time: ${elapsed}s)"
else
echo -e "${RED}FAILED${NC} (Time: ${elapsed}s)"
echo -e "${YELLOW}"
echo "$output" | tail -n 10
echo -e "${NC}"
fi
done
