Merge branch 'main' into adithyare/dpo_data_refac
arendu authored Nov 15, 2024
2 parents 8ef4bfd + f82bbf4 commit b2951e8
Showing 13 changed files with 74 additions and 2 deletions.
5 changes: 5 additions & 0 deletions docs/user-guide/cai.rst
@@ -62,6 +62,11 @@ This section is a step-by-step tutorial that walks you through how to run a full

7. Run inference.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

.. image:: ../assets/cai_flow.png

Step 1: Download models and datasets
2 changes: 2 additions & 0 deletions docs/user-guide/dpo.rst
@@ -7,6 +7,8 @@ Model Alignment by DPO, RPO, and IPO

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

The NeMo Framework supports efficient model alignment via the NeMo-Aligner codebase.

2 changes: 2 additions & 0 deletions docs/user-guide/draftp.rst
@@ -8,6 +8,8 @@ Fine-Tuning Stable Diffusion with DRaFT+
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we provide a step-by-step guide to fine-tuning a Stable Diffusion model with NVIDIA's DRaFT+ algorithm.
DRaFT+ enhances the `DRaFT <https://arxiv.org/pdf/2309.17400.pdf>`__ algorithm by mitigating mode collapse and improving diversity through regularization.
For more technical details on the DRaFT+ algorithm, check out our technical blog.
6 changes: 6 additions & 0 deletions docs/user-guide/knowledge-distillation.rst
@@ -9,6 +9,12 @@ There are two primary benefits of knowledge distillation compared to standard su

There are many variants of knowledge distillation. NeMo Aligner supports training the student model to match the top-K logits of the teacher model. In this tutorial, we will go through fine-tuning a 2B student using a fine-tuned Nemotron 8B chat model.
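
As a rough illustration of what matching the top-K logits involves, here is a minimal PyTorch-style sketch; the helper name, shapes, and hyperparameters are assumptions for illustration, not the NeMo-Aligner implementation:

    import torch
    import torch.nn.functional as F

    def topk_distillation_loss(student_logits, teacher_logits, k=100, temperature=1.0):
        # Hypothetical helper: logits have shape (batch, seq_len, vocab_size).
        # Teacher's K highest-scoring vocabulary entries at each position.
        topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)
        # Student scores gathered at exactly those vocabulary indices.
        student_topk = student_logits.gather(-1, topk_idx)
        # Cross-entropy between the two distributions restricted to the top-K support.
        teacher_probs = F.softmax(topk_vals / temperature, dim=-1)
        student_logprobs = F.log_softmax(student_topk / temperature, dim=-1)
        return -(teacher_probs * student_logprobs).sum(dim=-1).mean()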

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.


Obtain the fine-tuned teacher and pre-trained student models
############################################################
To start, we must first download both the pre-trained student and fine-tuned teacher models
3 changes: 2 additions & 1 deletion docs/user-guide/modelalignment.rsts
@@ -29,6 +29,7 @@ To use a pre-built container, run the following code:
Please use the latest tag in the form yy.mm.(patch).

.. note::
Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to ``this document <https://docs.nvidia.com/nemo-framework/user-guide//latest/generaltips.html#working-with-hugging-face-models>``__.
- Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to `this document <https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html#working-with-hugging-face-models>`__.
- If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.


2 changes: 2 additions & 0 deletions docs/user-guide/rlhf.rst
@@ -8,6 +8,8 @@ Model Alignment by RLHF
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa or Mistral, and our scripts will function consistently across them.

RLHF is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will use it to start the RLHF process. We will use the `PPO <https://arxiv.org/abs/1707.06347>`__ algorithm for reinforcement learning on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.
2 changes: 2 additions & 0 deletions docs/user-guide/rs.rst
@@ -8,6 +8,8 @@ Model Alignment by Rejection Sampling
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa and Mistral, with our scripts functioning consistently across different models.

Rejection Sampling is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will also need to train a reward model as in the :ref:`PPO guide <ppo>`. We will use the rejection sampling algorithm on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.
2 changes: 2 additions & 0 deletions docs/user-guide/sft.rst
@@ -71,6 +71,8 @@ Model Alignment by Supervised Fine-Tuning (SFT)

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Fine-Tune with a Prompt-Response Dataset
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2 changes: 2 additions & 0 deletions docs/user-guide/spin.rst
@@ -11,6 +11,8 @@ For details on the SPIN algorithm, refer to the paper: `https://arxiv.org/abs/24

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Obtain a Pretrained Model
#########################
2 changes: 2 additions & 0 deletions docs/user-guide/steerlm.rst
@@ -47,6 +47,8 @@ This section is a step-by-step tutorial that walks you through how to run a full

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Download the Llama 2 LLM Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion examples/nlp/gpt/conf/gpt_dpo.yaml
@@ -115,7 +115,7 @@ model:
data_impl: jsonl
splits_string: null
seq_length: ${model.encoder_seq_length}
pad_length_to_multiple_of: null # Use if sequence_parallel is enabled to ensure seq_length is divisible by the ...
pad_length_to_multiple_of: null # If using sequence_parallel, ensure divisible by tensor_model_parallel_size
skip_warmup: True
num_workers: 0
reset_position_ids: False # Reset position ids after end-of-document token
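
For reference, a minimal sketch of the padding rule described by the pad_length_to_multiple_of comment above (the exact semantics and helper name are assumed for illustration, not NeMo-Aligner code):

    import math

    def pad_to_multiple(seq_len: int, multiple: int) -> int:
        # Round seq_len up to the next multiple, e.g. so sequence parallelism
        # sees a length divisible by tensor_model_parallel_size.
        return math.ceil(seq_len / multiple) * multiple

    # With pad_length_to_multiple_of: 8, a 4093-token sequence would be padded to 4096.
    assert pad_to_multiple(4093, 8) == 4096
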
1 change: 1 addition & 0 deletions nemo_aligner/data/nlp/datasets.py
@@ -348,6 +348,7 @@ def encode(self, text, append_eod=False):

return text_ids, len(text_ids)


def _convert_messages(self, input_list): # TODO: (@adithyare) this method should live elsewhere..
output_dict = {
'system': '',
45 changes: 45 additions & 0 deletions tests/functional/run_all.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
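
# Runs every functional test case in tests/functional/test_cases, printing a colored
# PASSED/FAILED status and the elapsed wall-clock time for each script.
# Assumed invocation (from the repository root): bash tests/functional/run_all.sh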

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR/test_cases" || exit 1

set -u

# Define ANSI color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m' # No Color

# Every file in test_cases that is not a .log artifact is treated as a runnable test script.
for script in $(ls | grep -v '\.log$'); do
echo -n "[Running] $script..."

start_time=$(date +%s.%N)
output=$(bash "$script" 2>&1)
exit_code=$?
end_time=$(date +%s.%N)
elapsed=$(echo "$end_time $start_time" | awk '{print $1 - $2}')

if [[ $exit_code -eq 0 ]]; then
echo -e "${GREEN}PASSED${NC} (Time: ${elapsed}s)"
else
echo -e "${RED}FAILED${NC} (Time: ${elapsed}s)"
echo -e "${YELLOW}"
echo "$output" | tail -n 10
echo -e "${NC}"
fi
done
