Cherry pick docs: add a link to NeMo's known issues page (402) into r0.5.0 #407

4 changes: 2 additions & 2 deletions README.md
@@ -7,15 +7,15 @@

## Introduction

NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the- art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our [paper](https://arxiv.org/abs/2405.01481).
NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our [paper](https://arxiv.org/abs/2405.01481).

The NeMo-Aligner toolkit is built using the NeMo Framework, which enables scalable training across thousands of GPUs using tensor, data, and pipeline parallelism for all alignment components. Additionally, our checkpoints are cross-compatible with the NeMo ecosystem, facilitating inference deployment and further customization (https://github.com/NVIDIA/NeMo-Aligner).

The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.

## Key Features

* **SteerLM: Attribute Conditioned SFT as an (User-Steerable) alternative to RLHF**
* **SteerLM: Attribute Conditioned SFT as an (User-Steerable) alternative to RLHF.**
* [Llama3-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama3-70B-SteerLM-Chat) aligned with NeMo-Aligner.
* Corresponding reward model [Llama3-70B-SteerLM-RM](https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM).
* Learn more at our [SteerLM](https://arxiv.org/abs/2310.05344) and [HelpSteer2](https://arxiv.org/abs/2406.08673) papers.
5 changes: 5 additions & 0 deletions docs/user-guide/cai.rst
@@ -62,6 +62,11 @@ This section is a step-by-step tutorial that walks you through how to run a full

7. Run inference.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

.. image:: ../assets/cai_flow.png

Step 1: Download models and datasets
5 changes: 5 additions & 0 deletions docs/user-guide/dpo.rst
@@ -5,6 +5,11 @@
Model Alignment by DPO, RPO, and IPO
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

The NeMo Framework supports efficient model alignment via the NeMo-Aligner codebase.

All algorithms in NeMo-Aligner will work with any GPT-based model that is from Megatron Core (in the config it has ``mcore_gpt=True``). For the purposes of this tutorial, we will go through the entire Direct Preference Optimization (DPO) pipeline using the newly released `2B GPT model with 4096 sequence length <https://huggingface.co/nvidia/GPT-2B-001>`__. The same tutorial also works for GPT models (such as LLaMa2) of any size.
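
As a quick sanity check, you can confirm this flag on your own model's config before starting. A minimal sketch, assuming the config YAML has already been extracted from the ``.nemo`` checkpoint (the path is illustrative):

.. code-block:: python

    from omegaconf import OmegaConf

    # Load the model config extracted from the .nemo checkpoint (illustrative path).
    cfg = OmegaConf.load("model_config.yaml")

    # NeMo-Aligner algorithms expect a Megatron Core GPT model.
    assert cfg.get("mcore_gpt", False), "expected mcore_gpt=True in the model config"
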
5 changes: 5 additions & 0 deletions docs/user-guide/draftp.rst
@@ -5,6 +5,11 @@
Fine-tuning Stable Diffusion with DRaFT+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will go through the step-by-step guide for fine-tuning a Stable Diffusion model using the DRaFT+ algorithm by NVIDIA.
DRaFT+ is an improvement over the `DRaFT <https://arxiv.org/pdf/2309.17400.pdf>`__ algorithm by alleviating the mode collapse and improving diversity through regularization.
For more technical details on the DRaFT+ algorithm, check out our technical blog.
30 changes: 30 additions & 0 deletions docs/user-guide/modelalignment.rsts
@@ -1,2 +1,32 @@
Model Alignment
!!!!!!!!!!!!!!!

Introduction
############

NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the-art model alignment algorithms such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our `paper <https://arxiv.org/abs/2405.01481>`__.

The NeMo-Aligner toolkit is built using the `NeMo Toolkit <https://github.com/NVIDIA/NeMo>`__, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.

The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.

Get Started
###########

NeMo-Aligner comes preinstalled in NVIDIA NeMo containers. New NeMo containers are released alongside NeMo version updates.

To get access to the container, log in to the NVIDIA GPU Cloud (NGC) platform or create a free NGC account here: `NVIDIA NGC <https://ngc.nvidia.com/signin>`__. Once you have logged in, you can get the container here: `NVIDIA NGC NeMo Framework <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo>`__.

To use a pre-built container, run the following code:

.. code-block:: bash

docker run -it --gpus=all --shm-size=8g --workdir /opt/NeMo-Aligner nvcr.io/nvidia/nemo:24.09

Please use the latest tag in the form yy.mm.(patch).

.. note::
- Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to `this document <https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html#working-with-hugging-face-models>`__.
- If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.
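
For the gated models, one possible workflow is to authenticate with a Hugging Face token from inside the container. A hedged sketch; the token value and the login step are assumptions about your setup, not part of the linked document:

.. code-block:: python

    # Authenticate to Hugging Face so gated model downloads succeed.
    # Tokens are generated at https://huggingface.co/settings/tokens.
    from huggingface_hub import login

    login(token="hf_...")  # replace with your own access token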


7 changes: 6 additions & 1 deletion docs/user-guide/rlhf.rst
@@ -5,7 +5,12 @@
Model Alignment by RLHF
@@@@@@@@@@@@@@@@@@@@@@@

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa2 or Mistral, and our scripts will function consistently across them.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa or Mistral, and our scripts will function consistently across them.

RLHF is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will use this to start the RLHF process. We will use the `PPO <https://arxiv.org/abs/1707.06347>`__ algorithm for reinforcement learning on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.

7 changes: 6 additions & 1 deletion docs/user-guide/rs.rst
@@ -5,7 +5,12 @@
Model Alignment by Rejection Sampling
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa2 and Mistral, with our scripts functioning consistently across different models.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa and Mistral, with our scripts functioning consistently across different models.

Rejection Sampling is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will also need to train a reward model as in :ref:`PPO guide <ppo>`. We will use the rejection sampling algorithm on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.

5 changes: 5 additions & 0 deletions docs/user-guide/sft.rst
@@ -65,6 +65,11 @@ Model Alignment by Supervised Fine-Tuning (SFT)

2. **Chat**. In the *Chat* format, each example contains a multi-turn conversation between different roles (e.g., *User* and *Assistant*). Fine-tuning the base model on a chat format dataset is useful to align a chatbot.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Fine-Tune with a Prompt-Response Dataset
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

11 changes: 8 additions & 3 deletions docs/user-guide/spin.rst
@@ -9,9 +9,14 @@ The NeMo framework supports efficient model alignment via the NeMo Aligner codeb

All algorithms in NeMo Aligner will work with any GPT-based model that is from mcore (i.e., in the config it has ``mcore_gpt=True``). For the purposes of this tutorial, we will go through the entire SPIN pipeline using the newly released `2B GPT model with 4096 sequence length <https://huggingface.co/nvidia/GPT-2B-001>`__. This same tutorial also works for GPT models (such as LLaMa2) of any size.

Obtaining a pretrained model
############################
To start, we must first get a pretrained model to align. There are 2 models we recommend to get started. The rest of the tutorial will work with either model, but for demonstration purposes we will use the smaller 2B model.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Obtain a Pretrained Model
#########################
To start, we must first get a pretrained model to align. There are two models we recommend to get started. The rest of the tutorial will work with either model, but for demonstration purposes, we will use the smaller 2B model.

.. tab-set::

5 changes: 5 additions & 0 deletions docs/user-guide/steerlm.rst
@@ -42,6 +42,11 @@ Train a SteerLM model

This section is a step-by-step tutorial that walks you through how to run a full SteerLM pipeline with a Llama2 70B model. It includes the following:

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

1. Data download and preprocessing

2. Training the attribute prediction model (aka regression reward model)
4 changes: 4 additions & 0 deletions examples/nlp/gpt/conf/inference_rm.yaml
@@ -49,3 +49,7 @@ model:
regression:
merge_attributes: False # whether to merge attribute values into a scalar
attribute_weights: null # apply these weights to each attribute when merging them into a scalar

# NOTE: The user does not need to change the global batch size below
# GBS is overridden to 0 to disable checks for compatibility with the megatron-core parallel state
global_batch_size: 0
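
The reason an override of 0 is safe can be seen from the divisibility check the microbatch calculator performs. The sketch below illustrates that check; it is a simplified stand-in, not the actual megatron-core code:

.. code-block:: python

    # global_batch_size must be divisible by micro_batch_size * data_parallel_size.
    # Since 0 % n == 0 for any n > 0, a GBS of 0 passes trivially,
    # effectively disabling the compatibility check.
    def gbs_is_compatible(gbs: int, mbs: int, dp_size: int) -> bool:
        return gbs % (mbs * dp_size) == 0

    assert gbs_is_compatible(0, 4, 8)  # an overridden GBS of 0 always passes
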
3 changes: 3 additions & 0 deletions examples/nlp/gpt/train_gpt_sft.py
@@ -102,6 +102,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False):
if cfg.model.get("seq_len_interpolation_factor", None) is not None:
gpt_cfg.seq_len_interpolation_factor = cfg.model.seq_len_interpolation_factor

if cfg.model.get("dist_ckpt_load_strictness", None) is not None:
gpt_cfg.dist_ckpt_load_strictness = cfg.model.dist_ckpt_load_strictness

gpt_cfg.inference = cfg.model.get("inference", {})

# This is needed when modifying a hparam file directly to load `.ckpt` files.
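With this passthrough in place, users can relax distributed-checkpoint loading from the training config. A hedged example of setting the key; ``"log_all"`` is an assumed value following megatron-core's ``StrictHandling`` options, so check your megatron-core version for the supported names:

.. code-block:: python

    from omegaconf import OmegaConf

    # Sketch: add the new key to a training config so that _modify_config
    # copies it onto gpt_cfg, as shown in the diff above.
    cfg = OmegaConf.create({"model": {"dist_ckpt_load_strictness": "log_all"}})
    assert cfg.model.get("dist_ckpt_load_strictness", None) is not None
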
3 changes: 3 additions & 0 deletions examples/nlp/gpt/train_steerlm2.py
@@ -140,6 +140,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False):
if cfg.model.get("seq_len_interpolation_factor", None) is not None:
gpt_cfg.seq_len_interpolation_factor = cfg.model.seq_len_interpolation_factor

if cfg.model.get("dist_ckpt_load_strictness", None) is not None:
gpt_cfg.dist_ckpt_load_strictness = cfg.model.dist_ckpt_load_strictness

gpt_cfg.inference = cfg.model.get("inference", {})

# This is needed when modifying a hparam file directly to load `.ckpt` files.
4 changes: 2 additions & 2 deletions nemo_aligner/models/nlp/gpt/megatron_gpt_critic.py
@@ -15,7 +15,7 @@
from enum import Enum

import torch
from megatron.core.num_microbatches_calculator import get_num_microbatches, reconfigure_microbatch_calculator
from megatron.core.num_microbatches_calculator import get_num_microbatches, reconfigure_num_microbatches_calculator
from megatron.core.pipeline_parallel.schedules import get_forward_backward_func
from megatron.core.transformer.module import Float16Module
from omegaconf.dictconfig import DictConfig
@@ -73,7 +73,7 @@ def prepare_for_inference(self):

def prepare_for_training(self):
app_state = AppState()
reconfigure_microbatch_calculator(
reconfigure_num_microbatches_calculator(
rank=app_state.global_rank,
rampup_batch_size=None,
global_batch_size=self.cfg.global_batch_size,
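Because megatron-core renamed this symbol, code that must run against both older and newer versions sometimes guards the import. A compatibility sketch (not part of this PR), assuming both names live in the same module across the versions you target:

.. code-block:: python

    try:
        # megatron-core >= 0.8 name
        from megatron.core.num_microbatches_calculator import (
            reconfigure_num_microbatches_calculator,
        )
    except ImportError:
        # fall back to the older name under the post-rename alias
        from megatron.core.num_microbatches_calculator import (
            reconfigure_microbatch_calculator as reconfigure_num_microbatches_calculator,
        )
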
2 changes: 1 addition & 1 deletion nemo_aligner/models/nlp/gpt/megatron_gpt_reward_model.py
@@ -92,7 +92,7 @@ def model_provider_func(self, pre_process, post_process):

model = GPTRewardModel(
config=self.transformer_config,
transformer_layer_spec=get_specs(self.spec_name, self.transformer_config.num_moe_experts),
transformer_layer_spec=get_specs(self.spec_name, self.transformer_config),
vocab_size=self.cfg.get("override_vocab_size", self.padded_vocab_size),
max_sequence_length=self.cfg.get("encoder_seq_length", 512),
pre_process=pre_process,
4 changes: 2 additions & 2 deletions nemo_aligner/package_info.py
@@ -14,9 +14,9 @@


MAJOR = 0
MINOR = 6
MINOR = 5
PATCH = 0
PRE_RELEASE = "rc0"
PRE_RELEASE = ""

# Use the following formatting: (major, minor, patch, pre-release)
VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
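After this bump, the release identifies itself as a plain 0.5.0 with no pre-release suffix. One way to confirm the installed version, assuming ``__version__`` is derived from this tuple in ``package_info.py``:

.. code-block:: python

    # Hypothetical check of the installed release inside the container.
    from nemo_aligner.package_info import __version__

    print(__version__)  # expected to print "0.5.0" for this release
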
8 changes: 5 additions & 3 deletions nemo_aligner/utils/utils.py
@@ -28,7 +28,7 @@

import torch
from megatron.core.dist_checkpointing.mapping import ShardedObject, ShardedTensorFactory
from megatron.core.num_microbatches_calculator import reconfigure_microbatch_calculator
from megatron.core.num_microbatches_calculator import reconfigure_num_microbatches_calculator
from omegaconf import DictConfig, OmegaConf
from torch.masked import as_masked_tensor

@@ -122,7 +124,9 @@ def load_checkpoint_model_config(restore_path):
return OmegaConf.load(cfg_path)

with tempfile.TemporaryDirectory() as tmpdir:
NLPSaveRestoreConnector._unpack_nemo_file(restore_path, tmpdir, extract_config_only=True)
# Extracts only model config
members = NLPSaveRestoreConnector._filtered_tar_info(restore_path, filter_fn=lambda name: ".yaml" in name)
NLPSaveRestoreConnector._unpack_nemo_file(restore_path, tmpdir, members=members)
cfg = OmegaConf.load(os.path.join(tmpdir, config_name_in_ckpt))

return cfg
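
The effect of the two new lines is to extract just the YAML config from the ``.nemo`` tarball rather than unpacking the whole archive. A rough standalone equivalent using only the standard library (an illustration of the idea, not the connector's implementation):

.. code-block:: python

    import tarfile

    from omegaconf import OmegaConf

    def read_nemo_config(restore_path: str):
        # Load the first .yaml member of a .nemo tar archive without a full unpack.
        with tarfile.open(restore_path) as tar:
            member = next(m for m in tar.getmembers() if m.name.endswith(".yaml"))
            return OmegaConf.load(tar.extractfile(member))
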
@@ -229,7 +231,7 @@ def calculate_response_lengths(tokens, eos_id):

def configure_batch_sizes(mbs, gbs, dp=1):
app_state = AppState()
reconfigure_microbatch_calculator(
reconfigure_num_microbatches_calculator(
rank=app_state.global_rank,
rampup_batch_size=None,
global_batch_size=gbs,
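A possible call site for the updated helper, with illustrative batch sizes; the constraint in the comment follows from the microbatch calculator's divisibility check:

.. code-block:: python

    from nemo_aligner.utils.utils import configure_batch_sizes

    # Illustrative values: gbs must be divisible by mbs * dp for the
    # reconfigured microbatch calculator to accept them (128 = 4 * 8 * 4).
    configure_batch_sizes(mbs=4, gbs=128, dp=8)
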
2 changes: 1 addition & 1 deletion setup/requirements.txt
@@ -1,4 +1,4 @@
jsonlines
megatron_core==0.8
megatron_core>=0.8
nemo_toolkit[nlp]
nvidia-pytriton