Cherry pick docs: add a link to NeMo's known issues page (402) into r0.5.0 #407

4 changes: 2 additions & 2 deletions README.md
@@ -7,15 +7,15 @@

## Introduction

NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the- art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our [paper](https://arxiv.org/abs/2405.01481).
NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our [paper](https://arxiv.org/abs/2405.01481).

The NeMo-Aligner toolkit is built using the NeMo Framework, which enables scalable training across thousands of GPUs using tensor, data, and pipeline parallelism for all alignment components. Additionally, our checkpoints are cross-compatible with the NeMo ecosystem, facilitating inference deployment and further customization (https://github.com/NVIDIA/NeMo-Aligner).

The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.

## Key Features

* **SteerLM: Attribute Conditioned SFT as an (User-Steerable) alternative to RLHF**
* **SteerLM: Attribute Conditioned SFT as an (User-Steerable) alternative to RLHF.**
* [Llama3-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama3-70B-SteerLM-Chat) aligned with NeMo-Aligner.
* Corresponding reward model [Llama3-70B-SteerLM-RM](https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM).
* Learn more at our [SteerLM](https://arxiv.org/abs/2310.05344) and [HelpSteer2](https://arxiv.org/abs/2406.08673) papers.
5 changes: 5 additions & 0 deletions docs/user-guide/cai.rst
@@ -62,6 +62,11 @@ This section is a step-by-step tutorial that walks you through how to run a full

7. Run inference.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

.. image:: ../assets/cai_flow.png

Step 1: Download models and datasets
5 changes: 5 additions & 0 deletions docs/user-guide/dpo.rst
@@ -5,6 +5,11 @@
Model Alignment by DPO, RPO, and IPO
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

The NeMo Framework supports efficient model alignment via the NeMo-Aligner codebase.

All algorithms in NeMo-Aligner will work with any GPT-based model that is from Megatron Core (in the config it has ``mcore_gpt=True``). For the purposes of this tutorial, we will go through the entire Direct Preference Optimization (DPO) pipeline using the newly released `2B GPT model with 4096 sequence length <https://huggingface.co/nvidia/GPT-2B-001>`__. The same tutorial also works for GPT models (such as LLaMa2) of any size.
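
As a quick sanity check, you can confirm this flag on your own model's config before starting. A minimal sketch, assuming the config YAML has already been extracted from the ``.nemo`` checkpoint (the path is illustrative):

.. code-block:: python

    from omegaconf import OmegaConf

    # Load the model config extracted from the .nemo checkpoint (illustrative path).
    cfg = OmegaConf.load("model_config.yaml")

    # NeMo-Aligner algorithms expect a Megatron Core GPT model.
    assert cfg.get("mcore_gpt", False), "expected mcore_gpt=True in the model config"
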
5 changes: 5 additions & 0 deletions docs/user-guide/draftp.rst
@@ -5,6 +5,11 @@
Fine-tuning Stable Diffusion with DRaFT+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will go through the step-by-step guide for fine-tuning a Stable Diffusion model using the DRaFT+ algorithm by NVIDIA.
DRaFT+ is an improvement over the `DRaFT <https://arxiv.org/pdf/2309.17400.pdf>`__ algorithm by alleviating the mode collapse and improving diversity through regularization.
For more technical details on the DRaFT+ algorithm, check out our technical blog.
30 changes: 30 additions & 0 deletions docs/user-guide/modelalignment.rsts
@@ -1,2 +1,32 @@
Model Alignment
!!!!!!!!!!!!!!!

Introduction
############

NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit has support for state-of-the-art model alignment algorithms such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be more safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our `paper <https://arxiv.org/abs/2405.01481>`__.

The NeMo-Aligner toolkit is built using the `NeMo Toolkit <https://github.com/NVIDIA/NeMo>`__, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.

The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.

Get Started
###########

NeMo-Aligner comes preinstalled in NVIDIA NeMo containers. New NeMo containers are released alongside NeMo version updates.

To get access to the container, log in to the NVIDIA GPU Cloud (NGC) platform or create a free NGC account here: `NVIDIA NGC <https://ngc.nvidia.com/signin>`__. Once you have logged in, you can get the container here: `NVIDIA NGC NeMo Framework <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo>`__.

To use a pre-built container, run the following code:

.. code-block:: bash

docker run -it --gpus=all --shm-size=8g --workdir /opt/NeMo-Aligner nvcr.io/nvidia/nemo:24.09

Please use the latest tag in the form yy.mm.(patch).

.. note::
- Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to `this document <https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html#working-with-hugging-face-models>`__.
- If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.
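
For the gated models, one possible workflow is to authenticate with a Hugging Face token from inside the container. A hedged sketch; the token value and the login step are assumptions about your setup, not part of the linked document:

.. code-block:: python

    # Authenticate to Hugging Face so gated model downloads succeed.
    # Tokens are generated at https://huggingface.co/settings/tokens.
    from huggingface_hub import login

    login(token="hf_...")  # replace with your own access token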


7 changes: 6 additions & 1 deletion docs/user-guide/rlhf.rst
@@ -5,7 +5,12 @@
Model Alignment by RLHF
@@@@@@@@@@@@@@@@@@@@@@@

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa2 or Mistral, and our scripts will function consistently across them.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

For the purposes of this tutorial, we will go through the entire Reinforcement Learning from Human Feedback (RLHF) pipeline using models from the NeMo Framework. These models can include LLaMa or Mistral, and our scripts will function consistently across them.

RLHF is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will use this to start the RLHF process. We will use the `PPO <https://arxiv.org/abs/1707.06347>`__ algorithm for reinforcement learning on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.

7 changes: 6 additions & 1 deletion docs/user-guide/rs.rst
@@ -5,7 +5,12 @@
Model Alignment by Rejection Sampling
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa2 and Mistral, with our scripts functioning consistently across different models.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

In this tutorial, we will guide you through the process of aligning a NeMo Framework model using rejection sampling. This method can be applied to various models, including LLaMa and Mistral, with our scripts functioning consistently across different models.

Rejection Sampling is usually preceded by Supervised Fine-Tuning (SFT). We should first follow the :ref:`Prerequisite guide <prerequisite>` and the :ref:`SFT guide <sft>`. After obtaining the SFT model, we will also need to train a reward model as in :ref:`PPO guide <ppo>`. We will use the rejection sampling algorithm on the `Anthropic-HH-RLHF <https://huggingface.co/datasets/Anthropic/hh-rlhf>`__ dataset.

5 changes: 5 additions & 0 deletions docs/user-guide/sft.rst
@@ -65,6 +65,11 @@ Model Alignment by Supervised Fine-Tuning (SFT)

2. **Chat**. In the *Chat* format, each example contains a multi-turn conversation between different roles (e.g., *User* and *Assistant*). Fine-tuning the base model on a chat format dataset is useful to align a chatbot.

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Fine-Tune with a Prompt-Response Dataset
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

11 changes: 8 additions & 3 deletions docs/user-guide/spin.rst
@@ -9,9 +9,14 @@ The NeMo framework supports efficient model alignment via the NeMo Aligner codeb

All algorithms in NeMo Aligner will work with any GPT-based model that is from mcore (i.e., in the config it has ``mcore_gpt=True``). For the purposes of this tutorial, we will go through the entire SPIN pipeline using the newly released `2B GPT model with 4096 sequence length <https://huggingface.co/nvidia/GPT-2B-001>`__. This same tutorial also works for GPT models (such as LLaMa2) of any size.

Obtaining a pretrained model
############################
To start, we must first get a pretrained model to align. There are 2 models we recommend to get started. The rest of the tutorial will work with either model, but for demonstration purposes we will use the smaller 2B model.
.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

Obtain a Pretrained Model
#########################
To start, we must first get a pretrained model to align. There are two models we recommend to get started. The rest of the tutorial will work with either model, but for demonstration purposes, we will use the smaller 2B model.

.. tab-set::

5 changes: 5 additions & 0 deletions docs/user-guide/steerlm.rst
@@ -42,6 +42,11 @@ Train a SteerLM model

This section is a step-by-step tutorial that walks you through how to run a full SteerLM pipeline with a Llama2 70B model. It includes the following:

.. note::
Before starting this tutorial, be sure to review the :ref:`introduction <model-aligner-intro>` for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's `Known Issues page <https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html>`__. The page enumerates known issues and provides suggested workarounds where appropriate.

1. Data download and preprocessing

2. Training the attribute prediction model (aka regression reward model)
4 changes: 4 additions & 0 deletions examples/nlp/gpt/conf/inference_rm.yaml
@@ -49,3 +49,7 @@ model:
regression:
merge_attributes: False # whether to merge attribute values into a scalar
attribute_weights: null # apply these weights to each attribute when merging them into a scalar

# NOTE: The user does not need to change the global batch size below
# GBS is overridden to 0 to disable checks for compatibility with the megatron-core parallel state
global_batch_size: 0
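
The reason an override of 0 is safe can be seen from the divisibility check the microbatch calculator performs. The sketch below illustrates that check; it is a simplified stand-in, not the actual megatron-core code:

.. code-block:: python

    # global_batch_size must be divisible by micro_batch_size * data_parallel_size.
    # Since 0 % n == 0 for any n > 0, a GBS of 0 passes trivially,
    # effectively disabling the compatibility check.
    def gbs_is_compatible(gbs: int, mbs: int, dp_size: int) -> bool:
        return gbs % (mbs * dp_size) == 0

    assert gbs_is_compatible(0, 4, 8)  # an overridden GBS of 0 always passes
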
3 changes: 3 additions & 0 deletions examples/nlp/gpt/train_gpt_sft.py
@@ -102,6 +102,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False):
if cfg.model.get("seq_len_interpolation_factor", None) is not None:
gpt_cfg.seq_len_interpolation_factor = cfg.model.seq_len_interpolation_factor

if cfg.model.get("dist_ckpt_load_strictness", None) is not None:
gpt_cfg.dist_ckpt_load_strictness = cfg.model.dist_ckpt_load_strictness

gpt_cfg.inference = cfg.model.get("inference", {})

# This is needed when modifying a hparam file directly to load `.ckpt` files.
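With this passthrough in place, users can relax distributed-checkpoint loading from the training config. A hedged example of setting the key; ``"log_all"`` is an assumed value following megatron-core's ``StrictHandling`` options, so check your megatron-core version for the supported names:

.. code-block:: python

    from omegaconf import OmegaConf

    # Sketch: add the new key to a training config so that _modify_config
    # copies it onto gpt_cfg, as shown in the diff above.
    cfg = OmegaConf.create({"model": {"dist_ckpt_load_strictness": "log_all"}})
    assert cfg.model.get("dist_ckpt_load_strictness", None) is not None
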
3 changes: 3 additions & 0 deletions examples/nlp/gpt/train_steerlm2.py
@@ -140,6 +140,9 @@ def _modify_config(gpt_cfg, cfg, add_cfg_to_tree=False):
if cfg.model.get("seq_len_interpolation_factor", None) is not None:
gpt_cfg.seq_len_interpolation_factor = cfg.model.seq_len_interpolation_factor

if cfg.model.get("dist_ckpt_load_strictness", None) is not None:
gpt_cfg.dist_ckpt_load_strictness = cfg.model.dist_ckpt_load_strictness

gpt_cfg.inference = cfg.model.get("inference", {})

# This is needed when modifying a hparam file directly to load `.ckpt` files.
4 changes: 2 additions & 2 deletions nemo_aligner/models/nlp/gpt/megatron_gpt_critic.py
@@ -15,7 +15,7 @@
from enum import Enum

import torch
from megatron.core.num_microbatches_calculator import get_num_microbatches, reconfigure_microbatch_calculator
from megatron.core.num_microbatches_calculator import get_num_microbatches, reconfigure_num_microbatches_calculator
from megatron.core.pipeline_parallel.schedules import get_forward_backward_func
from megatron.core.transformer.module import Float16Module
from omegaconf.dictconfig import DictConfig
@@ -73,7 +73,7 @@ def prepare_for_inference(self):

def prepare_for_training(self):
app_state = AppState()
reconfigure_microbatch_calculator(
reconfigure_num_microbatches_calculator(
rank=app_state.global_rank,
rampup_batch_size=None,
global_batch_size=self.cfg.global_batch_size,
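Because megatron-core renamed this symbol, code that must run against both older and newer versions sometimes guards the import. A compatibility sketch (not part of this PR), assuming both names live in the same module across the versions you target:

.. code-block:: python

    try:
        # megatron-core >= 0.8 name
        from megatron.core.num_microbatches_calculator import (
            reconfigure_num_microbatches_calculator,
        )
    except ImportError:
        # fall back to the older name under the post-rename alias
        from megatron.core.num_microbatches_calculator import (
            reconfigure_microbatch_calculator as reconfigure_num_microbatches_calculator,
        )
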
2 changes: 1 addition & 1 deletion nemo_aligner/models/nlp/gpt/megatron_gpt_reward_model.py
@@ -92,7 +92,7 @@ def model_provider_func(self, pre_process, post_process):

model = GPTRewardModel(
config=self.transformer_config,
transformer_layer_spec=get_specs(self.spec_name, self.transformer_config.num_moe_experts),
transformer_layer_spec=get_specs(self.spec_name, self.transformer_config),
vocab_size=self.cfg.get("override_vocab_size", self.padded_vocab_size),
max_sequence_length=self.cfg.get("encoder_seq_length", 512),
pre_process=pre_process,
4 changes: 2 additions & 2 deletions nemo_aligner/package_info.py
@@ -14,9 +14,9 @@


MAJOR = 0
MINOR = 6
MINOR = 5
PATCH = 0
PRE_RELEASE = "rc0"
PRE_RELEASE = ""

# Use the following formatting: (major, minor, patch, pre-release)
VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
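After this bump, the release identifies itself as a plain 0.5.0 with no pre-release suffix. One way to confirm the installed version, assuming ``__version__`` is derived from this tuple in ``package_info.py``:

.. code-block:: python

    # Hypothetical check of the installed release inside the container.
    from nemo_aligner.package_info import __version__

    print(__version__)  # expected to print "0.5.0" for this release
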
8 changes: 5 additions & 3 deletions nemo_aligner/utils/utils.py
@@ -28,7 +28,7 @@

import torch
from megatron.core.dist_checkpointing.mapping import ShardedObject, ShardedTensorFactory
from megatron.core.num_microbatches_calculator import reconfigure_microbatch_calculator
from megatron.core.num_microbatches_calculator import reconfigure_num_microbatches_calculator
from omegaconf import DictConfig, OmegaConf
from torch.masked import as_masked_tensor

@@ -122,7 +124,9 @@ def load_checkpoint_model_config(restore_path):
return OmegaConf.load(cfg_path)

with tempfile.TemporaryDirectory() as tmpdir:
NLPSaveRestoreConnector._unpack_nemo_file(restore_path, tmpdir, extract_config_only=True)
# Extracts only model config
members = NLPSaveRestoreConnector._filtered_tar_info(restore_path, filter_fn=lambda name: ".yaml" in name)
NLPSaveRestoreConnector._unpack_nemo_file(restore_path, tmpdir, members=members)
cfg = OmegaConf.load(os.path.join(tmpdir, config_name_in_ckpt))

return cfg
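
The effect of the two new lines is to extract just the YAML config from the ``.nemo`` tarball rather than unpacking the whole archive. A rough standalone equivalent using only the standard library (an illustration of the idea, not the connector's implementation):

.. code-block:: python

    import tarfile

    from omegaconf import OmegaConf

    def read_nemo_config(restore_path: str):
        # Load the first .yaml member of a .nemo tar archive without a full unpack.
        with tarfile.open(restore_path) as tar:
            member = next(m for m in tar.getmembers() if m.name.endswith(".yaml"))
            return OmegaConf.load(tar.extractfile(member))
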
@@ -229,7 +231,7 @@ def calculate_response_lengths(tokens, eos_id):

def configure_batch_sizes(mbs, gbs, dp=1):
app_state = AppState()
reconfigure_microbatch_calculator(
reconfigure_num_microbatches_calculator(
rank=app_state.global_rank,
rampup_batch_size=None,
global_batch_size=gbs,
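A possible call site for the updated helper, with illustrative batch sizes; the constraint in the comment follows from the microbatch calculator's divisibility check:

.. code-block:: python

    from nemo_aligner.utils.utils import configure_batch_sizes

    # Illustrative values: gbs must be divisible by mbs * dp for the
    # reconfigured microbatch calculator to accept them (128 = 4 * 8 * 4).
    configure_batch_sizes(mbs=4, gbs=128, dp=8)
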
2 changes: 1 addition & 1 deletion setup/requirements.txt
@@ -1,4 +1,4 @@
jsonlines
megatron_core==0.8
megatron_core>=0.8
nemo_toolkit[nlp]
nvidia-pytriton