Training the attribute-conditioned SFT
Inference on the SteerLM model with different attribute values


Step 1: Download Llama 2 LLM model
##################################

Download the Llama 2 70B LLM model from `HF <https://huggingface.co/meta-llama/Llama-2-70b-hf>`_ into the models folder.
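For example, one way to fetch the weights, assuming you have the ``huggingface-cli`` tool installed and access to the gated Llama 2 repo (paths are illustrative):

.. code-block:: bash

   huggingface-cli login   # one-time authentication for the gated weights
   huggingface-cli download meta-llama/Llama-2-70b-hf --local-dir /models/llama-2-70b-hf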

Then convert the Llama 2 LLM into .nemo format:

.. code-block:: bash

   mkdir -p /models/llama70b/
   python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
       --in-file /path/to/llama --out-file /models/llama70b/llama70b.nemo
Download and convert the `13B model <https://huggingface.co/meta-llama/Llama-2-13b-hf>`_ to .nemo format as well; it is needed for the Attribute Prediction Modelling step.
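For instance, the conversion script shown above can be reused for the 13B checkpoint (paths are illustrative):

.. code-block:: bash

   mkdir -p /models/llama13b/
   python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
       --in-file /path/to/llama13b --out-file /models/llama13b/llama13b.nemo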

Untar the .nemo file to obtain the tokenizer in NeMo format (only for the 70B model):

.. code-block:: bash

   # The .nemo checkpoint is a tar archive; the path below is illustrative and
   # should point at the converted 70B checkpoint from above.
   tar -xvf /models/llama70b/llama70b.nemo -C /models/llama70b/

Step 3: Train the regression reward model on OASST+HelpSteer data
##################################################################
For this tutorial, train the regression reward model for 800 steps.

Note that you will need to set up multi-node training in your cluster environment, depending on the type of cluster you use. For details, please refer to https://lightning.ai/docs/pytorch/stable/clouds/cluster.html

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/train_reward_model.py \
         ++model.data.data_impl=jsonl \
         pretrained_checkpoint.restore_from_path=/models/llama13b/llama13b.nemo \
         "model.data.data_prefix={train: ["data/merge_train_reg.jsonl"], validation: ["data/merge_val_reg.jsonl"], test: ["data/merge_val_reg.jsonl"]}" \
         exp_manager.explicit_log_dir=/results \
         trainer.rm.val_check_interval=10 \
         exp_manager.create_wandb_logger=True \
         exp_manager.wandb_logger_kwargs.project=steerlm \
         exp_manager.wandb_logger_kwargs.name=rm_training \
         trainer.rm.save_interval=10 \
         trainer.rm.max_steps=800 \
         ++model.tensor_model_parallel_size=4 \
         model.global_batch_size=512 \
         model.optim.sched.constant_steps=0 \
         model.reward_model_type="regression" \
         model.regression.num_attributes=9
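How the multi-node job is launched depends on your cluster. As a rough sketch only, on a Slurm cluster the command above could be wrapped in a batch script; every directive below is an assumption about your environment, not a value prescribed by this tutorial:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --nodes=4            # keep in sync with trainer.num_nodes
   #SBATCH --ntasks-per-node=8  # one task per GPU, matching trainer.devices
   #SBATCH --gpus-per-node=8

   # PyTorch Lightning reads the Slurm environment variables, so the same
   # train_reward_model.py invocation runs unchanged under srun; append the
   # remaining arguments exactly as in the command above.
   srun python /opt/NeMo-Aligner/examples/nlp/gpt/train_reward_model.py \
         trainer.num_nodes=4 \
         trainer.devices=8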
Step 4: Generate annotations
############################

To generate annotations, run the following command in the background to start an inference server:
.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
         rm_model_file=/results/checkpoints/megatron_gpt.nemo \
         trainer.num_nodes=1 \
         trainer.devices=8 \
         ++model.tensor_model_parallel_size=4 \
         ++model.pipeline_model_parallel_size=1 \
         inference.micro_batch_size=2 \
         model.regression.merge_attributes=False \
         ++model.regression.num_attributes=9 \
         inference.port=1424
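To keep the server alive in the background while annotation runs, one simple option is ``nohup`` (the log file name is illustrative; pass the full argument list from the command above):

.. code-block:: bash

   nohup python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
         rm_model_file=/results/checkpoints/megatron_gpt.nemo \
         inference.port=1424 \
         > rm_server.log 2>&1 &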
Now execute:

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
         --input-file=data/oasst/train.jsonl \
         --output-file=data/oasst/train_labeled.jsonl \
         --port=1424

   python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
         --input-file=data/oasst/val.jsonl \
         --output-file=data/oasst/val_labeled.jsonl \
         --port=1424
.. note::
   This step can take a long time to run. For the purposes of this tutorial, we use a single inference server. For optimal results, split the input files into multiple smaller files and use multiple inference servers to run data annotation in parallel, as sketched below.
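As one illustration of that parallel setup, assuming four copies of the inference server are already listening on ports 1424-1427 (all shard names and ports are illustrative):

.. code-block:: bash

   # Split the training set into four shards (GNU split; produces
   # data/oasst/train_shard_00 ... train_shard_03).
   split -n l/4 -d data/oasst/train.jsonl data/oasst/train_shard_

   # Annotate each shard against its own server, then recombine.
   for i in 0 1 2 3; do
       python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
           --input-file=data/oasst/train_shard_0${i} \
           --output-file=data/oasst/train_labeled_0${i}.jsonl \
           --port=$((1424 + i)) &
   done
   wait
   cat data/oasst/train_labeled_0*.jsonl > data/oasst/train_labeled.jsonl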


Step 5: Train the Attribute-Conditioned SFT model
#################################################

For the purposes of this tutorial, the Attribute-Conditioned SFT model is trained for 800 steps.

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py \
         model.data.train_ds.max_seq_length=4096 \
         model.data.train_ds.file_path=data/oasst/train_labeled.jsonl \
         model.data.train_ds.micro_batch_size=1 \
         model.data.train_ds.global_batch_size=128 \
         model.data.train_ds.index_mapping_dir=/indexmap_dir \
         model.data.train_ds.add_eos=False \
         model.data.train_ds.hf_dataset=True \
         model.data.validation_ds.max_seq_length=4096 \
         model.data.validation_ds.file_path=data/oasst/val_labeled.jsonl \
         model.data.validation_ds.micro_batch_size=1 \
         model.data.validation_ds.global_batch_size=128 \
         model.data.validation_ds.index_mapping_dir=/indexmap_dir \
         model.data.validation_ds.add_eos=False \
         model.data.validation_ds.hf_dataset=True \
         exp_manager.create_wandb_logger=True \
         exp_manager.wandb_logger_kwargs.project=steerlm \
         exp_manager.wandb_logger_kwargs.name=acsft_training \
         exp_manager.explicit_log_dir=/results
Step 6: Inference
#################

To start inference, run an inference server in the background using the following command:
.. code-block:: bash

   python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
         gpt_model_file=/results/steerlm_70b/checkpoints/megatron_gpt_sft.nemo \
         pipeline_model_parallel_split_rank=0 \
         server=True \
         tensor_model_parallel_size=8

Next, create Python helper functions:

.. code-block:: python

   import requests

   def get_answer(question, max_tokens, values, eval_port=1427):
       prompt = (
           "<extra_id_0>System\nA chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
           "<extra_id_1>User\n{question}\n<extra_id_1>Assistant\n<extra_id_2>{values}\n"
       )
       prompts = [prompt.format(question=question, values=values)]
       data = {
           "sentences": prompts,
           "tokens_to_generate": max_tokens,
           "top_k": 1,
           "greedy": True,
           "end_strings": ["<extra_id_1>"],
       }
       url = f"http://localhost:{eval_port}/generate"
       response = requests.put(url, json=data)
       json_response = response.json()
       # Strip the echoed (formatted) prompt from the returned text.
       response_sentence = json_response["sentences"][0][len(prompts[0]):]
       return response_sentence
.. code-block:: python

   def encode_labels(labels):
       return ",".join(f"{key}:{value}" for key, value in labels.items())
Next, change the values below to steer the language model:

.. code-block:: python

   from collections import OrderedDict

   values = OrderedDict(
       [
           ("quality", 4),
           ("toxicity", 0),
           ("humor", 0),
           ("creativity", 0),
           ("helpfulness", 4),
           ("correctness", 4),
           ("coherence", 4),
           ("complexity", 4),
           ("verbosity", 4),
       ]
   )
   values = encode_labels(values)
Finally, ask questions and generate responses:

.. code-block:: python

   question = "Write a poem on NVIDIA in the style of Shakespeare"
   print(get_answer(question, 4096, values))
The response is shown below:

.. code-block:: python

   """
   In days of yore, in tech's great hall,
   A company arose, NVIDIA its call.
   Its graphics cards, with power immense,
   Made gaming realms and designers' dreams commence.
   With GPUs strong, it paved the way,
   For AI's rise in every single day.
   Deep learning's core, its chips did host,
   And made the world a brighter, smarter post.
   From self-driving cars to healthcare's art,
   NVIDIA's touch reached every part.
   Its innovations, without parallel,
   Made human lives more efficient, agile.
   And as the world in awe did gaze,
   NVIDIA's name in fame's book raised.
   Its legacy, forever will remain,
   A tech titan, in annals of humankind's gain.
   """
.. note::
   This tutorial covers only Phases 1-3: training the value model, generating annotations, and initial SteerLM model training. Phase 4 bootstraps the SteerLM model by sampling responses conditioned on high-quality data, but it is omitted from this tutorial for simplicity.

SteerLM: Novel Technique for Simple and Controllable Model Alignment
######################################################################
SteerLM provides a novel technique for realizing a new generation of AI systems aligned with human preferences in a controllable manner. Its conceptual simplicity, performance gains, and customizability highlight the transformative possibilities of user-steerable AI. To learn more, please check out our paper `SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF <https://arxiv.org/abs/2310.05344>`_.
