Training the attribute-conditioned SFT
Inference on the SteerLM model with different attribute values


Step 1: Download Llama 2 LLM model
##################################

Download the Llama 2 70B LLM model from `HF <https://huggingface.co/meta-llama/Llama-2-70b-hf>`_ into the models folder.
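For example, one way to fetch the weights, assuming you have the ``huggingface-cli`` tool installed and access to the gated Llama 2 repo (paths are illustrative):

.. code-block:: bash

   huggingface-cli login   # one-time authentication for the gated weights
   huggingface-cli download meta-llama/Llama-2-70b-hf --local-dir /models/llama-2-70b-hf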

Then convert the Llama 2 LLM into .nemo format:

.. code-block:: bash

   mkdir -p /models/llama70b/
   python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
       --in-file /path/to/llama --out-file /models/llama70b/llama70b.nemo
Download and convert the `13B model <https://huggingface.co/meta-llama/Llama-2-13b-hf>`_ to .nemo format as well; it is needed for the Attribute Prediction Modelling step.
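For instance, the conversion script shown above can be reused for the 13B checkpoint (paths are illustrative):

.. code-block:: bash

   mkdir -p /models/llama13b/
   python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
       --in-file /path/to/llama13b --out-file /models/llama13b/llama13b.nemo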

Untar the .nemo file to obtain the tokenizer in NeMo format (only for the 70B model):

.. code-block:: bash

   # The .nemo checkpoint is a tar archive; the path below is illustrative and
   # should point at the converted 70B checkpoint from above.
   tar -xvf /models/llama70b/llama70b.nemo -C /models/llama70b/

Step 3: Train the regression reward model on OASST+HelpSteer data
##################################################################
For this tutorial, train the regression reward model for 800 steps.

Note that you will need to set up multi-node training in your cluster environment, depending on the type of cluster you use. For details, please refer to https://lightning.ai/docs/pytorch/stable/clouds/cluster.html

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/train_reward_model.py \
         ++model.data.data_impl=jsonl \
         pretrained_checkpoint.restore_from_path=/models/llama13b/llama13b.nemo \
         "model.data.data_prefix={train: ["data/merge_train_reg.jsonl"], validation: ["data/merge_val_reg.jsonl"], test: ["data/merge_val_reg.jsonl"]}" \
         exp_manager.explicit_log_dir=/results \
         trainer.rm.val_check_interval=10 \
         exp_manager.create_wandb_logger=True \
         exp_manager.wandb_logger_kwargs.project=steerlm \
         exp_manager.wandb_logger_kwargs.name=rm_training \
         trainer.rm.save_interval=10 \
         trainer.rm.max_steps=800 \
         ++model.tensor_model_parallel_size=4 \
         model.global_batch_size=512 \
         model.optim.sched.constant_steps=0 \
         model.reward_model_type="regression" \
         model.regression.num_attributes=9
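How the multi-node job is launched depends on your cluster. As a rough sketch only, on a Slurm cluster the command above could be wrapped in a batch script; every directive below is an assumption about your environment, not a value prescribed by this tutorial:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --nodes=4            # keep in sync with trainer.num_nodes
   #SBATCH --ntasks-per-node=8  # one task per GPU, matching trainer.devices
   #SBATCH --gpus-per-node=8

   # PyTorch Lightning reads the Slurm environment variables, so the same
   # train_reward_model.py invocation runs unchanged under srun; append the
   # remaining arguments exactly as in the command above.
   srun python /opt/NeMo-Aligner/examples/nlp/gpt/train_reward_model.py \
         trainer.num_nodes=4 \
         trainer.devices=8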
Step 4: Generate annotations
############################

To generate annotations, run the following command in the background to start an inference server:
.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
         rm_model_file=/results/checkpoints/megatron_gpt.nemo \
         trainer.num_nodes=1 \
         trainer.devices=8 \
         ++model.tensor_model_parallel_size=4 \
         ++model.pipeline_model_parallel_size=1 \
         inference.micro_batch_size=2 \
         model.regression.merge_attributes=False \
         ++model.regression.num_attributes=9 \
         inference.port=1424
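To keep the server alive in the background while annotation runs, one simple option is ``nohup`` (the log file name is illustrative; pass the full argument list from the command above):

.. code-block:: bash

   nohup python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
         rm_model_file=/results/checkpoints/megatron_gpt.nemo \
         inference.port=1424 \
         > rm_server.log 2>&1 &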
Now execute:

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
         --input-file=data/oasst/train.jsonl \
         --output-file=data/oasst/train_labeled.jsonl \
         --port=1424

   python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
         --input-file=data/oasst/val.jsonl \
         --output-file=data/oasst/val_labeled.jsonl \
         --port=1424
.. note::
   This step can take a long time to run. For the purposes of this tutorial, we use a single inference server. For optimal results, split the input files into multiple smaller files and use multiple inference servers to run data annotation in parallel, as sketched below.
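As one illustration of that parallel setup, assuming four copies of the inference server are already listening on ports 1424-1427 (all shard names and ports are illustrative):

.. code-block:: bash

   # Split the training set into four shards (GNU split; produces
   # data/oasst/train_shard_00 ... train_shard_03).
   split -n l/4 -d data/oasst/train.jsonl data/oasst/train_shard_

   # Annotate each shard against its own server, then recombine.
   for i in 0 1 2 3; do
       python /opt/NeMo-Aligner/examples/nlp/gpt/attribute_annotate.py \
           --input-file=data/oasst/train_shard_0${i} \
           --output-file=data/oasst/train_labeled_0${i}.jsonl \
           --port=$((1424 + i)) &
   done
   wait
   cat data/oasst/train_labeled_0*.jsonl > data/oasst/train_labeled.jsonl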


Step 5: Train the Attribute-Conditioned SFT model
#################################################

For the purposes of this tutorial, the Attribute-Conditioned SFT model is trained for 800 steps.

.. code-block:: bash

   python /opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py \
         model.data.train_ds.max_seq_length=4096 \
         model.data.train_ds.file_path=data/oasst/train_labeled.jsonl \
         model.data.train_ds.micro_batch_size=1 \
         model.data.train_ds.global_batch_size=128 \
         model.data.train_ds.index_mapping_dir=/indexmap_dir \
         model.data.train_ds.add_eos=False \
         model.data.train_ds.hf_dataset=True \
         model.data.validation_ds.max_seq_length=4096 \
         model.data.validation_ds.file_path=data/oasst/val_labeled.jsonl \
         model.data.validation_ds.micro_batch_size=1 \
         model.data.validation_ds.global_batch_size=128 \
         model.data.validation_ds.index_mapping_dir=/indexmap_dir \
         model.data.validation_ds.add_eos=False \
         model.data.validation_ds.hf_dataset=True \
         exp_manager.create_wandb_logger=True \
         exp_manager.wandb_logger_kwargs.project=steerlm \
         exp_manager.wandb_logger_kwargs.name=acsft_training \
         exp_manager.explicit_log_dir=/results
Step 6: Inference
#################

To start inference, run an inference server in the background using the following command:
.. code-block:: bash

   python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
         gpt_model_file=/results/steerlm_70b/checkpoints/megatron_gpt_sft.nemo \
         pipeline_model_parallel_split_rank=0 \
         server=True \
         tensor_model_parallel_size=8

Next, create Python helper functions:

.. code-block:: python

   import requests

   def get_answer(question, max_tokens, values, eval_port=1427):
       prompt = (
           "<extra_id_0>System\nA chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
           "<extra_id_1>User\n{question}\n<extra_id_1>Assistant\n<extra_id_2>{values}\n"
       )
       prompts = [prompt.format(question=question, values=values)]
       data = {
           "sentences": prompts,
           "tokens_to_generate": max_tokens,
           "top_k": 1,
           "greedy": True,
           "end_strings": ["<extra_id_1>"],
       }
       url = f"http://localhost:{eval_port}/generate"
       response = requests.put(url, json=data)
       json_response = response.json()
       # Strip the echoed (formatted) prompt from the returned text.
       response_sentence = json_response["sentences"][0][len(prompts[0]):]
       return response_sentence
.. code-block:: python

   def encode_labels(labels):
       return ",".join(f"{key}:{value}" for key, value in labels.items())
Next, change the values below to steer the language model:

.. code-block:: python

   from collections import OrderedDict

   values = OrderedDict(
       [
           ("quality", 4),
           ("toxicity", 0),
           ("humor", 0),
           ("creativity", 0),
           ("helpfulness", 4),
           ("correctness", 4),
           ("coherence", 4),
           ("complexity", 4),
           ("verbosity", 4),
       ]
   )
   values = encode_labels(values)
Finally, ask questions and generate responses:

.. code-block:: python

   question = "Write a poem on NVIDIA in the style of Shakespeare"
   print(get_answer(question, 4096, values))
The response is shown below:

.. code-block:: python

   """
   In days of yore, in tech's great hall,
   A company arose, NVIDIA its call.
   Its graphics cards, with power immense,
   Made gaming realms and designers' dreams commence.
   With GPUs strong, it paved the way,
   For AI's rise in every single day.
   Deep learning's core, its chips did host,
   And made the world a brighter, smarter post.
   From self-driving cars to healthcare's art,
   NVIDIA's touch reached every part.
   Its innovations, without parallel,
   Made human lives more efficient, agile.
   And as the world in awe did gaze,
   NVIDIA's name in fame's book raised.
   Its legacy, forever will remain,
   A tech titan, in annals of humankind's gain.
   """
.. note::
   This tutorial covers only Phases 1-3: training the value model, generating annotations, and initial SteerLM model training. Phase 4 bootstraps the SteerLM model by sampling responses conditioned on high-quality data, but it is omitted from this tutorial for simplicity.

SteerLM: Novel Technique for Simple and Controllable Model Alignment
######################################################################
SteerLM provides a novel technique for realizing a new generation of AI systems aligned with human preferences in a controllable manner. Its conceptual simplicity, performance gains, and customizability highlight the transformative possibilities of user-steerable AI. To learn more, please check out our paper `SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF <https://arxiv.org/abs/2310.05344>`_.
