Expected Behavior

Data preprocessing should work with the `chatml.intel` tokenization strategy, which works during training.

Current behaviour

Preprocessing throws "unhandled prompt tokenization strategy" with the following error and stack trace:

```
ValueError: unhandled prompt tokenization strategy: chatml.intel
```

```
[2024-01-24 02:25:25,065] [INFO] [axolotl.normalize_config:158] [PID:28954] [RANK:0] GPU memory usage baseline: 0.000GB (+0.769GB misc)
[2024-01-24 02:25:26,329] [DEBUG] [axolotl.load_tokenizer:210] [PID:28954] [RANK:0] EOS: 100257 / <|endoftext|>
[2024-01-24 02:25:26,329] [DEBUG] [axolotl.load_tokenizer:211] [PID:28954] [RANK:0] BOS: 100257 / <|endoftext|>
[2024-01-24 02:25:26,329] [DEBUG] [axolotl.load_tokenizer:212] [PID:28954] [RANK:0] PAD: 100257 / <|endoftext|>
[2024-01-24 02:25:26,330] [DEBUG] [axolotl.load_tokenizer:213] [PID:28954] [RANK:0] UNK: None / None
[2024-01-24 02:25:26,330] [INFO] [axolotl.load_tokenizer:218] [PID:28954] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-01-24 02:25:26,330] [INFO] [axolotl.load_tokenized_prepared_datasets:164] [PID:28954] [RANK:0] Unable to find prepared dataset in last_run_prepared/69faa39e1027cf4f2c1d6356b77bd226
[2024-01-24 02:25:26,330] [INFO] [axolotl.load_tokenized_prepared_datasets:165] [PID:28954] [RANK:0] Loading raw datasets...
[2024-01-24 02:25:38,692] [ERROR] [axolotl.get_dataset_wrapper:732] [PID:28954] [RANK:0] unhandled prompt tokenization strategy: chatml.intel.
Traceback (most recent call last):
  File "/home/interstellarninja/miniconda3/envs/sft-finetune/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/interstellarninja/miniconda3/envs/sft-finetune/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/cli/preprocess.py", line 55, in <module>
    fire.Fire(do_cli)
  File "/home/interstellarninja/miniconda3/envs/sft-finetune/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/interstellarninja/miniconda3/envs/sft-finetune/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/interstellarninja/miniconda3/envs/sft-finetune/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/cli/preprocess.py", line 46, in do_cli
    _ = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/cli/__init__.py", line 312, in load_datasets
    train_dataset, eval_dataset, total_num_steps, prompters = prepare_dataset(
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/utils/data.py", line 68, in prepare_dataset
    train_dataset, eval_dataset, prompters = load_prepare_datasets(
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/utils/data.py", line 532, in load_prepare_datasets
    dataset, prompters = load_tokenized_prepared_datasets(
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/utils/data.py", line 368, in load_tokenized_prepared_datasets
    dataset_wrapper, dataset_prompter = get_dataset_wrapper(
  File "/home/interstellarninja/ai_projects/axolotl/src/axolotl/utils/data.py", line 735, in get_dataset_wrapper
    raise ValueError(
ValueError: unhandled prompt tokenization strategy: chatml.intel
```

Steps to reproduce

```
python -m axolotl.cli.preprocess /home/interstellarninja/ai_projects/axolotl/examples/stablelm/stablelm-1_6b-dpo.yml --debug
```

Config yaml

```yaml
base_model: /home/interstellarninja/ai_projects/axolotl/stablelm-1_6b-tool-calling-1/merged
base_model_config: /home/interstellarninja/ai_projects/axolotl/stablelm-1_6b-tool-calling-1/merged
model_type: StableLMEpochForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
load_in_8bit: false
load_in_4bit: true
strict: false

rl: dpo
datasets:
  - path: NousResearch/func-calling-dpo
    split: train
    type: chatml.intel
val_set_size: 0
dataset_prepared_path: last_run_prepared
output_dir: ./stablelm-1_6b-func-calling-dpo-1

adapter: qlora
sequence_len: 2048
sample_packing: true
eval_sample_packing: false
eval_batch_size: 1
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: tool-calling-dpo
wandb_run_id: stablelm-1_6b_func-calling-dpo-1

data_seed: 42
seed: 42
gradient_accumulation_steps: 1
micro_batch_size: 2
warmup_steps: 100
max_steps: 1000
num_epochs: 3
optimizer: paged_adamw_32bit
learning_rate: 0.00002
lr_scheduler: cosine
train_on_inputs: false
group_by_length: true
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
logging_steps: 1
xformers_attention: false
flash_attention: false
resume_from_checkpoint: false
save_strategy: steps
save_steps: 100
save_total_limit: 2
weight_decay: 0.0001
hub_model_id: interstellarninja/stablelm-zephyr-3b-func-calling-dpo
```

Possible solution

No response

Python Version

3.10

axolotl branch-commit

main
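For context, the `ValueError` in the trace above is the usual fall-through of a string-keyed dispatch: if no handler is registered for the strategy name on the code path being run, the lookup fails. A minimal sketch of that pattern (illustrative only; the names and the `handlers` dict are assumptions, not axolotl's actual implementation):

```python
def get_dataset_wrapper(strategy: str, handlers: dict):
    # Look up the handler registered for this strategy string; raise the
    # same kind of error the preprocess CLI reports when nothing matches.
    handler = handlers.get(strategy)
    if handler is None:
        raise ValueError(f"unhandled prompt tokenization strategy: {strategy}")
    return handler
```

Under this reading, `chatml.intel` would be registered on the training code path but not reached from `python -m axolotl.cli.preprocess`.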
#1185 fixes this
I think there might be a file missing from commit 98b4762.

This is still happening. I can't find anywhere in the code where `prompt_strategies/dpo/chatml.py` gets loaded or wired in.

It looks like the most recent git commit might have been a fixup. Can you please check to make sure all the intended changes got in there?

Thanks 🙏
It gets imported here: https://github.com/OpenAccess-AI-Collective/axolotl/blob/18f811978c01d567c2294140f53abcf8c086e337/src/axolotl/prompt_strategies/dpo/__init__.py#L15
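As a rough sketch, a dotted strategy string like `chatml.intel` can be resolved by splitting on the dot and importing the matching submodule; the function below is illustrative only (its name and the default package path are assumptions, not axolotl's actual code):

```python
import importlib

def load_strategy(strategy: str, package: str = "axolotl.prompt_strategies.dpo"):
    # Hypothetical resolver: "chatml.intel" -> submodule "chatml",
    # attribute "intel" within that submodule.
    module_name, _, attr = strategy.partition(".")
    module = importlib.import_module(f"{package}.{module_name}")
    return getattr(module, attr)
```

If that import never runs on the preprocess code path, the strategy would presumably fall through to the "unhandled prompt tokenization strategy" error above.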
However, I only now see that you are hitting this issue with `python -m axolotl.cli.preprocess ...`; I missed that.