Fix falcon tokenization step (#1441) [skip ci]
* Fix falcon tokenization step

* chore: lint

---------

Co-authored-by: Wing Lian <[email protected]>
pharaouk and winglian authored Mar 26, 2024
1 parent 0453464 commit 9ae17ae
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions src/axolotl/utils/trainer.py
@@ -124,9 +124,10 @@ def process_datasets_for_packing(cfg, train_dataset, eval_dataset):
                 eval_dataset = eval_dataset.remove_columns("attention_mask")

         if cfg.model_config_type == "falcon":
-            LOG.info("dropping token_type_ids column")
-            train_dataset = train_dataset.remove_columns("token_type_ids")
-            if eval_dataset:
+            LOG.info("dropping token_type_ids column if it exists")
+            if "token_type_ids" in train_dataset.column_names:
+                train_dataset = train_dataset.remove_columns("token_type_ids")
+            if eval_dataset and "token_type_ids" in eval_dataset.column_names:
                 eval_dataset = eval_dataset.remove_columns("token_type_ids")

         train_dataset = train_dataset.filter(
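
The change above guards each `remove_columns("token_type_ids")` call with a membership check on `column_names`, presumably because the unconditional call fails when the tokenizer never produced that column. The sketch below isolates that guard in a small helper. `drop_column_if_present` and `FakeDataset` are hypothetical names introduced only for illustration and are not part of axolotl; `FakeDataset` is a stand-in exposing just `column_names` and `remove_columns`, and its behaviour of raising on a missing column is an assumption about the real dataset class. The helper also tests `is not None` where the commit relies on the truthiness of `eval_dataset`.

```python
from dataclasses import dataclass, field


def drop_column_if_present(dataset, column):
    """Return `dataset` without `column`; pass it through if it is None or lacks the column."""
    if dataset is not None and column in dataset.column_names:
        return dataset.remove_columns(column)
    return dataset


@dataclass
class FakeDataset:
    """Hypothetical stand-in with only the two members the helper touches."""

    columns: dict = field(default_factory=dict)

    @property
    def column_names(self):
        return list(self.columns)

    def remove_columns(self, name):
        # Assumed behaviour: removing a missing column is an error.
        if name not in self.columns:
            raise ValueError(f"Column {name!r} not in dataset")
        return FakeDataset({k: v for k, v in self.columns.items() if k != name})


with_ids = FakeDataset({"input_ids": [[1, 2]], "token_type_ids": [[0, 0]]})
without_ids = FakeDataset({"input_ids": [[1, 2]]})

cleaned = drop_column_if_present(with_ids, "token_type_ids")      # column removed
untouched = drop_column_if_present(without_ids, "token_type_ids")  # no error raised
missing_eval = drop_column_if_present(None, "token_type_ids")      # mirrors an absent eval set
```

Under these assumptions, the same helper could be applied to both the training and evaluation datasets, which is the case the commit handles with two separate `if` statements.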
