[BUG] Token Classification fails on value error `text_column` #813

jmccrae · 2024-11-26T13:40:50Z

Prerequisites

I have read the documentation.
I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

from autotrain.params import TokenClassificationParams
from autotrain.project import AutoTrainProject


params = TokenClassificationParams(
    model="FacebookAI/roberta-base",
    data_path="data")

backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()

UI Screenshots & Parameters

No response

Error Logs

Traceback (most recent call last):
  File "/home/jmccrae/scratch/wikilinks_autotrain/apply_autotrain.py", line 13, in <module>
    project.create()
  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/autotrain/project.py", line 567, in create
    self.params = self._process_params_data()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/autotrain/project.py", line 559, in _process_params_data
    return token_clf_munge_data(self.params, self.local)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/autotrain/project.py", line 265, in token_clf_munge_data
    params.text_column = "autotrain_text"
    ^^^^^^^^^^^^^^^^^^
  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/pydantic/main.py", line 884, in __setattr__
    raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')

Additional Information

Data is formatted as in https://huggingface.co/docs/autotrain/en/tasks/token_classification

I also tried commenting out the offending lines and then ran into this error:

  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/autotrain/trainers/common.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmccrae/.cache/pypoetry/virtualenvs/wikilinks-autotrain-NvKt9JsM-py3.12/lib/python3.12/site-packages/autotrain/trainers/token_classification/__main__.py", line 89, in train
    label_list = train_data.features[config.tags_column].feature.names
                 ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'tags'

abhishekkrthakur · 2024-11-26T15:54:31Z

Could you print the column names in your dataset and the output of print(params)?
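One way to list the column names is sketched below. It uses only Python's standard csv module; the helper name csv_columns and the path data/train.csv are assumptions for illustration (the path is guessed from data_path="data"), not part of AutoTrain.

```python
import csv
import os


def csv_columns(path):
    """Return the header row (the column names) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        # The first row of the file is treated as the header.
        return next(csv.reader(f), [])


# Assumed location of the training split; adjust to your own layout.
train_path = os.path.join("data", "train.csv")
if os.path.exists(train_path):
    print(csv_columns(train_path))
```

If the printed names differ from the tokens_column and tags_column values in the params, that mismatch would be a plausible cause of a KeyError like the one above.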

jmccrae · 2024-11-27T08:59:25Z

You can run the reproduction here: https://colab.research.google.com/drive/1shka-nlusipnN6TTAlQPhcXhrvgehNF8?usp=sharing

This is the output of print(params)

{'data_path': 'data', 'model': 'FacebookAI/roberta-base', 'lr': 5e-05, 'epochs': 3, 
'max_seq_length': 128, 'batch_size': 8, 'warmup_ratio': 0.1, 
'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 
'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train',
 'valid_split': None, 'tokens_column': 'tokens', 'tags_column': 'tags',
 'logging_steps': -1, 'project_name': 'project-name', 
'auto_find_batch_size': False, 'mixed_precision': None, 'save_total_limit': 1,
 'token': None, 'push_to_hub': False, 'eval_strategy': 'epoch', 'username': None,
 'log': 'none', 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}

Also, I noticed that the CSV on this page is broken: there is a space after the comma, which breaks CSV parsing.
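A small sketch of why the space matters, using Python's standard csv module as a stand-in (the parser AutoTrain actually uses may behave differently). The sample row is illustrative; the variable names are made up for this example.

```python
import csv
import io

# Stringified lists, in the same style as the token classification CSVs.
tokens = str(["I", "love", "Paris"])
tags = str(["O", "O", "B-LOC"])

good_line = f'"{tokens}","{tags}"'   # no space after the separating comma
bad_line = f'"{tokens}", "{tags}"'   # space after the separating comma


def parse(line, **kwargs):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader(io.StringIO(line), **kwargs))


good_row = parse(good_line)
bad_row = parse(bad_line)
# skipinitialspace=True tells the reader to ignore whitespace after a delimiter.
lenient_row = parse(bad_line, skipinitialspace=True)

print(len(good_row), len(bad_row), len(lenient_row))
```

With the default settings, the second field of the bad line starts with a space, so its opening quote is not treated as a quote character and the commas inside the list split the field apart. Removing the space, or parsing with skipinitialspace=True, yields the expected two fields.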

abhishekkrthakur · 2024-11-27T09:54:15Z

Fixed.

pip install -U autotrain-advanced

code:

import os

from autotrain.params import TokenClassificationParams
from autotrain.project import AutoTrainProject


# Create the data directory if it does not exist yet.
os.makedirs("data", exist_ok=True)

# Each row holds a stringified list of tokens and a stringified list of tags.
with open("data/train.csv", "w") as f:
    print("tokens,tags", file=f)
    print("\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\"", file=f)
    print("\"['I', 'live', 'in', 'New', 'York']\",\"['O', 'O', 'O', 'B-LOC', 'I-LOC']\"", file=f)

with open("data/valid.csv", "w") as f:
    print("tokens,tags", file=f)
    print("\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\"", file=f)
    print("\"['I', 'live', 'in', 'New', 'York']\",\"['O', 'O', 'O', 'B-LOC', 'I-LOC']\"", file=f)


params = TokenClassificationParams(
    model="FacebookAI/roberta-base",
    data_path="data")

backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()

Note: I've changed the test filename to valid; otherwise, you need to specify valid_split in params.

Apologies for the inconvenience.
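For anyone generating these files from real data, a minimal sketch that lets csv.writer handle the quoting instead of hand-escaping each line. The helpers write_split and read_split are hypothetical names, and reading the lists back with ast.literal_eval is an assumption about how the stringified lists can be decoded, not a statement about AutoTrain's internals.

```python
import ast
import csv
import os
import tempfile

rows = [
    (["I", "love", "Paris"], ["O", "O", "B-LOC"]),
    (["I", "live", "in", "New", "York"], ["O", "O", "O", "B-LOC", "I-LOC"]),
]


def write_split(path, rows):
    """Write (tokens, tags) pairs as stringified lists under a tokens,tags header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["tokens", "tags"])
        for tokens, tags in rows:
            # str() of a list of strings gives "['I', 'love', 'Paris']";
            # csv.writer quotes the field because it contains commas.
            writer.writerow([str(tokens), str(tags)])


def read_split(path):
    """Read the file back and decode the stringified lists."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            (ast.literal_eval(r["tokens"]), ast.literal_eval(r["tags"]))
            for r in csv.DictReader(f)
        ]


# Demo in a temporary directory; point out_dir at "data" for real use.
with tempfile.TemporaryDirectory() as out_dir:
    train_path = os.path.join(out_dir, "train.csv")
    write_split(train_path, rows)
    round_trip = read_split(train_path)
```

Because the writer inserts no space after the delimiter, the output avoids the parsing problem mentioned earlier in the thread.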

jmccrae added the bug Something isn't working label Nov 26, 2024
