
Size of tensor issue when calculating sentiment #3

Open
clarkenj opened this issue Oct 30, 2024 · 0 comments
When calculating text sentiment, the model mrm8488/t5-base-finetuned-emotion has a limit of 514 tokens. For samples longer than this, the pipeline throws an error and exits, for example:

ERROR ~ Error executing process > 'Text_Metrics (289)'

Caused by:
  Process `Text_Metrics (289)` terminated with an error exit status (1)


Command executed:

  text2variable --pid sub-HNB8198 -d . -l en /data/brambati/dataset/CCNA/derivatives/cookie_txt/results/sub-HNB8198/sub-HNB8198_task-cookie_transcript-www_speaker-participant.txt lg

Command exit status:
  1

Command output:
  Models loaded successfully:
  - Custom name: english_lg, Model: en_core_web_lg
  Light-verb counter not functional at the moment

Command error:
    warnings.warn(
  /usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_semantiques.py:89: RuntimeWarning: invalid value encountered in scalar divide
    similarite = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
  Traceback (most recent call last):
    File "/usr/local/bin/text2variable", line 8, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/lingua_extraction/main.py", line 311, in main
      sentiment = get_sentiment(texte_brut) # "Positive", "Negative", "Neutral
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_pragmatiques.py", line 170, in get_sentiment
      result = sentiment_task(text)
               ^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 156, in __call__
      result = super().__call__(*inputs, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1254, in __call__
      return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1261, in run_single
      model_outputs = self.forward(model_inputs, **forward_params)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1161, in forward
      model_outputs = self._forward(model_inputs, **forward_params)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 187, in _forward
      return self.model(**model_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1205, in forward
      outputs = self.roberta(
                ^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 800, in forward
      buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  RuntimeError: The expanded size of the tensor (607) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 607].  Tensor sizes: [1, 514]

Possible fixes:

  1. Pass a maximum token length so the sample is truncated and sentiment is calculated on the first 514 tokens only
  2. Implement a sliding-window approach that classifies overlapping chunks and aggregates the results
  3. Switch to a model with a higher token limit

I am leaning towards option 1 since it is the most time-efficient.
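The window-based approach (fix 2) could be sketched roughly as below. `classify_window`, `max_len`, and `stride` are all hypothetical stand-ins here: `classify_window` represents whatever call classifies a single chunk (e.g. the pipeline invocation inside `get_sentiment`), and the window/stride sizes are illustrative, not taken from the codebase.

```python
from collections import Counter

def windowed_sentiment(tokens, classify_window, max_len=514, stride=257):
    """Classify a long token sequence by splitting it into overlapping
    windows, labelling each window, and returning the majority label.

    `classify_window` is a hypothetical callable mapping a token window
    to a label such as "Positive", "Negative", or "Neutral".
    """
    # Short inputs fit the model, so classify them directly.
    if len(tokens) <= max_len:
        return classify_window(tokens)

    labels = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + max_len]
        labels.append(classify_window(window))
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(tokens):
            break

    # Majority vote across windows; ties resolve to the first-seen label.
    return Counter(labels).most_common(1)[0][0]
```

A mean over per-window scores (rather than a vote over labels) would be the other obvious aggregation, at the cost of threading raw scores out of the pipeline call.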
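A minimal sketch of the truncation idea (fix 1), with hypothetical `tokenize`/`detokenize` callables standing in for the model tokenizer's encode/decode:

```python
def truncate_text(text, tokenize, detokenize, max_tokens=514):
    """Return `text` cut down to its first `max_tokens` tokens so the
    input fits the model's positional-embedding limit (here, 514).

    `tokenize` and `detokenize` are hypothetical stand-ins for the
    model tokenizer's encode/decode pair.
    """
    tokens = tokenize(text)
    if len(tokens) <= max_tokens:
        return text  # already within the limit, leave untouched
    return detokenize(tokens[:max_tokens])
```

In practice it may not even be necessary to truncate by hand: Hugging Face text-classification pipelines generally forward tokenizer keyword arguments, so something like `sentiment_task(text, truncation=True, max_length=514)` might suffice, though this should be verified against the installed transformers version.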
