
Size of tensor issue when calculating sentiment #3

Open
clarkenj opened this issue Oct 30, 2024 · 0 comments
When calculating text sentiment, the model mrm8488/t5-base-finetuned-emotion has a limit of 514 tokens. For samples longer than this, the pipeline throws an error and exits, for example:

ERROR ~ Error executing process > 'Text_Metrics (289)'

Caused by:
  Process `Text_Metrics (289)` terminated with an error exit status (1)


Command executed:

  text2variable --pid sub-HNB8198 -d . -l en /data/brambati/dataset/CCNA/derivatives/cookie_txt/results/sub-HNB8198/sub-HNB8198_task-cookie_transcript-www_speaker-participant.txt lg

Command exit status:
  1

Command output:
  Models loaded successfully:
  - Custom name: english_lg, Model: en_core_web_lg
  Light-verb counter not functional at the moment

Command error:
    warnings.warn(
  /usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_semantiques.py:89: RuntimeWarning: invalid value encountered in scalar divide
    similarite = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
  Traceback (most recent call last):
    File "/usr/local/bin/text2variable", line 8, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/lingua_extraction/main.py", line 311, in main
      sentiment = get_sentiment(texte_brut) # "Positive", "Negative", "Neutral
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_pragmatiques.py", line 170, in get_sentiment
      result = sentiment_task(text)
               ^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 156, in __call__
      result = super().__call__(*inputs, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1254, in __call__
      return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1261, in run_single
      model_outputs = self.forward(model_inputs, **forward_params)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1161, in forward
      model_outputs = self._forward(model_inputs, **forward_params)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 187, in _forward
      return self.model(**model_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1205, in forward
      outputs = self.roberta(
                ^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 800, in forward
      buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  RuntimeError: The expanded size of the tensor (607) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 607].  Tensor sizes: [1, 514]

Possible fixes:

  1. Pass a maximum token length so the sample is truncated and sentiment is calculated on the first 514 tokens only
  2. Implement a sliding-window approach that classifies overlapping chunks and aggregates the results
  3. Switch to a model with a higher token limit

I am leaning towards option 1 since it is the most time-efficient.
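The window-based approach (fix 2) could be sketched roughly as below. `classify_window`, `max_len`, and `stride` are all hypothetical stand-ins here: `classify_window` represents whatever call classifies a single chunk (e.g. the pipeline invocation inside `get_sentiment`), and the window/stride sizes are illustrative, not taken from the codebase.

```python
from collections import Counter

def windowed_sentiment(tokens, classify_window, max_len=514, stride=257):
    """Classify a long token sequence by splitting it into overlapping
    windows, labelling each window, and returning the majority label.

    `classify_window` is a hypothetical callable mapping a token window
    to a label such as "Positive", "Negative", or "Neutral".
    """
    # Short inputs fit the model, so classify them directly.
    if len(tokens) <= max_len:
        return classify_window(tokens)

    labels = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + max_len]
        labels.append(classify_window(window))
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(tokens):
            break

    # Majority vote across windows; ties resolve to the first-seen label.
    return Counter(labels).most_common(1)[0][0]
```

A mean over per-window scores (rather than a vote over labels) would be the other obvious aggregation, at the cost of threading raw scores out of the pipeline call.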
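A minimal sketch of the truncation idea (fix 1), with hypothetical `tokenize`/`detokenize` callables standing in for the model tokenizer's encode/decode:

```python
def truncate_text(text, tokenize, detokenize, max_tokens=514):
    """Return `text` cut down to its first `max_tokens` tokens so the
    input fits the model's positional-embedding limit (here, 514).

    `tokenize` and `detokenize` are hypothetical stand-ins for the
    model tokenizer's encode/decode pair.
    """
    tokens = tokenize(text)
    if len(tokens) <= max_tokens:
        return text  # already within the limit, leave untouched
    return detokenize(tokens[:max_tokens])
```

In practice it may not even be necessary to truncate by hand: Hugging Face text-classification pipelines generally forward tokenizer keyword arguments, so something like `sentiment_task(text, truncation=True, max_length=514)` might suffice, though this should be verified against the installed transformers version.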
