When calculating text sentiment, the model mrm8488/t5-base-finetuned-emotion has a limit of 514 tokens. For samples longer than this, the pipeline throws an error and exits, for example:
ERROR ~ Error executing process > 'Text_Metrics (289)'
Caused by:
Process `Text_Metrics (289)` terminated with an error exit status (1)
Command executed:
text2variable --pid sub-HNB8198 -d . -l en /data/brambati/dataset/CCNA/derivatives/cookie_txt/results/sub-HNB8198/sub-HNB8198_task-cookie_transcript-www_speaker-participant.txt lg
Command exit status:
1
Command output:
Modèles chargés avec succès :
- Nom personnalisé : english_lg, Modèle : en_core_web_lg
Compteur de verbes légers non fonctionnel pour le moment
Command error:
warnings.warn(
/usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_semantiques.py:89: RuntimeWarning: invalid value encountered in scalar divide
similarite = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
Traceback (most recent call last):
File "/usr/local/bin/text2variable", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/site-packages/lingua_extraction/main.py", line 311, in main
sentiment = get_sentiment(texte_brut) # "Positive", "Negative", "Neutral
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/lingua_extraction/Caracteristiques_pragmatiques.py", line 170, in get_sentiment
result = sentiment_task(text)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 156, in __call__
result = super().__call__(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1254, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1261, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1161, in forward
model_outputs = self._forward(model_inputs, **forward_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/pipelines/text_classification.py", line 187, in _forward
return self.model(**model_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1205, in forward
outputs = self.roberta(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 800, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (607) must match the existing size (514) at non-singleton dimension 1. Target sizes: [1, 607]. Tensor sizes: [1, 514]
Possible fixes:
1. Pass a max token size to truncate the sample and only run sentiment on the first 514 tokens.
2. Implement a window-based approach over the full sample.
3. Use a model with a higher token limit.
I am leaning towards number 1 since it is the most time-efficient.
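Fixes 1 and 2 can be sketched as below. This is only an illustration: plain Python lists stand in for the token IDs the tokenizer would produce, and the names (`MAX_TOKENS`, `classify_window`) are hypothetical, not taken from the lingua_extraction codebase.

```python
# Sketch of fixes 1 (truncation) and 2 (sliding window), using plain
# lists as stand-ins for tokenizer output. Illustrative names only.
from collections import Counter

MAX_TOKENS = 514  # limit reported in the RuntimeError


def truncate(tokens, max_tokens=MAX_TOKENS):
    """Fix 1: keep only the first max_tokens tokens."""
    return tokens[:max_tokens]


def windows(tokens, max_tokens=MAX_TOKENS, stride=256):
    """Fix 2: yield overlapping windows that together cover the sample."""
    if len(tokens) <= max_tokens:
        yield tokens
        return
    for start in range(0, len(tokens) - stride, stride):
        yield tokens[start:start + max_tokens]


def windowed_sentiment(tokens, classify_window):
    """Classify each window and majority-vote the per-window labels."""
    labels = [classify_window(w) for w in windows(tokens)]
    return Counter(labels).most_common(1)[0][0]
```

For fix 1 specifically, recent transformers versions let the text-classification pipeline forward tokenizer kwargs, so something like `sentiment_task(text, truncation=True, max_length=512)` in `get_sentiment` may be enough on its own; worth verifying against the installed transformers version.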