
Reproducing the Experiment Results #2

Open
gsmoon97 opened this issue Jul 24, 2023 · 3 comments


@gsmoon97

Hi, thank you for your interesting work. I was trying to reproduce the experiment results using the code provided and had some questions.

  1. I attempted to run the train_and_eval.sh script to reproduce the results for the English dataset, after generating speech data for the development set (5.7K sentences) and the test set (5.7K sentences), plus a sample of speech data (10K sentences) for the training set to test the script. However, I encountered an "IndexError: Dimension specified as 0 but tensor has no dimensions" error. Below is the full error log. Would you be able to provide some guidance on how I can fix this error?
```
torch                    2.0.0+cu117
torchaudio               2.0.1+cu117
torchvision              0.15.1+cu117
SpeechWithEncoderDecoderModel
07/24/2023 15:02:46 - INFO - model.speech_with_encoder_decoder.configuration_speech_with_encoder_decoder - Setting `config.is_decoder=True` and `config.add_cross_attention=True` for decoder_config
07/24/2023 15:02:46 - INFO - model.speech_with_encoder_decoder.configuration_speech_with_encoder_decoder - Setting `config.is_decoder=True` and `config.add_cross_attention=True` for decoder_config
/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
num_gpus 2
use 2 gpus!
num_gpus 2
use 2 gpus!
/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py:162: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
  self.test_metric = load_metric("rouge")
/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py:162: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
  self.test_metric = load_metric("rouge")
07/24/2023 15:03:00 - INFO - root - ## clang-8 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load train data ... 
07/24/2023 15:03:00 - INFO - root - finish loading train data ... 
07/24/2023 15:03:00 - INFO - root - ## conll14 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load conll14 test data ... 
07/24/2023 15:03:00 - INFO - root - ## clang-8 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load train data ... 
07/24/2023 15:03:00 - INFO - root - finish loading train data ... 
07/24/2023 15:03:00 - INFO - root - ## conll14 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load conll14 test data ... 
07/24/2023 15:03:00 - INFO - root - finish loading coll14 test data ... 
07/24/2023 15:03:00 - INFO - root - ## conll13 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load conll13 test data ... 
07/24/2023 15:03:00 - INFO - root - finish loading coll14 test data ... 
07/24/2023 15:03:00 - INFO - root - ## conll13 ######################### ... 
07/24/2023 15:03:00 - INFO - root - load conll13 test data ... 
07/24/2023 15:03:00 - INFO - root - finish loading coll13 test data ... 
07/24/2023 15:03:00 - INFO - root - ## bea19 test ######################### ... 
07/24/2023 15:03:00 - INFO - root - load bea19 test data ... 
07/24/2023 15:03:00 - INFO - root - finish loading coll13 test data ... 
07/24/2023 15:03:00 - INFO - root - ## bea19 test ######################### ... 
07/24/2023 15:03:00 - INFO - root - load bea19 test data ... 
07/24/2023 15:03:01 - INFO - root - finish loading bea19 test data ... 
07/24/2023 15:03:01 - INFO - root - ## bea19 dev ######################### ... 
07/24/2023 15:03:01 - INFO - root - load bea19 eval data ... 
07/24/2023 15:03:01 - INFO - root - finish loading bea19 test data ... 
07/24/2023 15:03:01 - INFO - root - ## bea19 dev ######################### ... 
07/24/2023 15:03:01 - INFO - root - load bea19 eval data ... 
07/24/2023 15:03:01 - INFO - root - finish loading bea19 eval data ... 
07/24/2023 15:03:01 - INFO - root - finish loading bea19 eval data ... 

  0%|          | 0/3 [00:00<?, ?it/s]07/24/2023 15:03:01 - INFO - __main__ - ***** Running training *****
07/24/2023 15:03:01 - INFO - __main__ -   Num examples = 2
07/24/2023 15:03:01 - INFO - __main__ -   Num Epochs = 3
07/24/2023 15:03:01 - INFO - __main__ -   Instantaneous batch size per device = 16
07/24/2023 15:03:01 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1024
07/24/2023 15:03:01 - INFO - __main__ -   Gradient Accumulation steps = 32
07/24/2023 15:03:01 - INFO - __main__ -   Total optimization steps = 3
07/24/2023 15:03:01 - INFO - __main__ -   num_beams = 5
07/24/2023 15:03:01 - INFO - __main__ -   generate_max_target_length = 128
07/24/2023 15:03:01 - INFO - __main__ -   generate_min_target_length = 5
07/24/2023 15:03:01 - INFO - __main__ -   learning_rate = 0.0001
07/24/2023 15:03:01 - INFO - __main__ -   use_adafactor = True
07/24/2023 15:03:01 - INFO - __main__ -   use_t5_model = True
07/24/2023 15:03:01 - INFO - __main__ -   t5_source_prefix = translate English to English: 

 33%|███▎      | 1/3 [00:10<00:21, 10.82s/it]
(Epoch 0) LOSS:12.1937:  33%|███▎      | 1/3 [00:10<00:21, 10.82s/it]
(Epoch 0) LOSS:12.1937:  67%|██████▋   | 2/3 [00:12<00:05,  5.38s/it]
(Epoch 1) LOSS:7.6824:  67%|██████▋   | 2/3 [00:12<00:05,  5.38s/it] 
  0%|          | 0/41 [00:00<?, ?it/s]

  0%|          | 0/41 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/main.py", line 640, in <module>
    main()
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/main.py", line 636, in main
    Trainer.train(train_dataloader, eval_dataloader_conll14, eval_dataloader_conll13, eval_dataloader_bea19, eval_dataloader_bea19_dev)     
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 197, in train
    self.train_autoregressive(train_dataloader, eval_dataloader_conll14, eval_dataloader_conll13, eval_dataloader_bea19, eval_dataloader_bea19_dev)
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 323, in train_autoregressive
    self.test_autoregressive(eval_dataloader_conll14, 1, epoch)
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 371, in test_autoregressive
    generated_tokens = self.accelerator.unwrap_model(self.model).generate(
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1620, in generate
    input_ids, model_kwargs = self._expand_inputs_for_generation(
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 732, in _expand_inputs_for_generation
    model_kwargs["encoder_outputs"] = _expand_dict_for_generation(model_kwargs["encoder_outputs"])
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 721, in _expand_dict_for_generation
    dict_to_expand[key] = dict_to_expand[key].repeat_interleave(expand_size, dim=0)
IndexError: Dimension specified as 0 but tensor has no dimensions

  0%|          | 0/41 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/main.py", line 640, in <module>
    main()
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/main.py", line 636, in main
    Trainer.train(train_dataloader, eval_dataloader_conll14, eval_dataloader_conll13, eval_dataloader_bea19, eval_dataloader_bea19_dev)     
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 197, in train
    self.train_autoregressive(train_dataloader, eval_dataloader_conll14, eval_dataloader_conll13, eval_dataloader_bea19, eval_dataloader_bea19_dev)
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 323, in train_autoregressive
    self.test_autoregressive(eval_dataloader_conll14, 1, epoch)
  File "/home/moongs/workspace/MultimodalGEC/gec_speech_moe_mse/model/trainer.py", line 371, in test_autoregressive
    generated_tokens = self.accelerator.unwrap_model(self.model).generate(
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1620, in generate
    input_ids, model_kwargs = self._expand_inputs_for_generation(
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 732, in _expand_inputs_for_generation
    model_kwargs["encoder_outputs"] = _expand_dict_for_generation(model_kwargs["encoder_outputs"])
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/transformers/generation/utils.py", line 721, in _expand_dict_for_generation
    dict_to_expand[key] = dict_to_expand[key].repeat_interleave(expand_size, dim=0)
IndexError: Dimension specified as 0 but tensor has no dimensions

  0%|          | 0/41 [00:02<?, ?it/s]

(Epoch 1) LOSS:7.6824:  67%|██████▋   | 2/3 [00:15<00:07,  7.98s/it]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 932696) of binary: /home/moongs/miniconda3/envs/py310-mm/bin/python
Traceback (most recent call last):
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/moongs/miniconda3/envs/py310-mm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
gec_speech_moe_mse/main.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-07-24_15:03:23
  host      : twinkle1
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 932697)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-24_15:03:23
  host      : twinkle1
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 932696)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
  File "/home/moongs/workspace/MultimodalGEC/tool/spacy_en_tok.py", line 16, in <module>
    token(sys.argv[1], sys.argv[2])
  File "/home/moongs/workspace/MultimodalGEC/tool/spacy_en_tok.py", line 7, in token
    for line in open(inputfile, 'r'):
FileNotFoundError: [Errno 2] No such file or directory: '/home/moongs/workspace/MultimodalGEC/data-english/model-result/result/dot-attention_bs-16x32_lr-0.0001-2a100-moe-mse/seed333_lr0.0001-conll14.best.candidate'
python: can't open file '/home/moongs/workspace/MultimodalGEC/tool/retokenizer_en.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "tool/scripts/m2scorer.py", line 137, in <module>
    p, r, f1 = levenshtein.batch_multi_pre_rec_f1(system_sentences, source_sentences, gold_edits, max_unchanged_words, beta, ignore_whitespace_casing, verbose, very_verbose)
  File "/home/moongs/workspace/MultimodalGEC/tool/scripts/levenshtein.py", line 103, in batch_multi_pre_rec_f1
    assert len(candidates) == len(sources) == len(gold_edits)
AssertionError
```
  2. Also, I have noticed that some of the required dependencies, such as soundfile, transformers, and spacy, are not mentioned in the README file. Could you provide the full list of dependencies, along with version information, required to reproduce the experiment results?
  3. Is it possible to use train_and_eval.sh to train and evaluate on the German dataset? I have looked at train_and_eval.sh and gec_speech_moe_mse/main.py, and they both seem to be specific to the English dataset. If it is not possible to run this code on the German dataset, would it be possible to upload the code used to train and evaluate on it?

Any help would be greatly appreciated. Thank you!

@fangtao-123
Collaborator

Hi, thanks for your interest.

  1. Based on your log, it appears that the code itself runs smoothly, and I haven't encountered this type of error before. However, you may want to check your training and testing data to ensure that the audio files correspond one-to-one with the text data, and make sure that the generated speech does not contain any empty audio segments (see the sketches after this list). You are not required to use our provided speech model for speech generation; you might instead use Google's text-to-speech API to produce speech in MP3 format, which results in lower memory usage.
  2. We have updated some of the key dependency packages along with their versions. Please refer to the requirements.txt file for more details.
  3. Yes, but you will need to make some simple modifications to the code. We have provided the code for training on German, which has been updated in the repository.
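
For point 1, a minimal sanity-check sketch for the pairing and empty-segment checks. It assumes one audio file per text line, named by line index; the file names and paths below are illustrative, not the repo's actual layout, so adjust them to your data.

```python
# Hypothetical check: verify that audio files correspond one-to-one with
# text lines, and that no generated clip is empty.
import os
import soundfile as sf

def check_pairing(text_file, audio_dir, ext=".wav"):
    with open(text_file, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    for i, text in enumerate(lines):
        path = os.path.join(audio_dir, f"{i}{ext}")
        if not text:
            print(f"line {i}: empty text")
        if not os.path.exists(path):
            print(f"line {i}: missing audio file {path}")
            continue
        data, _ = sf.read(path)
        if len(data) == 0:
            print(f"line {i}: empty audio segment in {path}")

check_pairing("data-english/conll14.src", "speech/conll14")
```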
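
And a sketch of the MP3 route, using the gTTS package as one possible client for Google's text-to-speech (the package choice and file layout are assumptions, not necessarily what we used):

```python
# Hypothetical MP3 generation with gTTS, one sentence per clip.
from gtts import gTTS

with open("data-english/conll14.src", encoding="utf-8") as f:
    for i, line in enumerate(f):
        text = line.strip()
        if not text:
            continue  # skip empty lines; they cannot be synthesized
        gTTS(text=text, lang="en").save(f"speech/conll14/{i}.mp3")
```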

@gsmoon97
Author

gsmoon97 commented Jul 27, 2023

Thank you for the update!

I have one more question. While trying to reproduce the results for the German data, I encountered the ValueError: Reference is empty. error at line 408. I have tried to suppress the error by passing ignore_empty=True to the rouge.get_scores() function as well, but I still encounter the same error. I am guessing that it might be caused by rouge not being able to handle strings (gold at line 380) consisting only of dots (.), as stated here. May I know if you have encountered the same error previously and, if so, how you handled it?
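
For context, a minimal reproduction of what I believe is happening (assuming rouge==1.0.1): a dots-only reference passes the ignore_empty filter, since it is not literally an empty string, but rouge's internal preprocessing strips it to nothing and then raises.

```python
# Minimal reproduction sketch of the ValueError described above.
from rouge import Rouge

rouge = Rouge()
# Normal case: both strings contain words, so scoring works.
print(rouge.get_scores(["this is fine ."], ["this is fine ."], avg=True))
# Failing case: the dots-only reference is reduced to an empty string
# internally, so this raises ValueError: Reference is empty.
rouge.get_scores(["some hypothesis ."], ["."], ignore_empty=True)
```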

For your reference, I am currently using rouge==1.0.1 and rouge-score==0.1.2, as stated in the provided requirements.txt, and I have used the German dataset released in this repo. I have also trained and evaluated with num_train_epochs of 1 and eval_epoch of 1 to test out the code first.

Thank you for your help.

@fangtao-123
Collaborator

I suspect that insufficient training might have caused the issue with text generation. I re-ran the model on my end and did not encounter the error you mentioned. To address this, consider increasing the number of training epochs, for example to 3 or 5. Additionally, the ROUGE metric is not necessary for the GEC task, so it is optional and can simply be removed from the source code (specifically, lines 456 to 465 and 407 to 416 in trainer.py); a defensive alternative is sketched below.
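
If you would rather keep ROUGE than delete those lines, one possible workaround is to compute it defensively and skip it when a reference is empty. This is a sketch only; safe_rouge and its arguments are hypothetical stand-ins for whatever trainer.py actually uses around those lines.

```python
# Hedged sketch: wrap the optional ROUGE computation so empty references
# are skipped instead of crashing the evaluation loop.
from rouge import Rouge

def safe_rouge(predictions, references):
    """Return averaged ROUGE scores, or None when ROUGE cannot be computed."""
    try:
        return Rouge().get_scores(predictions, references, avg=True)
    except ValueError:
        # e.g. "Reference is empty." -- ROUGE is optional for GEC, skip it
        return None

print(safe_rouge(["this is a test ."], ["this is the test ."]))
print(safe_rouge(["this is a test ."], ["."]))  # prints None instead of raising
```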
