Quick start and setup #31

DelaramRajaei · 2023-07-10T17:19:03Z

@saragebara
This is an issue page to log your progress. Please let us know if you have any concerns or questions.

DelaramRajaei · 2023-07-10T18:19:32Z

Please read the IR and backtranslation document I shared with you by Monday. The metrics section of the document is not yet complete. You can find some helpful information in this link.

I found some papers on backtranslation; their links are in this doc under the BackTranslation header.
The document also lists two surveys on data augmentation. I recommend reading both, as they give a good overview of data augmentation and backtranslation. Their summaries are also available.

Please let us know if you have any concerns or questions.

DelaramRajaei · 2023-07-13T18:01:56Z

@saragebara

Hello Sara,
I hope everything is going well for you. Could you please provide a log detailing the tasks you have completed?

saragebara · 2023-07-14T03:48:47Z

Hi Miss,
I hope you are well too!
So far, I have read through both documents and the first survey. I will have finished the second survey by tomorrow! 😄

DelaramRajaei · 2023-07-14T17:01:35Z

Call me Delaram :)
Great.
Please make sure you log your progress regularly (every 2-3 days).
Once you've completed the task, let me know so we can move forward to the next one.

saragebara · 2023-07-15T02:45:18Z

Okay, will do 😸
I've finished reading everything and I'm ready to start the next task!

DelaramRajaei · 2023-07-17T17:32:11Z

Great.
For your next task, please search for another translation model that focuses on translating English into at most three or four languages. It's important that the chosen model is easy to implement and can be used within a pipeline.

Currently, we are working with the facebook/nllb-200-distilled-600M model.

Here is the implementation:

After finding a suitable model, install the ReQue project and add the model to it.

For installation, you can use the README and this issue.
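
For reference, a minimal sketch of a translate/backtranslate round trip around a Hugging Face translation pipeline. This is not the ReQue implementation: the names `make_nllb_translator` and `backtranslate` are hypothetical, and `eng_Latn`/`fra_Latn` are the FLORES-200 language codes that NLLB uses. The model is loaded lazily so the round-trip logic can be tried with any callable.

```python
from typing import Callable, Tuple

# A translator is any function that maps a string to a string.
Translator = Callable[[str], str]


def make_nllb_translator(src_lang: str, tgt_lang: str,
                         model_name: str = "facebook/nllb-200-distilled-600M") -> Translator:
    """Build a translator from a transformers translation pipeline.

    Calling this downloads the model, so it needs network access.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("translation", model=model_name,
                    src_lang=src_lang, tgt_lang=tgt_lang)

    def translate(text: str) -> str:
        return pipe(text, max_length=512)[0]["translation_text"]

    return translate


def backtranslate(text: str, forward: Translator,
                  backward: Translator) -> Tuple[str, str]:
    """Translate with `forward`, then translate the result back with `backward`."""
    translated = forward(text)
    return translated, backward(translated)


# Example usage (requires the transformers package and network access):
# to_fr = make_nllb_translator("eng_Latn", "fra_Latn")
# to_en = make_nllb_translator("fra_Latn", "eng_Latn")
# print(backtranslate("international organized crime", to_fr, to_en))
```

Keeping the two directions as separate callables makes it easy to swap in a different model for testing, as long as that model supports both directions.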

saragebara · 2023-07-21T06:21:24Z

Hi Delaram,

Here are some models that I have found and a summary for each one. I am still working on finding a better-suited model. It is difficult to find many models focusing only on 3-4 languages, so I’ve included the models I’ve found with English + 1 language as well. These could be used in conjunction with each other, though I believe that would not work with a pipeline.

EN-FR-RUM-DE: tftransformers/t5-small
4 languages. Trained using C4. This dataset includes dialogue, general statements/facts, and questions. There are 5 versions of T5; the one linked is the base version.

Here are the size variants. Generally, the larger models yield better results.
This seems to be the best suited model based on my research thus far.

EN-FR: marian-finetuned-kde4-en-to-fr
A fine-tuned version of Helsinki-NLP/opus-mt-en-fr (Marian). It has a good BLEU score of 52.9416, which suggests it could yield good results. However, it was fine-tuned on this dataset, much of which consists of instructions and menu interface labels (for example, volume settings, rewind/play/pause, etc.).

EN-PT: m2m100_418M-finetuned-kde4-en-to-pt_BR
Same as the last model, except it's English and Portuguese. I've included it since it yielded a better BLEU score of 58.3196.

EN-romance languages: Helsinki-NLP/opus-mt-en-roa
This model includes 16 languages. Although it exceeds the maximum of 3-4 languages, it is a bit more focused than the current model, and may be useful. The dataset it is trained on mostly consists of dialogue statements. It has good BLEU scores for French, Portuguese, Papiamento, and Galician.

DelaramRajaei · 2023-07-21T16:43:18Z

Hi Sara,

Thanks for the summary! I think these models should do the trick. It's just an experiment to see if changing the model might make the results even better.

Can you install the ReQue project and test it with these models?

saragebara · 2023-07-24T06:10:17Z

Hi Delaram,

Sorry for the delay. I've been running into a few errors while installing the project. I am currently at step 3, installing anserini, which I am figuring out.
Regarding step 2, installing the requirements.txt, I keep getting the following error:

Nothing I've tried so far has seemed to work (re-installing, ensuring everything is up-to-date, reverting to python 3.8, etc.). I have more possible solutions I am planning to try out next, so I will provide an update in my next log. If you have any ideas for fixing this, anything is appreciated.

Additionally, I'm currently using PyCharm to install everything. Is this best, or would it be better to use something else?

Thank you,
Sara

DelaramRajaei · 2023-07-24T14:07:25Z

Hi Sara,

It appears that there is an issue while installing one of the libraries. Could you please specify which library is causing the error?

Additionally, have you considered using the environment.yaml file as an alternative method for installing the libraries? If so, is there any problem encountered with that approach as well?

No, the installation process should not vary based on the platform. I'm also using PyCharm for development.

If the issue persists and you're unable to resolve it, we can meet at the lab to work on it together and find a solution.

saragebara · 2023-07-28T20:41:20Z

Hi Delaram,

I believe this is the library causing the error when installing requirements.txt:

As for environment.yml, I've tried installing it through that method, but I keep getting the following:

I'm also getting stuck in the installation of anserini. The command "cd ndeval && make" does not work.

The previous command also used "&&", and it worked when I replaced it with "-and". That doesn't work for this command though, and I haven't found any other fixes.
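
A possible workaround, sketched under the assumption that the terminal is Windows PowerShell: `&&` is only a command-chaining operator in PowerShell 7 and later, and `-and` is a logical operator rather than a chaining one. Running the command from Python with an explicit working directory avoids shell chaining altogether. The helper name `run_in_dir` is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_in_dir(command: Sequence[str], directory: str) -> str:
    """Run `command` with `directory` as the working directory and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        list(command),
        cwd=Path(directory),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Equivalent of `cd ndeval && make` (assumes `make` is on PATH, e.g. via Cygwin):
# print(run_in_dir(["make"], "ndeval"))
```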

I've been trying to troubleshoot for a while, but I'm still having issues. I'm wondering if it would be possible for me to work on this in the lab so that if anything arises that I can't find a fix for, I could ask. However, I also don't have access to the university's internet, so I'm not sure how that would work.

I am very sorry for the delay, it's my first time really using the terminal or going through an installation like this, so it's been a bit confusing. I would really appreciate if we could meet up at any point to find a solution.

Thank you!

DelaramRajaei · 2023-07-31T17:45:47Z

@saragebara
Hi Sara,

Apologies for the delayed reply. Certainly, you can come to the lab on Wednesday. Let's collaborate and tackle the bug together. No need to worry, we'll figure it out as a team.

saragebara · 2023-07-31T19:14:46Z

Sounds great, thank you so much! 😊

I managed to fix the errors in installing the requirements.txt.
I had to downgrade python and a few of the installed libraries, as well as ensure that Python was 64 bit.

I should hopefully have more fixed by Wednesday as well! I am now working on installing anserini properly, I will provide an update if I manage to successfully install it.

saragebara · 2023-08-03T03:22:57Z

ReQue has been successfully installed and run.

Here is a log of today's process:

1. Created and activated a new environment:

    $ conda create -n ReQue
    $ conda activate ReQue

2. Installed Python 3.8 into the environment:

    $ conda install python=3.8 -n ReQue

3. Changed the Python interpreter in PyCharm to the python.exe in the conda ReQue environment:
   Python Interpreter --> Add New Interpreter --> Add Local Interpreter --> Conda Environment --> Load Environments
   From there, the path of the Conda Executable was changed by finding the envs folder --> ReQue --> python.exe and applying the new interpreter.
   If successful, "(ReQue)" appears when opening the terminal.
4. Installed anserini following the instructions provided in this link. Note: the Cygwin terminal had to be used in order to make the files. Afterwards, anserini was built (Apache NetBeans IDE 15 was used).
5. Installed pyserini following these instructions, but skipping the first step of creating a new environment. The pip installation was used and encountered no errors, while the developer installation did encounter errors.
6. Downloaded the dataset (robust04) and extracted it into ..\ReQue\ds
7. Attempted to run the project but encountered an error regarding nltk. Ran the following to fix it:

    $ python
    >>> import nltk
    >>> nltk.download("stopwords")
    >>> nltk.download("punkt")

8. Encountered another error regarding sentence_transformers. Fixed it by running:

    $ pip install sentence_transformers

Afterwards, the project was run successfully.
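
The two manual fixes in this log (the NLTK data and the missing sentence_transformers package) could be checked up front with a small script. This is only a sketch; the helper names `missing_modules` and `ensure_nltk_data` are hypothetical and are not part of ReQue.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_nltk_data(resources=("stopwords", "punkt")) -> None:
    """Download the NLTK resources named in the log (needs nltk and network access)."""
    import nltk  # imported lazily so missing_modules() works without nltk

    for resource in resources:
        nltk.download(resource)


# Example usage:
# missing = missing_modules(["nltk", "sentence_transformers"])
# if missing:
#     print("Install first:", ", ".join(missing))
# else:
#     ensure_nltk_data()
```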

hosseinfani · 2023-08-03T10:13:47Z

@DelaramRajaei
Would you please fix the errors in the installation procedure, such as adding sentence_transformers to requirements.txt and environment.yml? Thanks.

DelaramRajaei · 2023-08-04T12:55:02Z

@saragebara
Hi Sara,

Thank you for providing the detailed log. Have you run the project for all the commands (generate, search, evaluation, and build)? For your next task, please attempt to add the models you discovered earlier into the project and run it on them.

saragebara · 2023-08-04T14:43:47Z

Hi Delaram,

Yes, I have run the project for all the commands without any errors. 😄

Sounds good, I will get started on testing the models right now.

saragebara · 2023-08-08T03:20:44Z

Hi Delaram,

I have been running into issues with all the models I tested. In all 4 cases, the output contained only the translation, not the backtranslation. I haven't been able to find a fix, and I'm not sure if it's because the models only support English as a source language.

Additionally, I'm running into an error for the models with multiple languages. Rather than just translating to the target language(s), they translate to all available languages, alternating between each other in one big output.
Ex)

I'm working on either finding a different model or finding a solution for the current issue. Do you know what might be causing this?

Thank you!

DelaramRajaei · 2023-08-08T23:05:19Z

Hello Sara,

You might be right. This error may be caused by the translation model only supporting English.
Remember that the selected model is not a universal translation model, so it does not support all 10 languages I added in the param.py file. Just run the program for a specific language that the model supports, such as Farsi or French. You can also add languages beyond the ones I picked initially; just make sure to include one or two that are already there so we can compare the results. Also, make sure that the model supports those target languages as source languages for the second (backtranslation) step. They probably do, but check the documentation to be sure.
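
One possible cause of the mixed-language output, offered as an assumption rather than a confirmed diagnosis: some multi-target models need the target language stated inside the input text. The multi-target Helsinki-NLP Marian models such as opus-mt-en-roa expect a sentence-initial token of the form `>>id<<`, and T5 expects a task prefix such as `translate English to French: `. The helper names below are hypothetical.

```python
def with_marian_target(text: str, target_id: str) -> str:
    """Prepend the >>id<< target-language token used by multi-target Marian models."""
    return f">>{target_id}<< {text}"


def with_t5_task(text: str, source_name: str, target_name: str) -> str:
    """Prepend the T5 translation task prefix, e.g. 'translate English to French: '."""
    return f"translate {source_name} to {target_name}: {text}"


# Example usage: build the input strings before passing them to the model.
# print(with_marian_target("international organized crime", "fra"))
# print(with_t5_task("international organized crime", "English", "French"))
```

Which target IDs a given checkpoint accepts is listed on its model card, so the value passed as `target_id` should be checked there.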

I suggest you put a breakpoint where the backtranslation is generated and find the error.
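
A minimal sketch of what that could look like, assuming the translation and backtranslation steps are available as two callables. The function name `debug_backtranslate` and the `REQUE_DEBUG` environment variable are hypothetical. Python's built-in `breakpoint()` (Python 3.7+) opens the pdb debugger at that line; in PyCharm, clicking in the gutter next to the same line and running with Debug has the same effect.

```python
import logging
import os
from typing import Callable, Tuple

logger = logging.getLogger("backtranslation")


def debug_backtranslate(text: str, forward: Callable[[str], str],
                        backward: Callable[[str], str]) -> Tuple[str, str]:
    """Run the round trip, logging each stage so an empty stage is easy to spot."""
    translated = forward(text)
    logger.info("translated: %r", translated)

    # Pause right before the step that reportedly produces no output.
    # REQUE_DEBUG is a hypothetical switch so normal runs are not interrupted.
    if os.environ.get("REQUE_DEBUG") == "1":
        breakpoint()

    backtranslated = backward(translated)
    logger.info("backtranslated: %r", backtranslated)
    return translated, backtranslated
```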

DelaramRajaei · 2023-08-12T03:10:14Z

@saragebara

Hello Sara,

Just wanted to ask if you were able to fix the issue or not.

saragebara · 2023-08-14T05:26:26Z

Hi Delaram,

According to their documentation, the models I tested only support English as the source language, which I believe to be the cause of the error. I have been looking for different models, though I haven't found any others that support 3-4 languages (including an already used language so that comparison is possible).

I've tried testing other models, but each time they've either generated nothing or translation errors occurred.
Here are the ones I've tried so far:

https://huggingface.co/Babelscape/mrebel-large, https://huggingface.co/facebook/mbart-large-cc25: Both ran into an error where they would translate the first word (International) over and over again for 7125 characters. The second model did backtranslate properly, but it would only output the first word.
https://huggingface.co/facebook/wmt21-dense-24-wide-x-en, https://huggingface.co/alirezamsh/small100: Neither would output anything after the regular output.

The program is currently being run to translate into a single language only (French); the previous models were also tested with only French. All the models above support all their languages as source languages. Do I have to add additional code for some models to work? Additionally, I've tried to add a breakpoint, but I'm not sure where to put it, and the places I've tried have not allowed me to run or debug the project.

I'm very sorry for the delay, this week has been very busy. Next week will not be as hectic, so I will have more time. My internet is quite slow so it takes a long time to get an output.

Thank you for the help!

hosseinfani · 2023-08-14T14:44:07Z

@DelaramRajaei and @saragebara
Please create a table listing all the models that you have tried for translation and backtranslation, along with their issues, so that later we can justify why we use nllb. Thank you.
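
A minimal sketch of how such a table could be generated from the results reported so far in this thread. The helper name `to_markdown_table` and the column headers are hypothetical; the issue texts paraphrase the comments above and add no new results.

```python
from typing import List, Sequence

HEADERS = ["Model", "Reported issue"]

# Issues as reported in this thread (French as the only target language).
MODEL_ISSUES = [
    ["tftransformers/t5-small", "Translation only, no backtranslation"],
    ["marian-finetuned-kde4-en-to-fr", "Translation only, no backtranslation"],
    ["m2m100_418M-finetuned-kde4-en-to-pt_BR", "Translation only, no backtranslation"],
    ["Helsinki-NLP/opus-mt-en-roa", "Translation only, no backtranslation"],
    ["Babelscape/mrebel-large", "Repeated the first word for 7125 characters"],
    ["facebook/mbart-large-cc25",
     "Repeated the first word; backtranslation returned only the first word"],
    ["facebook/wmt21-dense-24-wide-x-en", "No output after the regular output"],
    ["alirezamsh/small100", "No output after the regular output"],
]


def to_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render headers and rows as a GitHub-flavored Markdown table."""
    lines: List[str] = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Example usage: paste the printed table into an issue comment.
# print(to_markdown_table(HEADERS, MODEL_ISSUES))
```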

DelaramRajaei · 2023-09-06T16:59:27Z

@saragebara
Hello Sara,
Have you included the models and verified their functionality?

DelaramRajaei added the good first issue Good for newcomers label Jul 10, 2023

DelaramRajaei assigned saragebara Jul 10, 2023
