OpeNTF via NMT (OpenNMT) #243
Hi @thangk
OpenNMT only gives you translation metrics such as ppl (perplexity), as seen in the image.
@jamil2388 please advise |
@hosseinfani, @thangk for now, I am putting a doc link here. It contains almost all the arguments used in the onmt pipeline: https://community.libretranslate.com/t/documentation-for-opennmt-py-parameters/927/ I think looking into this argument on the page might help us with prediction file dumping. I also advise Kap to learn about the behavior of the translation metrics used in the current runs, because that will be crucial for understanding the models' train and test behavior and will eventually tell us the direction of adjustments. Thanks! |
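For the prediction-file dumping mentioned above: OpenNMT-py ships a translate entry point that writes the model's hypotheses to a file. A minimal sketch, assuming a trained checkpoint and a source-side test file (file names are placeholders; flags are from the OpenNMT-py docs):

```bash
# Dump predictions for the test split to pred_500.txt (paths are illustrative)
onmt_translate -model model_step_500.pt \
               -src src-test.txt \
               -output pred_500.txt \
               -beam_size 5 -n_best 1 -verbose
```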
@jamil2388 thanks. @thangk one more thing: when exploring hyperparameters, also see how you can use OpenNMT for different types of translators, because we need to study the effect of translation on our work. These translators should be published in a paper so that we can cite them in ours. I think the OpenNMT community keeps updating their codebase to include more and more new translators, which helps you with our task (this is like @jamil2388 using different gnn methods from pyg for team formation). |
Hi @hosseinfani, continuing the conversation about whether or not to average all the folds' eval metrics to get one set of numbers per epoch setting (i.e., 500, 1000): I was referring to these. Each fold produces its own eval metrics. There's one more, fold2, below fold1, which isn't visible in the screenshot. I am thinking the right approach is to average the e500 and e1000 pairs across all 3 folds to put in the excel sheet. |
I saw some charts we've used in some papers, and I can see those papers use the average of the folds. I'll follow the same approach. |
Hi @thangk now I see. There should be another file with no fold index, like test.epoch*, that includes the average of the folds. But you're right about averaging the folds. |
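In case it helps, a minimal sketch of the fold averaging with pandas, assuming one metrics CSV per fold named like test.f0.epoch500.csv (the naming is hypothetical, and rows are assumed to line up across folds):

```python
import pandas as pd

# Load the per-fold eval metrics (hypothetical file naming: test.f<k>.epoch500.csv)
frames = [pd.read_csv(f"test.f{k}.epoch500.csv") for k in range(3)]

# Rows align by position across folds, so group by row index and average;
# non-numeric columns (e.g., metric names) are dropped by numeric_only=True
avg = pd.concat(frames).groupby(level=0).mean(numeric_only=True)

# Write the fold-free average file, e.g. test.epoch500.csv
avg.to_csv("test.epoch500.csv", index=False)
```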
@thangk my preference is to keep the progress logs in this issue, rather than in chats on Teams or elsewhere. |
Yesterday, I ran three seq2seq-based models (Transformer, ConvS2S, RNN with attention) on the dblp (filtered) dataset, and out of the three, only two (ConvS2S and RNN with attention) ran successfully with the baseline configs I've set. Here are the first-run results for ConvS2S (left) and RNN with attention (right). It seems there are issues with the shape of the input in the Transformer model. I'll dig into the issue. |
This was the first run of all datasets using the ConvS2S model. Hyperparameters:
|
@thangk |
I was thinking of putting in the best results from Jamil's FNN and BNN. Do you want me to put in the pure FNN and BNN from Rad et al.'s paper? |
yes, I believe Jamil has reproduced the results already. |
yeah, he has the results for imdb and dblp. I'm gathering them for these tables. |
This is what I currently have for imdb; I am working on dblp now. The Transformer model isn't working quite right and needs some more debugging.
dblp
imdb
hyperparameters for
Edit: dblp results added. |
How would I say this if I don't yet see a substantial performance improvement over the others? Since it's a first for team formation, I'm not sure I even have the best settings yet. I've tried a few, but they're still not as good as the fnn or bnn values. Can I say it has the potential to be a viable option for team formation tasks but needs further research? Hi @thangk here is my reply: regarding the low performance of seq2seq, you need to know that these models can map a sentence to another one; that is, the input space and output space are both the size of a language's tokens (~100k), while keeping order between tokens. If they're not performing well, we need to find out why, and then how to change/customize them for our problem. For imdb, it makes sense, because the input space is just 20-30 words that should be mapped to a large output space. So, we can point to the sparsity of the source sequence/language. What else? |
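To make the sparsity argument concrete, here is a rough sketch (assuming whitespace-tokenized source/target training files; file names are hypothetical) that compares the size of the input space (skills) to the output space (experts):

```python
# Count distinct tokens on each side to quantify the source/target space mismatch
def vocab_size(path: str) -> int:
    vocab = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            vocab.update(line.split())
    return len(vocab)

# Hypothetical file names for the skill (source) and expert (target) sequences
print("src (skills) vocab:", vocab_size("src-train.txt"))
print("tgt (experts) vocab:", vocab_size("tgt-train.txt"))
```

A small source vocabulary mapped to a much larger target vocabulary is exactly the sparsity issue described above.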
I see. I've also added the pure bnn, bnn_emb, and rnn from other papers as baselines. After doing this, my results aren't too far off; some are even better than the baselines. So, this validates my statement in the abstract. Still, I'm eager to find more optimized hyperparameters and will do so. In the meantime, I'll keep these data and work more on the write-up. I'm also running gith and uspt on both convs2s and rnn with the same hyperparameters. |
I noticed that we hardcoded the checkpoint interval to 500. It was using a lot of space; I'll delete this right away as soon as training finishes. I've calculated how much more it'll take, and we have enough space to complete this training. |
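As an aside, if checkpoint disk usage becomes a problem again, OpenNMT-py can cap how many checkpoints are retained. A hedged sketch (option names are from the OpenNMT-py docs; values are illustrative):

```bash
# Save every 500 steps but keep only the last 2 checkpoints on disk
onmt_train -config config.yaml -save_checkpoint_steps 500 -keep_checkpoint 2
```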
I was able to run the Transformer model with the following settings:
And here's the result compared to the others: |
We need to make the comparison both among the nmt models themselves and against the bnn and fnn models. So, please do:
This way, we argue that although we run the nmt models with more layers or epochs, which may give them an advantage over the bnn and fnn, the bnn and fnn cannot even take advantage of more epochs or layers for the same running time/memory. @thangk let me know if you need more clarification. |
Okay, I will redo the models with settings comparable to the FNN's and BNN's. Apparently, the models I've posted in the tables were run with different settings. I'll update the table again shortly. |
I was able to run the Transformer model as part of one of this week's tasks: finding one more architecture to include in the comparisons. The following results were run 2-3 days ago (before we had the discussion about making as many settings the same/similar as possible), which is why the settings aren't close; but it shows that I was able to run one more model. I'll adjust the settings to be as close as I reasonably can for future comparisons. Note: the epoch values also look strange because, apparently, OpenNMT-py uses "steps" to determine the training cycles instead of epochs. I only realized this after these tests, so I used the following formula to convert from steps to epochs, which is why the epoch values are odd. I'll address this better in future tests. Formula for steps to epochs:
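The image with the exact formula isn't reproduced here, but the standard conversion for step-based trainers would be epochs = steps / (num_train_examples / batch_size). A minimal sketch (numbers are illustrative):

```python
# One epoch = num_train_examples / batch_size optimizer steps,
# so epochs = steps * batch_size / num_train_examples (assumed standard conversion)
def steps_to_epochs(steps: int, num_train_examples: int, batch_size: int) -> float:
    return steps / (num_train_examples / batch_size)

# e.g., 10,000 steps over 50,000 training examples at batch size 32 = 6.4 epochs
print(steps_to_epochs(10_000, 50_000, 32))
```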
The CSV files are available in the
The hyperparameter settings used for the above:
Transformer gith1
Transformer imdb1
These settings are slightly different to accommodate differences in the datasets (i.e., a larger dataset requires more train steps). Again, this is just to show that a 3rd model is available for future comparisons. I'll continue tweaking the settings to bring them as close as possible to the baselines we'll be comparing against. |
@thangk |
Update: The following are the latest results for the 3 models (t-teamrec, c-teamrec, r-teamrec) at the same settings. I'm working on finding the best settings for each model on each dataset, to be finished within the coming weeks (I'll update with further information). I'll do the same for two new models that I am also looking to add to the research. |
@thangk thank you. A few notes:
|
Tested dataset
data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3
Input type
Sparse matrix
Command used
python -u main.py -data ../data/preprocessed/dblp/dblp.v12.json.filtered.mt75.ts3 -domain dblp -model nmt
Observations
The script ran through all 3 folds and produced results without errors, but dumped no predictions.
Next step(s)