Repository of team TripleM for NLP @ UL FRI. Project 3: Cross-lingual sense disambiguation
Students: Matej Miočić, Marko Ivanovski, Matej Kalc
- Reports and additional material showing our work are in the folder Reports.
- Source code with Jupyter notebooks is in src, which is divided into:
  - corpus: code for obtaining the corpus; see a demo of the corpus here
  - pretrained models: some pre-trained models that we were able to upload to GitHub
  - results: results used for the analysis of non-contextual embeddings
  - web scrapper: the scraper for obtaining homonyms from fran.si
  - non contextual embeddings and contextual embeddings: code for the mentioned embeddings (clusters, scores, analysis)
  - classification: code for classifying with clusters and annotated sentences
Before running the code, please install the requirements in requirements.txt. If you experience any problems with downloading AllenNLP, run the following command:
pip3 install allennlp
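To confirm that the installation worked before opening the notebooks, a quick check along these lines may help. This is only a sketch and not part of the repository: the helper `missing_packages` is hypothetical, and the list of names is an assumption that should be adjusted to the import names of the packages in requirements.txt.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; extend this list to match requirements.txt.
    required = ["allennlp"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

If allennlp is reported as missing, the pip3 command above is the first thing to try.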
- In order to work with our corpus, you need to download it from here.
- To work with Word2Vec, you need to download the embeddings from here.
- To work with fastText, you need to download the embeddings from here.
- ELMo pretrained weights can be found here.
- Finally, sloBERTa can be downloaded from here.
Click here to open the report.