Repository of team TripleM for NLP @ UL FRI. Project 3: Cross-lingual sense disambiguation

KalcMatej99/NLP-tripleM

NLP-tripleM

Students: Matej Miočić, Marko Ivanovski, Matej Kalc

Folder structure

  • Reports and additional material documenting our work are in the Reports folder.
  • Source code and Jupyter notebooks are in src, which is divided into:
    • corpus: code for obtaining the corpus; see a demo of the corpus here
    • pretrained models: the pre-trained models that we were able to upload to GitHub
    • results: results used for the analysis of non-contextual embeddings
    • web scrapper: the scraper for obtaining homonyms from fran.si
    • non contextual embeddings and contextual embeddings: code for clustering, scoring, and analysing the respective embeddings
    • classification: code for classifying with clusters and annotated sentences

Before running the code

Before running the code, install the requirements listed in requirements.txt. If you experience any problems installing AllenNLP, run the following command:

pip3 install allennlp
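
To confirm that the install succeeded before opening the notebooks, a quick check along these lines may help. This is a minimal sketch and not part of the repository: the helper name `is_installed` is hypothetical, and it only tests whether a package can be located, not whether its version matches requirements.txt.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "allennlp" is the import name of the package installed with pip3 above.
    for name in ("allennlp",):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If the package is reported as missing, make sure that `pip3` and the Python interpreter running the notebooks belong to the same environment.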

Downloading models and corpus

  1. To work with our corpus, you need to download it from here.

  2. To work with Word2Vec you need to download the embeddings from here.

  3. To work with fastText you need to download the embeddings from here.

  4. ELMo pretrained weights can be found here.

  5. Finally, sloBERTa can be downloaded from here.
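
Once the non-contextual embeddings are downloaded, a quick sanity check can be done without any extra libraries. The sketch below assumes the file is in the plain-text word2vec/fastText `.vec` layout (an optional `count dim` header line, then one `word v1 v2 ...` row per word). This README does not state the format, so treat that as an assumption. The function names `load_text_embeddings` and `cosine_similarity` are hypothetical helpers, not code from src, and the sample rows are toy values rather than real embeddings. ELMo and sloBERTa weights need their own libraries and are not covered here.

```python
import math
from typing import Dict, Iterable, List, Sequence


def load_text_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text-format embeddings: optional 'count dim' header, then 'word v1 v2 ...' rows."""
    vectors: Dict[str, List[float]] = {}
    for index, line in enumerate(lines):
        parts = line.split()
        # Skip the header line if present (two integers: vocabulary size and dimension).
        # Caveat: a 1-dimensional row with a numeric word on line one would be misread as a header.
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed rows that carry no vector.
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy rows for illustration only; real files hold one row per vocabulary word.
    sample = ["2 3", "word_a 1.0 0.0 0.0", "word_b 0.0 1.0 0.0"]
    embeddings = load_text_embeddings(sample)
    print(cosine_similarity(embeddings["word_a"], embeddings["word_b"]))
```

For a real file, pass an open file handle instead of the sample list, for example `with open(path, encoding="utf-8") as handle: load_text_embeddings(handle)`, where `path` points to wherever you saved the download.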

Report

Click here to open the report.
