
clarinsi/CLARINprojekt2024-koreferencnost

 
 


This repository serves as a code archive for the deliverables produced within the CLARIN.SI 2024 project "Implementacija podpore za razširjeno uporabo slovenskih virov za odkrivanje koreferenčnosti". The aim of the project was to enable easier use and broader recognition of Slovene coreference data:

  • via a convenient data loading implementation for the datasets library;
  • via conversion into the universal CorefUD data format;
  • via a unified benchmarking implementation within the SloBENCH evaluation framework.

Contents

Conversion_UDCoref

The folder contains scripts for converting the coref149 and SentiCoref corpora from their original formats into the CorefUD CoNLL-U format. For the scripts to work, the following data in raw format needs to be placed within the folder:

Afterward, the corresponding scripts (convert_coref149.py, convert_senticoref.py, convert_senticoref_private.py) can be run successfully.
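
To give a rough idea of what the converted output can be used for, the following is a minimal sketch of reading coreference annotations back from a CorefUD CoNLL-U file. It assumes the CorefUD convention of storing mentions in the `Entity` attribute of the MISC column (the tenth column), where an opening bracket is followed by the entity ID. The helper name `opened_entities` and the example token line are hypothetical and are not taken from the conversion scripts.

```python
import re

# Matches the entity ID right after each opening mention bracket, e.g. "(e1-person-1".
_OPEN_MENTION = re.compile(r"\(([^()\-]+)")


def opened_entities(conllu_line: str) -> list[str]:
    """Return the IDs of entities whose mentions open on this token line."""
    columns = conllu_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        return []  # comment line, blank line, or malformed row
    misc = columns[9]
    if misc == "_":
        return []  # empty MISC column
    for attribute in misc.split("|"):
        key, _, value = attribute.partition("=")
        if key == "Entity":
            return _OPEN_MENTION.findall(value)
    return []


# Hypothetical token line, invented for illustration only.
example = "\t".join(
    ["1", "Ana", "Ana", "PROPN", "_", "_", "2", "nsubj", "_", "Entity=(e1-person-1)"]
)
print(opened_entities(example))
```

Lines that only close a mention (for example `Entity=e1)`) yield an empty list, because the sketch looks at opening brackets only.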

Benchmarking_SloBENCH

The folder contains an implementation of coreference resolution evaluation within the SloBENCH evaluation framework for the coref149 and SentiCoref corpora. The code is kept here for archival purposes; for an up-to-date version, refer to the following pull request (and its repository): clarinsi/slobench-eval-docker#3.
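
For readers unfamiliar with coreference scoring, here is a minimal sketch of B-cubed (B³), one commonly used coreference metric. This README does not state which metrics the SloBENCH implementation computes, so the choice of B³ is an assumption made purely for illustration. The function name `b_cubed` and the toy clusters are hypothetical, and the sketch assumes that gold and predicted clusters cover the same mentions.

```python
def b_cubed(gold_clusters, predicted_clusters):
    """Return (precision, recall, F1) of B-cubed over mentions present on both sides."""
    # Map each mention to the cluster that contains it.
    gold_of = {m: frozenset(c) for c in gold_clusters for m in c}
    pred_of = {m: frozenset(c) for c in predicted_clusters for m in c}
    mentions = gold_of.keys() & pred_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0

    # Per-mention overlap between its gold cluster and its predicted cluster.
    precision = sum(
        len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy clusters, invented for illustration only.
gold = [{"a", "b", "c"}, {"d"}]
predicted = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = b_cubed(gold, predicted)
print(precision, recall, f1)
```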

DataLoaders_HuggingFace

The folder contains scripts that support user-friendly loading of the coref149 and SentiCoref corpora within the HuggingFace datasets library. Additionally, the implementation places the two resources on an international portal, potentially giving them more recognition. The code is kept here for archival purposes; for an up-to-date version, refer to the scripts on the HuggingFace portal: cjvt/senticoref, cjvt/coref149.
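
A minimal sketch of loading either corpus through the datasets library might look as follows. The hub identifiers are the ones named above, while the wrapper `load_corpus` and the `HUB_IDS` mapping are hypothetical helpers introduced for illustration. Depending on the installed datasets version, script-based loaders may require passing `trust_remote_code=True` to `load_dataset`, or may not be supported at all, so treat this as a starting point rather than a guaranteed recipe.

```python
# Hub identifiers as named in this README.
HUB_IDS = {
    "senticoref": "cjvt/senticoref",
    "coref149": "cjvt/coref149",
}


def load_corpus(name: str):
    """Load one of the two corpora from the HuggingFace hub (needs network access)."""
    if name not in HUB_IDS:
        raise ValueError(f"Unknown corpus {name!r}; expected one of {sorted(HUB_IDS)}")
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(HUB_IDS[name])


# Example usage (downloads data, so it is left commented out):
# corpus = load_corpus("coref149")
# print(corpus)
```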

The code contained here was produced within the CLARIN.SI 2024 project "Implementacija podpore za razširjeno uporabo slovenskih virov za odkrivanje koreferenčnosti".
