Awesome Danish

A curated list of awesome resources for Danish language technology

Data

Corpora

Parallel corpora

Spoken language corpora

Dictionaries and ontologies

Word sets

  • Danish-Similarity-Dataset - Similarity scores for 99 Danish word pairs by Nina Schneidermann and Bolette Sandford Pedersen. Also available in danlp.
  • Wordsim353-da - Danish translation by Finn Årup Nielsen of the English Wordsim353 word pair set. Also available in danlp.
  • Four words - 100 odd-one-out sets of 4 words or phrases.
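Word pair sets like the ones above are commonly used to check how well a set of word embeddings agrees with human similarity judgements. The following is a minimal sketch of that kind of evaluation, not the procedure of any of the listed resources: the vectors and scores in `toy_vectors` and `toy_pairs` are invented for illustration, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented toy data (assumption): NOT taken from any dataset in this list.
toy_vectors = {
    "hund": [0.9, 0.1, 0.0],
    "kat": [0.8, 0.2, 0.1],
    "bil": [0.1, 0.9, 0.2],
    "cykel": [0.3, 0.7, 0.4],
}
toy_pairs = [
    ("hund", "kat", 8.5),
    ("bil", "cykel", 7.0),
    ("hund", "bil", 1.5),
    ("kat", "cykel", 2.0),
]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_pairs]
human_scores = [score for _, _, score in toy_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

With real data, the toy dictionary would be replaced by embeddings loaded from one of the resources listed under Embeddings, and the pairs by the rows of a word pair set.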

Embeddings

Neural text models

Neural speech models

Tools

Lemmatization

  • Lemmy - Lemmatizer for Danish in Python.
  • cstlemma - Lemmatiser.
  • spaCy - Python-based package with lemmatization.

Punctuation

  • punctfix - "Adds punctuation and capitalization for a given text."

Named entity recognition

Entity linking

  • Babelfy - Web app and service for linking words and entities.
  • DBpedia Spotlight - DBpedia-based entity linker. Described in Improving Efficiency and Accuracy in Multilingual Entity Extraction (Scholia).

Sentiment analysis

Automatic Speech Recognition

  • danspeech - DeepSpeech2-based Danish speech recognition in Python.
  • kaldi-sprakbanken - A recipe for training a state-of-the-art (as of 2017) speech recogniser for Danish based on the 16 kHz NST database.

Speech Synthesis (text-to-speech)

  • espeak - An open-source speech synthesis program for ~56 languages including Danish. eSpeak can also be used as a grapheme-to-phoneme converter and was used to create the Danish Kaldi recipe.
  • ResponsiveVoice - Commercial Web-based (JavaScript-based) text-to-speech synthesis for a number of languages, including Danish. The commercial service is currently free for limited and non-commercial use.
  • Google Cloud Text-to-Speech - Commercial Web-based text-to-speech synthesis for a number of languages, including Danish.
  • Amazon Polly - Commercial Web-based text-to-speech synthesis for a number of languages, including Danish. Part of Amazon's commercial AWS services. Female and male voices are available as examples. Limited unregistered free service available at TTSMP3.

Fundamental processing

  • DaNLP - "a repository for Natural Language Processing resources for the Danish Language."
  • dapipe - Danish UD-pipe: tokenisation, lemmatisation, PoS tagging, morphology, dependencies.
  • UDPipe - Non-language-specific version of dapipe. A newer version of the Danish-DDT model than the one offered by dapipe is available at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2998
  • DKIE - GATE pipeline including wrapped Danish models for Stanford CoreNLP.
  • StanfordNLP - Python software package for dependency parsing, including tokenization, lemmatization and part-of-speech tagging. A pre-trained model for Danish is available.
  • bornholmsk - Datasets and embeddings for the Bornholmsk dialect.
  • spaCy - Python-based natural language processing package.
  • dacy - Danish spaCy pipeline.

Competitions

Benchmarks

  • Danoliterate - Overview of the performance of language models on a range of individual benchmarks.

Resources about resources