
mmt_2020-11-19_11-38

Topic Model of roman18 corpus (Nov 2020)

This repository contains the results, scripts and input files for a topic modeling experiment carried out in the context of Mining and Modeling Text (MiMoText), a project located at the Trier Center for Digital Humanities (TCDH) at Trier University.

It was created on November 19, 2020.

Parameters

Results

Example: word cloud (Wordle) for Topic 7

Derivation of statements for the MiMoTextBase

Explanation

The resulting topic model consists of a predefined number of topics, each of which is a probability distribution over the input words, together with a probability distribution over these topics for each text document in the corpus. Each topic is assigned a label based on its most probable words. Using this information, topic statements are then derived from the distribution of top topics in each input work. For each novel we consider its five most probable topics, after first excluding all topics that occur in fewer than 10% or in more than 80% of the works in the corpus. This removes very rare, often work-specific topics as well as very frequent, usually generic ones, since neither is useful for comparing topics across works. This leaves 25 topics that are used to generate the topic statements.
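The selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual script: it assumes the document-topic proportions are available as nested lists (one row per novel, one column per topic), and the function name `top_topics_per_work` is hypothetical. A topic counts as present in a work when its probability exceeds 0.03, the threshold given in the next paragraph; using a strict comparison here is also an assumption.

```python
from typing import List, Sequence


def top_topics_per_work(
    doc_topics: Sequence[Sequence[float]],
    presence_threshold: float = 0.03,
    min_share: float = 0.10,
    max_share: float = 0.80,
    top_n: int = 5,
) -> List[List[int]]:
    """Return the indices of the top_n topics for each work, after
    excluding topics that are too rare or too frequent in the corpus.

    Hypothetical helper: doc_topics[d][t] is assumed to be the probability
    of topic t in work d.
    """
    n_docs = len(doc_topics)
    n_topics = len(doc_topics[0])

    # Share of works in which each topic counts as "present"
    # (probability strictly above the threshold; an assumption).
    share = [
        sum(1 for doc in doc_topics if doc[t] > presence_threshold) / n_docs
        for t in range(n_topics)
    ]

    # Drop topics present in fewer than 10% or more than 80% of the works.
    kept = [t for t in range(n_topics) if min_share <= share[t] <= max_share]

    # For each work, rank the remaining topics by probability and keep the top_n.
    return [
        sorted(kept, key=lambda t: doc[t], reverse=True)[:top_n]
        for doc in doc_topics
    ]


# Toy example: five works, four topics. Topic 0 occurs in every work
# (share 1.0) and is therefore excluded as too generic.
docs = [
    [0.50, 0.20, 0.01, 0.29],
    [0.40, 0.01, 0.30, 0.29],
    [0.60, 0.01, 0.01, 0.38],
    [0.45, 0.25, 0.01, 0.29],
    [0.96, 0.01, 0.01, 0.02],
]
print(top_topics_per_work(docs, top_n=2))  # → [[3, 1], [2, 3], [3, 1], [3, 1], [3, 1]]
```

With the real model, `top_n` would stay at its default of 5; the toy example uses 2 only to keep the output short.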

Note that, in principle, every topic has some probability in every work. A topic is only significantly present once its probability exceeds a certain threshold; above that threshold we say, in simplified terms, that the topic occurs in the work. The appropriate threshold depends on the size of the corpus and the number of topics. For the topic model described here, we used a probability of 0.03 as the threshold. With this threshold, we can calculate the percentage of texts in which each topic occurs.
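As a rough sketch, the percentage calculation could look like this in Python. It assumes a document-topic matrix given as nested lists and treats a topic as present when its probability is strictly greater than 0.03. The function name `topic_presence_percentages` is hypothetical and not taken from the repository's scripts.

```python
from typing import List, Sequence


def topic_presence_percentages(
    doc_topics: Sequence[Sequence[float]], threshold: float = 0.03
) -> List[float]:
    """Percentage of works in which each topic's probability exceeds threshold.

    Hypothetical helper: doc_topics[d][t] is assumed to be the probability
    of topic t in work d.
    """
    n_docs = len(doc_topics)
    n_topics = len(doc_topics[0])
    return [
        100.0 * sum(1 for doc in doc_topics if doc[t] > threshold) / n_docs
        for t in range(n_topics)
    ]


# Toy example: four works, three topics.
docs = [
    [0.90, 0.08, 0.02],
    [0.50, 0.49, 0.01],
    [0.97, 0.02, 0.01],
    [0.60, 0.01, 0.39],
]
print(topic_presence_percentages(docs))  # → [100.0, 50.0, 25.0]
```

These percentages are what the 10% / 80% cut-offs in the previous paragraph are applied to.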

Licence

All texts used here as input files are in the public domain and can be reused without restrictions. We don’t claim any copyright or other rights on the transcription, markup, metadata or scripts. If you use our data and scripts, for example in research or teaching, please reference this collection using the citation suggestion below.

Citation suggestion

Topic Model of roman18 corpus (Nov 2020), edited by Anne Klee and Julia Röttgermann. Release v0.1.0. Trier: TCDH, 2021. URL: https://github.com/MiMoText/mmt_2020-11-19_11-38. DOI: 10.5281/zenodo.4493224.