f-krueger/SoSciSoCi

Social Science Software Citation Dataset

Introduction

The SoSciSoCi corpus was created as a training corpus for identifying software usage statements in social science publications. It consists of the methods sections of 480 randomly selected PLoS articles that contain the keyword "Social Science". The corpus was created jointly by David Schindler, Benjamin Zapilko and Frank Krüger.

Annotation Procedure

The objective was to mark all statements of software usage within the publications. Because software mentions were assumed to be rare compared with other domains, we added XXX sentences containing software usage statements as positive samples. The sentences were shuffled before annotation so that annotators could reason only from the context of the individual sentence. Additional information such as software version or manufacturer was explicitly excluded from the annotation. Annotation was performed by seven annotators: two students from the University of Rostock, two students from Gesis, Benjamin Zapilko from Gesis, and David Schindler and Frank Krüger from the University of Rostock.
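The pooling and shuffling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual tooling: the function name `build_annotation_pool`, the `source` field and the fixed seed are all hypothetical choices made here to keep the order reproducible.

```python
import random


def build_annotation_pool(method_sentences, positive_samples, seed=42):
    """Hypothetical helper: merge methods-section sentences with the extra
    positive samples and shuffle them, so each sentence is presented to an
    annotator without its neighbouring sentences."""
    # Tag each sentence with its origin so positives can be traced later.
    pool = [{"text": s, "source": "methods"} for s in method_sentences]
    pool += [{"text": s, "source": "positive_sample"} for s in positive_samples]
    # A dedicated RNG with a fixed seed makes the shuffled order reproducible.
    rng = random.Random(seed)
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    pool = build_annotation_pool(
        ["Participants were recruited online.", "Data were cleaned manually."],
        ["Analyses were run in R."],
    )
    for item in pool:
        print(item["source"], "|", item["text"])
```

Using a local `random.Random` instance rather than the global generator keeps the shuffle independent of any other randomness in the surrounding code.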

Annotation Quality

To assess annotation quality, 10% of the sentences were annotated by two annotators. The inter-rater reliability (IRR), measured as Cohen's kappa, is 0.816.
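For reference, Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label distribution: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements that formula. The label scheme (`"SW"` for a software mention, `"O"` otherwise) and the toy labels are assumptions for illustration only. They are not taken from the corpus, and this README does not state at which granularity (sentence or token) the reported kappa was computed. `sklearn.metrics.cohen_kappa_score` offers an equivalent ready-made implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1:
        # Both annotators used a single, identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for ten doubly annotated items.
    a = ["SW", "O", "O", "SW", "O", "O", "O", "O", "SW", "O"]
    b = ["SW", "O", "O", "O", "O", "O", "O", "O", "SW", "O"]
    print(round(cohens_kappa(a, b), 3))  # 0.737
```

In the toy example the annotators agree on 9 of 10 items (p_o = 0.9), but because most items are `"O"`, chance agreement is already 0.62, which pulls kappa down to about 0.737.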

Corpus usage

The corpus was used in the following publication:

David Schindler, Benjamin Zapilko and Frank Krüger: Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach. In: Proceedings of the 17th Extended Semantic Web Conference, Heraklion, Crete, Greece, May 31 - June 4, 2020.

Please cite this publication when using the corpus.

This work is licensed under a Creative Commons Attribution 4.0 International License.
