Social Science Software Citation Dataset

Introduction

The SoSciSoCi corpus was created as a training corpus for identifying software usage statements in social science publications. It consists of the methods sections of 480 randomly chosen PLoS articles that contain the keyword "Social Science". The corpus was created in a joint effort by David Schindler, Benjamin Zapilko and Frank Krüger.

Annotation Procedure

The objective was to mark all software usage statements in the scientific publications. It was assumed that the number of software mentions is low in comparison to other domains. To compensate, we added XXX sentences that contain software usage statements as positive samples. The sentences were shuffled before annotation to restrict the reasoning of annotators to the context of the individual sentence. Additional information such as version or manufacturer was explicitly excluded from the annotation. Annotation was performed by seven different annotators (two students from University of Rostock, two students from Gesis, Benjamin Zapilko from Gesis and David Schindler and Frank Krüger from University of Rostock).
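The shuffling step can be illustrated with a minimal sketch. It is not the script used to build the corpus; the function names `shuffle_for_annotation` and `restore_order`, the fixed seed, and the example sentences are all assumptions made for illustration. The idea is to keep the original index next to each sentence so that annotations can be mapped back to their source position afterwards.

```python
import random


def shuffle_for_annotation(sentences, seed=0):
    """Return (original_index, sentence) pairs in random order.

    Hypothetical helper: the seed makes the shuffle reproducible.
    """
    indexed = list(enumerate(sentences))
    rng = random.Random(seed)
    rng.shuffle(indexed)
    return indexed


def restore_order(shuffled):
    """Undo the shuffle by sorting on the stored original index."""
    return [sentence for _, sentence in sorted(shuffled, key=lambda pair: pair[0])]


# Hypothetical example sentences, not taken from the corpus.
example_sentences = [
    "Data were analysed with SPSS.",
    "Participants were recruited online.",
    "Models were fitted in R.",
    "The survey contained 20 items.",
]
shuffled = shuffle_for_annotation(example_sentences, seed=42)
restored = restore_order(shuffled)
```

Annotators would only see the sentence text of `shuffled`; the stored index is used after annotation to reassemble the documents.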

Annotation Quality

To determine the quality of the annotation procedure, 10% of the sentences were annotated by two annotators. The resulting inter-rater reliability (IRR) is a Cohen's kappa of 0.816.
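For readers unfamiliar with the measure, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. It assumes one categorical label per item; the README does not state at which level (sentence or token) the agreement was measured, and the toy labels below are hypothetical and unrelated to the reported value of 0.816.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels: 1 = software usage statement, 0 = none.
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on 8 of 10 items (observed agreement 0.8) and the chance agreement is 0.52, so kappa is 0.28 / 0.48, roughly 0.58.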

Corpus Usage

The corpus was used in the following publication:

David Schindler, Benjamin Zapilko and Frank Krüger: Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach. In Proceedings of the 17th Extended Semantic Web Conference, Heraklion, Crete, Greece, May 31 - June 4, 2020.

Please cite this publication when using the corpus.

This work is licensed under a Creative Commons Attribution 4.0 International License.
