This repository contains code and datasets for the study "Preprints in motion: tracking changes between posting and publication", available from bioRxiv.
This README gives an overview of the files and datasets in this repository. The full methodology is documented in the preprint named above.
- compute_abstract_changeratio.ipynb: This notebook computes the change ratio between two abstract versions (input file: all_pairs.tsv in the data folder), using either the Python library difflib or the output of Microsoft Word track changes (available here). It produces a change-ratio score for each abstract. We used this in the preliminary phase of our study to determine which change-tracking algorithm was better suited to our work.
- extract_annotations.ipynb: Given an input file of reconciled annotations derived from Microsoft Word, this notebook structures them in a .tsv file, identifying for each annotation the annotator, the label, related comments, etc.
- structure_annotations_and_scores.ipynb: Finally, this notebook uses the tab-separated file as input to generate a complete overview of each annotation.
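
As an illustration of the difflib option mentioned for compute_abstract_changeratio.ipynb, the sketch below shows one way a change ratio between two abstract versions could be computed. It is not taken from the notebook: the function name `change_ratio`, the word-level tokenization, and the definition "1 minus the `SequenceMatcher` similarity" are all assumptions made for this example, and the notebook may define the score differently.

```python
from difflib import SequenceMatcher


def change_ratio(old: str, new: str) -> float:
    """Hypothetical change score: 0.0 for identical texts, 1.0 for no shared words.

    Assumption: the score is 1 minus difflib's similarity ratio, computed on
    whitespace-separated words rather than on characters.
    """
    old_words = old.split()
    new_words = new.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # items in long sequences, so every word is compared.
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


# Made-up example sentences, not data from all_pairs.tsv.
preprint = "We observed a significant increase in expression."
published = "We observed a modest increase in gene expression."
print(round(change_ratio(preprint, published), 3))
```

To score every pair, the same function could be applied row by row to all_pairs.tsv; the column names of that file are not described here, so they would need to be checked first.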