Initial commit for paper on guidelines
Co-authored-by: Jonathan de Bruin <[email protected]>
Co-authored-by: Rens van de Schoot <[email protected]>
Co-authored-by: Sofie vd Brand <[email protected]>
Co-authored-by: ingeborgvandusseldorp <[email protected]>
Co-authored-by: wouterharmsen <[email protected]>
6 people committed Jun 24, 2021
0 parents commit 174aed0
Showing 298 changed files with 75,474 additions and 0 deletions.
195 changes: 195 additions & 0 deletions .gitignore
@@ -0,0 +1,195 @@
#raw_data/*.csv
jobs.sh
post-processing/*.R
post-processing/data/*
*.h5

# Created by https://www.toptal.com/developers/gitignore/api/python,r
# Edit at https://www.toptal.com/developers/gitignore?templates=python,r

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
pytestdebug.log

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
doc/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pythonenv*

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# profiling data
.prof

### R ###
# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# User-specific files
.Ruserdata

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md

# R Environment Variables
.Renviron

### R.Bookdown Stack ###
# R package: bookdown caching files
/*_files/

# End of https://www.toptal.com/developers/gitignore/api/python,r
61 changes: 61 additions & 0 deletions .zenodo.json
@@ -0,0 +1,61 @@
{ "description":"This repository contains scripts to run simulations for 14 datasets openly published on the Dutch database for medical guidelines belonging to the paper Artificial intelligence supports literature screening in medical guideline development: towards up-to-date medical guidelines.",
"title":"Scripts for paper on Towards up-to-date medical guidelines",
"creators":[
{
"name":"Harmsen, Wouter",
"affiliation":"Knowledge Institute for Federation of Medical Specialists, Utrecht, The Netherlands",
"orcid":"0000-0003-1423-9445"
},
{
"name":"de Groot, Janke",
"affiliation":"Knowledge Institute for Federation of Medical Specialists, Utrecht, The Netherlands",
"orcid":"0000-0002-8545-2246"
},
{
"name":"Harkema, Albert",
"affiliation":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands",
"orcid":"0000-0002-7091-1147"
},
{
"name":"van Dusseldorp, Ingeborg",
"affiliation":"Knowledge Institute for Federation of Medical Specialists, Utrecht, The Netherlands",
"orcid":"0000-0002-6551-1413"
},
{
"name": "De Bruin, Jonathan",
"affiliation": "Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands",
"orcid": "0000-0002-4297-0502"
},
{
"name":"Van den Brand, Sofie",
"affiliation":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands",
"orcid":"0000-0002-2408-3336"
},
{
"name":"Van de Schoot, Rens",
"affiliation":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands",
"orcid":"0000-0001-7736-2091"
}

],
"keywords":[
"guideline development",
"medical guidelines",
"medical",
"artificial intelligence",
"active learning",
"machine learning",
"systematic reviewing",
"systematic review",
"text data",
"natural language processing",
],
"related_identifiers":[
{
"relation":"isSupplementTo",
"identifier":"https://www.nature.com/articles/s42256-020-00287-7"
}
],
"license":"MIT",
"upload_type":"software"
}
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021 ASReview

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
107 changes: 107 additions & 0 deletions README.md
@@ -0,0 +1,107 @@
# Scripts for paper on "Towards up-to-date medical guidelines"

The purpose of this study was to evaluate the performance and feasibility of active learning to support the selection of relevant publications within the context of medical guideline development. This repository contains scripts to run and analyze simulations for 14 datasets openly published on the Dutch database for [medical guidelines](https://www.richtlijnendatabase.nl). The results are published in the paper "Artificial intelligence supports literature screening in medical guideline development: towards up-to-date medical guidelines".


## Installation

The scripts in this repository require Python 3.6+. Install the required dependencies from the command line:

```
pip install -r requirements.txt
```

## Datasets

The raw data can be obtained via the Open Science Framework ([OSF](https://osf.io/vt3n4/)) and comprise 14 published guidelines from the [Dutch Medical Guideline Database](https://richtlijnendatabase.nl/). Download the following files from OSF and place them in a folder named `raw_data`:

```
Distal_radius_fractures_approach.csv
Distal_radius_fractures_closed_reduction.csv
Hallux_valgus_prognostic.csv
Head_and_neck_cancer_bone.csv
Head_and_neck_cancer_imaging.csv
Obstetric_emergency_training.csv
Post_intensive_care_treatment.csv
Pregnancy_medication.csv
Shoulder_replacement_diagnostic.csv
Shoulder_replacement_surgery.csv
Shoulderdystocia_positioning.csv
Shoulderdystocia_recurrence.csv
Total_knee_replacement.csv
Vascular_access.csv
```

Each dataset contains

```
title
abstract
```

and three columns with labeling decisions titled:

```
noisy_inclusion
expert_inclusion
fulltext_inclusion
```

Each dataset in *raw_data* is split into three datasets, one per labeling column, yielding 42 datasets in total. The split is performed by executing `job_splitfiles.sh`, and the results are stored in the subfolder *data*.
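
As an illustration, a minimal sketch of what this split could look like in Python (the authoritative logic lives in `job_splitfiles.sh` and the scripts it calls; the output naming scheme and the `label_included` column name are assumptions here):

```
# Sketch only: split each raw dataset into one file per labeling column.
# The actual split is performed by job_splitfiles.sh; the output naming
# scheme used here is an illustrative assumption.
from pathlib import Path

import pandas as pd

LABEL_COLUMNS = ["noisy_inclusion", "expert_inclusion", "fulltext_inclusion"]

out_dir = Path("data")
out_dir.mkdir(exist_ok=True)

for csv_path in Path("raw_data").glob("*.csv"):
    df = pd.read_csv(csv_path)
    for label in LABEL_COLUMNS:
        # Keep title/abstract plus one labeling decision per output file.
        subset = df[["title", "abstract", label]].rename(columns={label: "label_included"})
        subset.to_csv(out_dir / f"{csv_path.stem}_{label}.csv", index=False)
```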


## Descriptive dataset statistics

To create descriptive statistics for each dataset run:

```
sh generate_dataset_characteristics.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/*.json`. Running `python scripts/merge_descriptives.py` merges them into one table (*csv* and *excel*), which is stored in `output/table/data_descriptives.*`.
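
For intuition, a minimal sketch of that merge step (assuming each `*.json` file holds a flat dictionary of statistics; `scripts/merge_descriptives.py` is the authoritative implementation):

```
# Sketch only: collect the per-dataset descriptive JSON files into one
# table. Assumes each JSON file contains a flat dict of statistics.
import json
from pathlib import Path

import pandas as pd

rows = {}
for path in Path("output/simulation").glob("*/descriptives/*.json"):
    dataset = path.parts[2]  # output/simulation/[NAME_DATASET]/...
    rows.setdefault(dataset, {}).update(json.loads(path.read_text()))

Path("output/table").mkdir(parents=True, exist_ok=True)
table = pd.DataFrame.from_dict(rows, orient="index")
table.to_csv("output/table/data_descriptives.csv")
table.to_excel("output/table/data_descriptives.xlsx")  # requires openpyxl
```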


## Create wordclouds

To create wordclouds for each dataset run:

```
sh wordcloud_jobs.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/wordcloud`.
Three versions of the wordcloud are available, based on the title/abstract words of the following subsets (see the sketch below the list):

- the entire set of records;
- the relevant records only;
- the irrelevant records only.
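
As an illustration, one such wordcloud could be generated with the `wordcloud` package roughly as follows (a sketch only; the input file name follows the split naming assumed above, and `wordcloud_jobs.sh` is the authoritative pipeline):

```
# Sketch only: wordcloud of title/abstract words for the relevant records.
# The input file name and label column are illustrative assumptions.
import pandas as pd
from wordcloud import WordCloud

df = pd.read_csv("data/Vascular_access_expert_inclusion.csv")
relevant = df[df["label_included"] == 1]

text = " ".join(relevant["title"].fillna("") + " " + relevant["abstract"].fillna(""))
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
wc.to_file("wordcloud_relevant.png")
```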


## Simulation

For each dataset, the simulation was run as many times as there are relevant records: in each run, one relevant record served as a prior inclusion, together with 10 randomly chosen irrelevant records. The same 10 irrelevant records were used in every run of a given dataset. To extract information about the records used as prior knowledge, run `python scripts/get_prior_knowledge.py`; the result is stored in `output/tables`.
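
A minimal sketch of this prior-knowledge scheme (illustrative only; the input file name and seed are assumptions, and the records actually used can be extracted with `scripts/get_prior_knowledge.py`):

```
# Sketch only: one relevant record as prior inclusion per run, plus the
# same 10 seeded irrelevant records in every run of a given dataset.
import numpy as np
import pandas as pd

df = pd.read_csv("data/Vascular_access_expert_inclusion.csv")  # assumed file name
relevant_idx = df.index[df["label_included"] == 1]
irrelevant_idx = df.index[df["label_included"] == 0]

rng = np.random.default_rng(42)  # illustrative seed; real seeds are in run_simulation.sh
fixed_irrelevant = rng.choice(irrelevant_idx, size=10, replace=False)

# One run per relevant record: that record plus the fixed irrelevant ones.
prior_sets = [[rel, *fixed_irrelevant] for rel in relevant_idx]
```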


To obtain the result of the simulation, run:

```
sh run_simulation.sh
```

The results are stored in `output/simulation`. The dataset characteristics are obtained with `python scripts/merge_descriptives.py` and stored in `output/tables`. The per-run metrics of the simulation study can be obtained with `python scripts/merge_metrics.py` and are stored in `output/tables`.
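
For intuition on such per-run metrics, a sketch of a recall curve, a common measure in this kind of simulation study (illustrative only; which metrics are actually computed is defined in `scripts/merge_metrics.py`):

```
# Sketch only: recall after screening each record, given the labels in
# the order the simulation screened them. Metric choice is illustrative.
import numpy as np

def recall_curve(labels_in_screening_order):
    """Fraction of all relevant records found after each screened record."""
    labels = np.asarray(labels_in_screening_order)
    return np.cumsum(labels) / labels.sum()

# Toy example: 3 relevant records among 10, screened in this order.
print(recall_curve([1, 0, 1, 0, 0, 1, 0, 0, 0, 0]))
```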

The raw `h5` files total 28.4 GB and are available on request (see the contact details below). However, the results can be reproduced by running the simulation again with ASReview v0.16. Seed values are set in `run_simulation.sh`.

## Analyses

The Jupyter notebook [analyses/analyses_guidelines_KIFMS.ipynb](analyses/analyses_guidelines_KIFMS.ipynb)
contains a detailed, step-by-step analysis of the simulations performed in this project. For more information about the analysis, read the [README](analyses).

## License

The content in this repository is published under the MIT license.

## Contact

For any questions or remarks, please send an email to [email protected].
