The purpose of this study was to evaluate the performance and feasibility of active learning to support the selection of relevant publications within the context of medical guideline development. This repository contains scripts to run and analyze simulations for 14 datasets openly published on the Dutch database for medical guidelines. The results are published in the paper "Artificial intelligence supports literature screening in medical guideline development: towards up-to-date medical guidelines".
The scripts in this repository require Python 3.6+. Install the dependencies from the command line with:

```
pip install -r requirements.txt
```
The raw data can be obtained via the Open Science Framework (OSF) and contains 14 published guidelines from the Dutch Medical Guideline Database. Download the following files from OSF and put them in a folder `raw_data`:
```
Distal_radius_fractures_approach.csv
Distal_radius_fractures_closed_reduction.csv
Hallux_valgus_prognostic.csv
Head_and_neck_cancer_bone.csv
Head_and_neck_cancer_imaging.csv
Obstetric_emergency_training.csv
Post_intensive_care_treatment.csv
Pregnancy_medication.csv
Shoulder_replacement_diagnostic.csv
Shoulder_replacement_surgery.csv
Shoulderdystocia_positioning.csv
Shoulderdystocia_recurrence.csv
Total_knee_replacement.csv
Vascular_access.csv
```
Each dataset contains the columns:

- `title`
- `abstract`

and three columns with labeling decisions:

- `noisy_inclusion`
- `expert_inclusion`
- `fulltext_inclusion`
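As a quick sanity check, the expected header of each raw CSV can be verified with the Python standard library. A minimal sketch (the helper name is illustrative and not part of this repository):

```python
import csv

# The five columns each raw dataset is expected to contain.
EXPECTED_COLUMNS = {
    "title", "abstract",
    "noisy_inclusion", "expert_inclusion", "fulltext_inclusion",
}

def has_expected_columns(lines):
    """Return True if a CSV, given as an iterable of text lines,
    contains at least the expected columns."""
    reader = csv.DictReader(lines)
    return EXPECTED_COLUMNS <= set(reader.fieldnames or [])

# Example: a two-line stand-in for one of the raw_data CSVs.
sample = [
    "title,abstract,noisy_inclusion,expert_inclusion,fulltext_inclusion",
    "A study,An abstract,1,1,0",
]
print(has_expected_columns(sample))  # True
```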
Each dataset in `raw_data` is split into three datasets, one per labeling-decision column, yielding 42 datasets in total. They are generated by executing `job_splitfiles.sh`; the results are stored in the subfolder `data`.
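The split step can be sketched as follows. This is a hypothetical re-implementation, not the actual `job_splitfiles.sh`; in particular, the output column name `label_included` is an assumption following the ASReview convention:

```python
# Each of the three labeling-decision columns yields its own dataset,
# turning 14 raw datasets into 42 (hypothetical sketch of job_splitfiles.sh).
LABEL_COLUMNS = ("noisy_inclusion", "expert_inclusion", "fulltext_inclusion")

def split_by_label(rows):
    """Split one dataset (a list of dict rows) into three datasets,
    one per labeling-decision column."""
    return {
        label: [
            {"title": r["title"],
             "abstract": r["abstract"],
             "label_included": r[label]}  # assumed output column name
            for r in rows
        ]
        for label in LABEL_COLUMNS
    }

rows = [
    {"title": "A", "abstract": "x", "noisy_inclusion": "1",
     "expert_inclusion": "0", "fulltext_inclusion": "0"},
    {"title": "B", "abstract": "y", "noisy_inclusion": "1",
     "expert_inclusion": "1", "fulltext_inclusion": "1"},
]
splits = split_by_label(rows)
print(sorted(splits))  # ['expert_inclusion', 'fulltext_inclusion', 'noisy_inclusion']
```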
To create descriptive statistics for each dataset, run:

```
sh generate_dataset_characteristics.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/*.json`, are merged into one table (CSV and Excel) by running `python scripts/merge_descriptives.py`, and stored in `output/table/data_descriptives.*`.
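The merge step essentially collects one JSON blob per dataset into a single table. A minimal sketch of that idea (not the actual `scripts/merge_descriptives.py`; the statistic keys shown are illustrative):

```python
import json

def merge_descriptives(blobs):
    """Merge per-dataset descriptive JSON strings into one table,
    represented as a list of row dicts with a 'dataset' key."""
    rows = []
    for name, blob in sorted(blobs.items()):
        row = {"dataset": name}
        row.update(json.loads(blob))
        rows.append(row)
    return rows

# Illustrative blobs; the real keys come from the descriptives JSON files.
blobs = {
    "Vascular_access": '{"n_records": 100, "n_relevant": 5}',
    "Pregnancy_medication": '{"n_records": 250, "n_relevant": 12}',
}
table = merge_descriptives(blobs)
print(table[0]["dataset"])  # Pregnancy_medication (sorted first)
```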
To create wordclouds for each dataset, run:

```
sh wordcloud_jobs.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/wordcloud`.
There are three versions of the wordcloud, each based on the title/abstract words of:

- the entire set of records;
- the relevant records only;
- the irrelevant records only.
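The three variants amount to counting title/abstract words over different subsets of the records. A simplified sketch (function and field names are illustrative, not taken from the repository scripts):

```python
from collections import Counter

def word_counts(records, subset="all"):
    """Count title/abstract words for one wordcloud variant:
    'all', 'relevant' (label == 1), or 'irrelevant' (label == 0)."""
    keep = {
        "all": lambda r: True,
        "relevant": lambda r: r["label"] == 1,
        "irrelevant": lambda r: r["label"] == 0,
    }[subset]
    counts = Counter()
    for r in records:
        if keep(r):
            counts.update((r["title"] + " " + r["abstract"]).lower().split())
    return counts

records = [
    {"title": "Vascular access", "abstract": "central line", "label": 1},
    {"title": "Knee replacement", "abstract": "total knee", "label": 0},
]
print(word_counts(records, "relevant")["vascular"])  # 1
```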
The simulation was conducted for each dataset with as many runs as there are relevant records in the dataset: in each run, one relevant record and 10 randomly chosen irrelevant records serve as prior knowledge. Within a dataset, the same 10 irrelevant records were used for every run. To extract information about the records used as prior knowledge, run `python scripts/get_prior_knowledge.py`; the result is stored in `output/tables`.
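The prior-knowledge scheme described above can be sketched as follows (a simplified illustration; the seed value and function name are not taken from the repository):

```python
import random

def prior_knowledge_runs(labels, n_irrelevant=10, seed=42):
    """One run per relevant record, each paired with the same
    n_irrelevant randomly chosen irrelevant records."""
    relevant = [i for i, y in enumerate(labels) if y == 1]
    irrelevant = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    fixed = rng.sample(irrelevant, n_irrelevant)  # identical for every run
    return [[r] + fixed for r in relevant]

# Toy dataset: 3 relevant records among 20.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
runs = prior_knowledge_runs(labels)
print(len(runs), len(runs[0]))  # 3 11
```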
To obtain the results of the simulation, run:

```
sh run_simulation.sh
```

The results are stored in `output/simulation`. The dataset characteristics are obtained with `python scripts/merge_descriptives.py` and stored in `output/tables`. The per-run metrics resulting from the simulation study can be obtained with `python scripts/merge_metrics.py` and stored in `output/tables`.
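A typical per-run metric in such simulation studies is the recall after screening a given fraction of the records. A hedged sketch of that idea (the metrics actually computed by `scripts/merge_metrics.py` may differ):

```python
def recall_at(labels_in_rank_order, fraction):
    """Recall after screening the first `fraction` of records,
    where labels are ordered by the model's ranking (1 = relevant)."""
    n = len(labels_in_rank_order)
    k = max(1, int(n * fraction))
    total = sum(labels_in_rank_order)
    if total == 0:
        return 0.0
    return sum(labels_in_rank_order[:k]) / total

# Toy ranking: 3 relevant records among 10, all ranked in the first half.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(recall_at(ranking, 0.5))  # 1.0: all 3 relevant found in the first 5
```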
The raw `.h5` files are 28.4 GB and are available on request; see the contact details below. However, the results can be reproduced by rerunning the simulation with ASReview v0.16. Seed values are set in `run_simulation.sh`.
The Jupyter notebook `analyses/analyses_guidelines_KIFMS.ipynb` contains a detailed, step-by-step analysis of the simulations performed in this project. For more information about the analysis, read the README.
The content in this repository is published under the MIT license.
For any questions or remarks, please send an email to [email protected].