Medical-Record-Linkage-Ensemble Paper Reproduction

Course: Deep Learning for Healthcare, Gargi Deb

This repository contains two notebooks, one for each dataset: FEBRL(Source_A)_code.ipynb for FEBRL and ePBRN(Source_B)_code.ipynb for ePBRN. They use the code provided by the authors of the original paper, "Statistical supervised meta-ensemble algorithm for medical record linkage", to reproduce its results and claims. They also build on that code for additional ablations and experiments.

Authors of the original paper:

Kha Vo,

Jitendra Jonnagaddala,

Siaw-Teng Liaw.

Resources used to reproduce results:

Original code provided by the authors:

Kha Vo, Jitendra Jonnagaddala, Siaw-Teng Liaw. (2019). Medical-Record-Linkage-Ensemble. Retrieved from https://github.com/ePBRN/Medical-Record-Linkage-Ensemble. Paper: "Statistical supervised meta-ensemble algorithm for data linkage"

Original Paper:

Kha Vo, Jitendra Jonnagaddala, Siaw-Teng Liaw, Statistical supervised meta-ensemble algorithm for medical record linkage, Journal of Biomedical Informatics, Volume 95, 2019, 103220, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2019.103220.

Requirements

All code ran successfully in a Google Colab Pro environment with Python 3.6. You will need a Google Colab Pro account to run the notebooks. Colab comes with many common ML packages preinstalled, so no additional installation is needed, with one exception: recordlinkage, the record linkage package used by the authors. Each notebook contains a cell that installs it when run in Google Colab Pro.

To access the datasets used by the authors, download the following files from the original authors' repository and save them to your local file system. The notebooks ask you to upload them from there when the corresponding cells are run.

Download and save:

febrl4_UNSW.csv

ePBRN_D_dup.csv

ePBRN_F_dup.csv

febrl3_UNSW.csv

from the original authors' repository: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble
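In Colab, google.colab.files.upload() returns a dict that maps each uploaded file name to its raw bytes. The sketch below turns such a dict into pandas DataFrames and fails early if a dataset file is missing. The helper name load_uploaded is hypothetical and not taken from the notebooks. Each notebook presumably needs only its own pair of files, which is why the required argument can be narrowed.

```python
import io
from typing import Dict, Iterable

import pandas as pd

# The four dataset files listed above.
EXPECTED_FILES = (
    "febrl4_UNSW.csv",
    "ePBRN_D_dup.csv",
    "ePBRN_F_dup.csv",
    "febrl3_UNSW.csv",
)


def load_uploaded(
    uploaded: Dict[str, bytes], required: Iterable[str] = EXPECTED_FILES
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: parse uploaded CSV bytes into DataFrames.

    Raises FileNotFoundError naming every required file that was not uploaded.
    """
    required = tuple(required)
    missing = [name for name in required if name not in uploaded]
    if missing:
        raise FileNotFoundError("Missing dataset file(s): " + ", ".join(missing))
    # Decode each file from its in-memory bytes without touching the disk.
    return {name: pd.read_csv(io.BytesIO(uploaded[name])) for name in required}
```

Usage in a Colab cell, for example with the two FEBRL files: `from google.colab import files; uploaded = files.upload(); dfs = load_uploaded(uploaded, required=("febrl3_UNSW.csv", "febrl4_UNSW.csv"))`.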

Packages used:

numpy, pandas, sklearn (scikit-learn), torch, recordlinkage
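Before running the notebooks outside Colab, it can help to check which of these packages are importable. The following is a minimal sketch using the standard library's importlib.util.find_spec. The helper name missing_packages is illustrative, and the list uses import names, so scikit-learn appears as "sklearn".

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages listed above.
REQUIRED = ("numpy", "pandas", "sklearn", "torch", "recordlinkage")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If only recordlinkage is reported, it is available on PyPI under that name (`pip install recordlinkage`, or `!pip install recordlinkage` in a Colab cell).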

Training & Evaluation

Each notebook has dedicated cells for training and evaluating the models. The hyperparameters are already set in those cells, so no additional setup or commands are needed beyond running the cells in Google Colab Pro.
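The sketch below illustrates the general shape of the training and evaluation step with scikit-learn: fitting the three model types reported in the results (SVM, NN, LR) on record-pair comparison vectors, then scoring precision, recall and F-score. It is not the notebooks' code. The data is synthetic, the hyperparameters are illustrative, and a simple majority vote stands in for the paper's meta-ensemble scheme, which is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_synthetic_pairs(n_pairs, n_features=6, match_rate=0.3, seed=0):
    """Hypothetical stand-in for comparison vectors (one similarity score in [0, 1] per field)."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n_pairs) < match_rate).astype(int)
    # True matches tend to agree on most fields; non-matches tend to disagree.
    centre = np.where(y[:, None] == 1, 0.85, 0.15)
    X = np.clip(centre + rng.normal(0.0, 0.15, size=(n_pairs, n_features)), 0.0, 1.0)
    return X, y


def train_base_models(X, y, seed=0):
    """Fit one model of each reported type; hyperparameters here are illustrative only."""
    models = {
        "SVM": SVC(kernel="rbf", C=1.0),
        "NN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
        "LR": LogisticRegression(max_iter=1000),
    }
    for model in models.values():
        model.fit(X, y)
    return models


def majority_vote(models, X):
    """Label a pair as a match when more than half of the models predict a match."""
    votes = np.stack([model.predict(X) for model in models.values()])
    return (2 * votes.sum(axis=0) > len(models)).astype(int)


X_train, y_train = make_synthetic_pairs(1000, seed=0)
X_test, y_test = make_synthetic_pairs(500, seed=1)
models = train_base_models(X_train, y_train)

for name, model in models.items():
    pred = model.predict(X_test)
    print(
        f"{name}: P={precision_score(y_test, pred):.4f} "
        f"R={recall_score(y_test, pred):.4f} F={f1_score(y_test, pred):.4f}"
    )

ensemble_pred = majority_vote(models, X_test)
print(f"Majority vote: F={f1_score(y_test, ensemble_pred):.4f}")
```

On real data the comparison vectors would come from comparing fields of candidate record pairs (the role recordlinkage plays in the authors' code), and the metrics would be computed on the held-out dataset rather than synthetic pairs.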

Results

The following baseline results come from this reproduction of the original paper; they are not the original results reported by the authors:

FEBRL dataset (Source A):

Model name	Precision	Recall	F-Score
SVM	98.72%	99.63%	99.18%
NN	96.96%	99.43%	99.19%
LR	97.64%	99.63%	99.62%

ePBRN dataset (Source B):

Model name	Precision	Recall	F-Score
SVM	31.78%	98.61%	48.07%
NN	69.20%	96.46%	80.59%
LR	59.06%	96.84%	73.37%
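The F-score column is presumably the F1 score, the harmonic mean of precision and recall, F = 2PR / (P + R). The minimal sketch below recomputes it from the Precision and Recall columns. For the Source B rows it reproduces the reported F-scores to two decimal places.

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Source B rows from the table above: (precision %, recall %).
source_b = {"SVM": (31.78, 98.61), "NN": (69.20, 96.46), "LR": (59.06, 96.84)}

for model, (p, r) in source_b.items():
    print(f"{model}: F-score = {f_score(p, r):.2f}%")
# SVM: F-score = 48.07%
# NN: F-score = 80.59%
# LR: F-score = 73.37%
```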
