Learning from Discharge Summaries to extract mentioned diagnoses using Hierarchical Attention Model

anantzoid/Medical-Diagnosis-Learning

Medical-Diagnosis-Learning

Abstract

Understanding clinical notes to extract diagnosis information is a long-standing and challenging task at the confluence of healthcare and natural language understanding. In this work, we perform experiments to learn hierarchical representations from discharge summaries and classify the final diagnoses of patients in multi-class and multi-label settings. We also investigate how different sections of these clinical notes influence the performance of the system. Further, we use a soft-attention mechanism in our model to allow better interpretability and faster convergence. The report can be read here.
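To make the soft-attention idea concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function name `soft_attention`, the toy vectors, and the use of a bare dot product as the scoring function are all assumptions made for illustration (a trained model would typically score each vector through learned projection layers).

```python
import math


def soft_attention(vectors, context):
    """Pool a list of equal-length vectors into a single vector.

    Each vector is scored by its dot product with `context`, the scores
    are normalized with a softmax, and the weighted sum of the vectors is
    returned together with the attention weights.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


# Toy example (hypothetical values): three 2-dimensional "word" vectors.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [0.0, 2.0]
sentence_vector, word_weights = soft_attention(words, context)
```

In a hierarchical setup the same pooling step would be applied twice: once over the words of each sentence, and once over the resulting sentence vectors to obtain a document vector. The weights returned at each level are what make the prediction inspectable, since they show which words and sentences contributed most.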

Running the code

MIMIC-III dataset

The dataset can be downloaded from the PhysioNet website after requesting access to it.

Preprocessed Data Generation

cd src
sh data_gen_scripts/new_run.sh

The above command takes the raw data and generates the different datasets listed in Table 4 of the paper. It also prints the data statistics reported in the paper.

Before running this command, please open this file and change the data path to the directory where the MIMIC-III files DIAGNOSIS.csv and NOTEEVENTS.csv are located. Also pass the --generatesplits 1 flag when running the preprocessing script for the first time.
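A quick way to confirm that the data path is set correctly before launching the preprocessing script is to check that both input files are present. The sketch below is a standalone helper, not part of this repository; the function name `missing_mimic_files` and the example directory are hypothetical.

```python
from pathlib import Path

# File names as given in this README.
REQUIRED_FILES = ("DIAGNOSIS.csv", "NOTEEVENTS.csv")


def missing_mimic_files(data_dir):
    """Return the required file names that are not found in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location; replace with your own data path.
    missing = missing_mimic_files("data/mimic3")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```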

Training the model

To train the model with content 4:

python master_train_script.py --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl --val_path <data_location>/50codesL5_UNK_content_4_top100_valid_data.pkl --model_dir <location to save model in> --attention 1 --num_workers 12 --embed_path <path to saved embeddings>/stsp_model.tsv --num_epochs 15 --exp_name attention1_50_content4_top100_stsp --use_starspace 1 --multilabel 1 --batch_size 8 --lr 1e-3

Commands to run the Attention model with other variants of the dataset can be found in src/princerun.sh. The corresponding commands for the Word-Sentence encoder model are in src/princerun_wordsent.sh.

If running the script for the first time, add the --build_starspace 1 and --starspace_exec <path to Starspace/run.sh> flags.

Viewing the learning curves

tensorboard --logdir=log

Evaluating on the test set

python eval_test.py --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl --val_path <data_location>/50codesL5_UNK_content_4_top100_test_data.pkl --model_path <path to saved model> --attention 1 --batch_size 8 
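This README does not state which metrics eval_test.py reports. As an illustration of how multi-label predictions on a held-out split are commonly scored, the sketch below computes micro-averaged precision, recall, and F1 from binary label matrices. The function name `micro_prf` and the toy labels are hypothetical and are not taken from the repository.

```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall and F1 for multi-label outputs.

    y_true and y_pred are lists of equal-length rows of 0/1 values, one
    row per document and one column per diagnosis code.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1  # code predicted and actually present
            elif p and not t:
                fp += 1  # code predicted but absent
            elif t and not p:
                fn += 1  # code present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two documents, three candidate codes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0]]
precision, recall, f1 = micro_prf(y_true, y_pred)
```

Micro-averaging pools the counts over all codes before computing the ratios, so frequent codes dominate the score; macro-averaging (computing the metric per code and then averaging) weights rare codes equally.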
