Understanding clinical notes to extract diagnosis information is a long-standing and challenging task at the confluence of healthcare and natural language understanding. In this work, we train models that learn hierarchical representations from discharge summaries to classify patients' final diagnoses in a multi-class, multi-label setting. We also investigate how the different sections of these clinical notes influence the performance of the system. Further, we use a soft-attention mechanism in our model to allow better interpretability and faster convergence. The report can be read here.
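As a rough illustration of hierarchical soft attention, the sketch below pools word vectors into sentence vectors, then pools sentence vectors into a single document vector, each time weighting inputs by a softmax over dot-product scores against a context vector. This is a minimal pure-Python sketch under stated assumptions, not the model in this repository: the context vectors word_ctx and sent_ctx, the dot-product scoring, and the toy inputs are hypothetical, and a trained model would learn these parameters (and usually apply a learned projection before scoring).

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], context: Vector):
    """Soft attention: weight each vector by softmax(dot(vector, context)) and sum.

    Returns the pooled vector and the attention weights; the weights are what
    makes the model inspectable (which inputs did it attend to?).
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_document(doc: List[List[Vector]], word_ctx: Vector, sent_ctx: Vector):
    """Hierarchical pooling: words -> sentence vectors -> one document vector."""
    sentence_vecs = [attention_pool(sentence, word_ctx)[0] for sentence in doc]
    return attention_pool(sentence_vecs, sent_ctx)


# Toy document: two "sentences" of 2-d word vectors (hypothetical values).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
word_ctx = [1.0, 0.0]  # hypothetical word-level context vector
sent_ctx = [0.0, 1.0]  # hypothetical sentence-level context vector

doc_vec, sent_weights = encode_document(doc, word_ctx, sent_ctx)
print(doc_vec, sent_weights)
```

The returned sentence weights sum to one, so they can be read directly as the share of the document representation contributed by each sentence.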
The MIMIC-III dataset can be downloaded from the PhysioNet website after requesting access to it.
cd src
sh data_gen_scripts/new_run.sh
The above command takes the raw data and generates the dataset variants described in Table 4 of the paper. It also prints the dataset statistics reported in the paper.
Before running this command, open data_gen_scripts/new_run.sh and change the data path to the directory containing the MIMIC-III files DIAGNOSIS.csv and NOTEEVENTS.csv. When running the preprocessing script for the first time, also pass the --generatesplits 1 flag.
To train the attention model on the content 4 variant of the dataset:
python master_train_script.py \
    --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl \
    --val_path <data_location>/50codesL5_UNK_content_4_top100_valid_data.pkl \
    --model_dir <location to save model in> \
    --attention 1 --num_workers 12 \
    --embed_path <path to saved embeddings>/stsp_model.tsv \
    --num_epochs 15 --exp_name attention1_50_content4_top100_stsp \
    --use_starspace 1 --multilabel 1 --batch_size 8 --lr 1e-3
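The --multilabel 1 flag indicates that a note may carry several diagnosis codes at once. A common way to handle this, sketched below under stated assumptions, is to encode the gold codes as a multi-hot vector and to apply an independent sigmoid to each output logit, predicting every code whose probability reaches a threshold. The label vocabulary, the logits, and the 0.5 threshold are hypothetical; this is not necessarily how master_train_script.py implements it.

```python
import math
from typing import Dict, List


def multi_hot(codes: List[str], vocab: Dict[str, int]) -> List[int]:
    """Encode a set of diagnosis codes as a 0/1 vector over the label vocabulary."""
    target = [0] * len(vocab)
    for code in codes:
        if code in vocab:  # codes outside the label set are ignored
            target[vocab[code]] = 1
    return target


def predict_labels(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Independent sigmoid per label, then threshold.

    Unlike a softmax, the labels do not compete, so any number of codes
    (including none) can be predicted for one note.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical 4-code label vocabulary and model outputs.
vocab = {"code_a": 0, "code_b": 1, "code_c": 2, "code_d": 3}
gold = multi_hot(["code_b", "code_c"], vocab)
pred = predict_labels([-2.0, 1.5, 0.3, -0.1])
print(gold, pred)
```

Training such a model typically pairs the sigmoid outputs with a per-label binary cross-entropy loss against the multi-hot target.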
Commands to run the attention model on the other dataset variants can be found in src/princerun.sh. The corresponding commands for the Word-Sentence encoder model are in src/princerun_wordsent.sh.
If running the training script for the first time, add the --build_starspace 1 and --starspace_exec <path to Starspace/run.sh> flags.
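The --embed_path flag points to a StarSpace model saved as stsp_model.tsv. Assuming the usual tab-separated layout StarSpace uses for .tsv model output (one token per line, followed by its vector components), the file can be inspected with a small loader like the one below. This is a hypothetical helper for checking the embeddings, not part of this repository; the sample tokens and values are made up.

```python
import io
from typing import Dict, List, TextIO


def load_tsv_embeddings(fh: TextIO) -> Dict[str, List[float]]:
    """Read 'token<TAB>v1<TAB>v2...' lines into a token -> vector dict."""
    embeddings: Dict[str, List[float]] = {}
    for line in fh:
        parts = line.rstrip().split("\t")  # drop trailing newline/tabs first
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:] if x]
    return embeddings


# In practice, open "<path to saved embeddings>/stsp_model.tsv" instead.
sample = io.StringIO("fever\t0.1\t-0.2\t0.3\nsepsis\t0.0\t0.5\t-0.1\n")
emb = load_tsv_embeddings(sample)
print(len(emb), emb["sepsis"])
```

A quick check that every vector has the same length is a cheap way to catch a truncated or partially written model file before training.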
Training logs can be viewed with TensorBoard:
tensorboard --logdir=log
To evaluate a saved model on the test split (passed through --val_path):
python eval_test.py \
    --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl \
    --val_path <data_location>/50codesL5_UNK_content_4_top100_test_data.pkl \
    --model_path <path to saved model> \
    --attention 1 --batch_size 8
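This README does not state which metrics eval_test.py reports. For multi-label diagnosis coding, micro-averaged precision, recall, and F1 are common choices; the sketch below computes them from multi-hot gold and predicted matrices. It illustrates the metric only and is not a description of the script's output; the toy matrices are hypothetical.

```python
from typing import List, Tuple


def micro_prf(gold: List[List[int]], pred: List[List[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (note, label) pairs."""
    tp = fp = fn = 0
    for g_row, p_row in zip(gold, pred):
        for g, p in zip(g_row, p_row):
            tp += int(g == 1 and p == 1)
            fp += int(g == 0 and p == 1)
            fn += int(g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two notes, three candidate codes each (hypothetical values).
gold = [[1, 0, 1], [0, 1, 0]]
pred = [[1, 1, 0], [0, 1, 0]]
print(micro_prf(gold, pred))
```

Micro-averaging pools counts across all codes, so frequent codes dominate the score; a macro average (per-code F1, then the mean) weights rare codes equally.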