Population segmentation for health system management

The MIMIC-III clinical database contains health data for over forty thousand patients at Beth Israel Deaconess Medical Center in Boston, collected from 2001 to 2012. The dataset includes information such as diagnoses, demographics, prescriptions, ICU stays, procedures, vital signs, and mortality.

The International Statistical Classification of Diseases and Related Health Problems (ICD) is a list of codes used to classify diseases, symptoms, and external causes of disease. ICD-9 codes were used to categorize diseases.

Diabetes Matrix Description

The matrix contains diabetes patients admitted to the ICU. It has 10,310 rows × 931 columns, and each row represents one patient.

Patients.csv, D_ICD_DIAGNOSES.csv, DIAGNOSES_ICD.csv, ADMISSIONS.csv and ICUSTAYS.csv are joined to create the matrix, which matches each patient to diagnoses, ICD-9 codes, disease categories, and demographic information.
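
The join can be sketched with pandas as follows. This is a minimal illustration on tiny in-memory stand-ins for the CSV files, not the repository's actual code. The column names (SUBJECT_ID, HADM_ID, ICD9_CODE, SHORT_TITLE, GENDER, INSURANCE, LOS) follow the public MIMIC-III schema and are assumed here, since this README does not list them; all values are made up.

```python
import pandas as pd

# Tiny stand-ins for the MIMIC-III tables (values are made up for illustration).
patients = pd.DataFrame({"SUBJECT_ID": [1, 2], "GENDER": ["M", "F"]})
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [10, 11, 20],
    "INSURANCE": ["Medicare", "Medicare", "Private"],
})
icustays = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [10, 11, 20],
    "LOS": [2.5, 1.0, 4.0],  # length of ICU stay in days
})
diagnoses_icd = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [10, 11, 20],
    "ICD9_CODE": ["25000", "4019", "25000"],
})
d_icd_diagnoses = pd.DataFrame({
    "ICD9_CODE": ["25000", "4019"],
    "SHORT_TITLE": ["Diabetes (illustrative)", "Hypertension (illustrative)"],
})

# Attach a description to every diagnosis row.
diagnoses = diagnoses_icd.merge(d_icd_diagnoses, on="ICD9_CODE", how="left")

# Join admissions to ICU stays, then add patient demographics.
stays = (
    admissions
    .merge(icustays, on=["SUBJECT_ID", "HADM_ID"], how="inner")
    .merge(patients, on="SUBJECT_ID", how="left")
)

# Match diagnoses to each stay.
joined = stays.merge(diagnoses, on=["SUBJECT_ID", "HADM_ID"], how="left")

# Number of distinct hospital admissions per patient.
hospitalizations = admissions.groupby("SUBJECT_ID")["HADM_ID"].nunique()

print(joined)
print(hospitalizations)
```

The result is still one row per stay and diagnosis. Collapsing it to one row per patient requires an aggregation step, which this sketch leaves out because the README does not describe how it was done.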

Numerical columns:

LOS is the length of a patient's ICU stay, measured in days.
Hospitalization is the number of hospitalizations of each patient.

Categorical/binary columns:

Admission type columns: elective, emergency, newborn, urgent
Marital status columns: divorced, life partner, married, NaN, separated, single, unknown (default), widowed
Insurance columns: government, Medicaid, Medicare, private, self-pay
Ethnicity columns: 40 columns, including ASIAN, WHITE, BLACK/AFRICAN AMERICAN, and HISPANIC OR LATINO
Gender: 1 for male, 0 for female
Dead: 1 if the patient died, 0 otherwise
ICD9 code columns: column names are 3-4 character alphanumeric codes. The value is 1 if the patient has a diagnosis with that ICD-9 code, 0 otherwise
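
One way to produce 0/1 indicator columns like these is sketched below with pandas. The long-format input, its column names, and the choice to truncate each code to its first three characters are assumptions for illustration. The README does not say how the repository derives its 3-4 character column names.

```python
import pandas as pd

# Hypothetical long-format input: one row per (patient, diagnosis).
long_df = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 3],
    "INSURANCE": ["Medicare", "Medicare", "Private", "Medicaid"],
    "ICD9_CODE": ["25000", "4019", "25000", "V5867"],
})

# Assumption: the first three characters of a code identify its category.
long_df["ICD9_CAT"] = long_df["ICD9_CODE"].str[:3]

# One row per patient, one 0/1 column per ICD-9 category.
icd_flags = (pd.crosstab(long_df["SUBJECT_ID"], long_df["ICD9_CAT"]) > 0).astype(int)

# One 0/1 column per insurance type, taking one insurance value per patient.
insurance = long_df.drop_duplicates("SUBJECT_ID").set_index("SUBJECT_ID")["INSURANCE"]
insurance_flags = pd.get_dummies(insurance, dtype=int)

# Combine both sets of indicator columns into one patient-level matrix.
matrix = insurance_flags.join(icd_flags)
print(matrix)
```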

Ordinal column:

ORDINAL_AGE column: the following 10 age groups are mapped to the integers 1-10 in ascending order. AGE: 0-10, AGE: 11-20, AGE: 21-30, AGE: 31-40, AGE: 41-50, AGE: 51-60, AGE: 61-70, AGE: 71-80, AGE: 81-89, AGE: 90+
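
The mapping from age to group can be sketched in a few lines of Python. The function name `ordinal_age` and the assumption that ages are whole, non-negative years are illustrative and not taken from the repository.

```python
from bisect import bisect_left

# Upper bound (inclusive) of each of the first nine age groups listed above.
# Any age above the last bound falls into group 10 (90+).
UPPER_BOUNDS = [10, 20, 30, 40, 50, 60, 70, 80, 89]


def ordinal_age(age_years: int) -> int:
    """Map an age in whole years to its ordinal group, 1 through 10."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_left counts how many upper bounds lie strictly below the age,
    # which is the zero-based index of the group the age belongs to.
    return bisect_left(UPPER_BOUNDS, age_years) + 1


for age in (0, 10, 11, 55, 89, 90):
    print(age, ordinal_age(age))
```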

Diabetes_Numerical.csv and Diabetes_Category.csv contain the same matrix. In Diabetes_Numerical.csv, every column has a numeric data type. In Diabetes_Category.csv, numerical columns are numeric and categorical columns are stored as categories.

Low Rank Model

K-means clustering is used to find clusters of similar patients. Using a low rank model, the k-means algorithm was fitted to the diabetes patient matrix with different values of k. The low rank model is used in GLRMfinal.ipynb.
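
The general idea of clustering on a low rank representation can be sketched as below. This is not the code in GLRMfinal.ipynb: it substitutes a truncated SVD for the low rank model, runs on a synthetic binary matrix, and uses scikit-learn's `KMeans`. The rank `r`, the range of `k`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient matrix: 200 patients x 30 binary features,
# drawn from two groups with different feature frequencies.
probs = np.full((2, 30), 0.1)
probs[0, :10] = 0.8
probs[1, 20:] = 0.8
groups = rng.integers(0, 2, size=200)
X = (rng.random((200, 30)) < probs[groups]).astype(float)

# Rank-r approximation via truncated SVD. This is the simplest low rank model;
# a generalized low rank model would allow a different loss per column type.
r = 5
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = U[:, :r] * s[:r]  # r-dimensional representation of each patient

# Fit k-means for several k and record the within-cluster sum of squares.
inertia = {}
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z)
    inertia[k] = km.inertia_

for k, value in inertia.items():
    print(k, round(value, 2))
```

Plotting the recorded values against k is one common way to pick a k, by looking for the point where adding clusters stops reducing the within-cluster sum of squares noticeably.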

Linear Regression

LOS prediction is used to evaluate the segmentation. Linear regression is used in diabetes_regression.ipynb.
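
One plausible way to evaluate a segmentation through LOS prediction is to compare a regression with and without cluster membership as a feature. The sketch below does this with ordinary least squares in NumPy on synthetic data. The comparison design, the variable names, and the data are assumptions, and diabetes_regression.ipynb may evaluate the segmentation differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-ins: one numeric feature, a cluster label, and LOS in days.
hospitalizations = rng.integers(1, 6, size=n).astype(float)
cluster = rng.integers(0, 3, size=n)
los = 2.0 + 0.5 * hospitalizations + 1.5 * cluster + rng.normal(0, 0.5, size=n)


def fit_rmse(features: np.ndarray, target: np.ndarray) -> float:
    """Fit least squares with an intercept and return the in-sample RMSE."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residuals = target - A @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


# Baseline model: no segmentation information.
rmse_base = fit_rmse(hospitalizations.reshape(-1, 1), los)

# Model with segmentation: add one-hot cluster indicators. The first cluster
# is dropped to avoid collinearity with the intercept.
one_hot = np.eye(3)[cluster][:, 1:]
rmse_seg = fit_rmse(np.column_stack([hospitalizations, one_hot]), los)

print(f"RMSE without clusters: {rmse_base:.3f}")
print(f"RMSE with clusters:    {rmse_seg:.3f}")
```

The errors here are in-sample, so the larger model can never do worse. On real data a held-out test set gives a fairer comparison.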

Decision Tree

LOS prediction is used to evaluate the segmentation. A decision tree is used in diabetes_regression.ipynb.
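
A decision tree regressor for LOS can be sketched with scikit-learn as below. The features, the synthetic data, the train/test split, and `max_depth=4` are assumptions chosen for illustration and are not taken from diabetes_regression.ipynb.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins: number of hospitalizations, cluster label, LOS in days.
hospitalizations = rng.integers(1, 6, size=n)
cluster = rng.integers(0, 3, size=n)
los = 2.0 + 0.5 * hospitalizations + 1.5 * cluster + rng.normal(0, 0.5, size=n)

X = np.column_stack([hospitalizations, cluster])
X_train, X_test, y_train, y_test = train_test_split(
    X, los, test_size=0.25, random_state=0
)

# A shallow tree keeps the learned rules easy to inspect and limits overfitting.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)

# Held-out error, compared against always predicting the training mean.
rmse_tree = float(np.sqrt(mean_squared_error(y_test, tree.predict(X_test))))
rmse_mean = float(np.sqrt(np.mean((y_test - y_train.mean()) ** 2)))

print(f"Tree RMSE:          {rmse_tree:.3f}")
print(f"Mean-baseline RMSE: {rmse_mean:.3f}")
```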
