- Predicting Type 2 Diabetes Mellitus using Clinical Symptoms: A Machine Learning Approach
Diabetes Mellitus (DM) is a chronic metabolic disorder characterized by high blood sugar levels. Traditional diagnosis of DM relies on laboratory investigations such as Fasting Blood Sugar (FBS), Postprandial Blood Sugar (PPBS), and Glycated Hemoglobin (HbA1c). This project instead explores supervised machine learning approaches (KNN, logistic regression, SVM, and decision trees) to predict Type 2 DM from clinical symptoms alone, without relying on these laboratory tests. The resulting model is intended as a low-cost screening tool for identifying individuals with diabetes in hyperglycemic states, particularly in remote locations with limited access to clinical testing.
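As a rough illustration of symptom-based classification, the sketch below implements a tiny k-nearest-neighbours (KNN) vote in plain Python. The four symptom features, the training rows, and the labels are all made up for the example and are not taken from the project's dataset; the Hamming distance is likewise just one reasonable choice for yes/no symptoms.

```python
from collections import Counter

# Hypothetical yes/no symptom features (illustrative only, not the project's data):
# [polyuria, polydipsia, unexplained weight loss, blurred vision]
TRAIN_X = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]  # made-up labels: 1 = diabetic, 0 = non-diabetic


def hamming(a, b):
    """Count the positions where two symptom vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Classify a new patient who reports all four symptoms.
prediction = knn_predict(TRAIN_X, TRAIN_Y, [1, 1, 1, 1])
```

In practice a library implementation with cross-validated `k` would be preferable; this version only shows the mechanics of the vote.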
- A Simplified and Accessible Approach for Predicting Body Fat Using Multiple Body Circumferences through Machine Learning
Accurately estimating body fat is crucial for assessing an individual's health and overall fitness. Gold-standard methods such as DEXA scans and hydrostatic weighing are accurate, but they are expensive and impractical when there is little clinical need. Alternative methods, such as bio-impedance analysis and calipers, are more accessible but may be less accurate. This project aims to develop machine learning models (multiple linear regression, ridge regression, lasso regression, and regression trees) that predict body fat from multiple body circumferences, without relying on density calculated from underwater weighing.
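To make the regression idea concrete, here is a minimal closed-form ridge regression in plain Python. It is a sketch under stated assumptions: the three circumference features (abdomen, neck, hip, in cm) and the body-fat values are invented for illustration, and the intercept is left unpenalized by centering the data first. With `alpha=0.0` it reduces to ordinary multiple linear regression.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Return (weights, intercept) minimizing squared error + alpha * ||w||^2."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    yc = [v - y_mean for v in y]                                # centered target
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve_linear_system(gram, rhs)
    intercept = y_mean - sum(wi * mi for wi, mi in zip(w, x_mean))
    return w, intercept


def predict(w, intercept, row):
    """Predict body fat for one row of circumference measurements."""
    return intercept + sum(wi * xi for wi, xi in zip(w, row))


# Hypothetical measurements in cm: [abdomen, neck, hip]; targets are made-up body fat %.
X_demo = [[85, 37, 98], [95, 39, 102], [105, 40, 108], [90, 36, 100], [110, 42, 112]]
y_demo = [18.0, 24.5, 30.0, 22.0, 33.5]
weights, bias = fit_ridge(X_demo, y_demo, alpha=1.0)
estimate = predict(weights, bias, [100, 39, 105])
```

Increasing `alpha` shrinks the coefficients toward zero, which can help when circumferences are strongly correlated with one another.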
- Predicting Recurrence of Breast Cancer from Histopathological Data: Dimensionality Reduction and Multi-Layer Perceptron Model Evaluation
Attempted to understand the relevance of different histopathological features to breast cancer recurrence using logistic regression and principal component analysis (PCA) on a breast cancer dataset with 24 features. PCA identified 9 principal components that explain 90% of the variance in the original dataset. I then looked for overlap between the features with the largest absolute coefficients in the logistic regression model and the features contributing most to the 9 principal components. Some overlap was present, but the intersection wasn’t as striking as expected. To assess whether PCA could improve the predictive ability of machine learning models, I first built a multilayer perceptron (MLP) model and trained it on the original 24 features, resulting in an F1 score of 0.69. I then trained the same MLP model on the 9 principal components, and the F1 score improved marginally to 0.71. This marginal improvement indicates that the 9 principal components retained enough information for the MLP to perform at least as well as it did on the full feature set, which suggests that some of the original 24 features are redundant or carry little additional information.
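The two quantities used above, the number of components needed to reach a variance threshold and the F1 score, can be sketched in a few lines of plain Python. The function names, the eigenvalues, and the labels below are placeholders chosen for the example, not values from the project.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose variance share reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative >= threshold * total:
            return k
    return len(eigenvalues)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder eigenvalues of a covariance matrix (illustrative only).
demo_eigenvalues = [4.0, 3.0, 2.0, 1.0]
k_needed = components_for_variance(demo_eigenvalues, threshold=0.85)

# Placeholder labels: 1 = recurrence, 0 = no recurrence.
demo_f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

F1 is a sensible metric here because recurrence datasets are often imbalanced, where plain accuracy can look good even if most recurrences are missed.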
- Using CNN and Transfer Learning to Differentiate Histopathological Types of NSCLC (Non-Small Cell Lung Carcinomas) from CT Scans
Attempted to differentiate the histopathological types of lung carcinoma using CT scans alone (i.e., without taking invasive histopathological samples), first with a basic 12-layer CNN and later with a transfer learning approach based on a pre-trained ResNet50 model. The basis for this attempt is that certain radiological features (like the position of the tumor or its appearance) are associated with each of the histological subtypes. In a clinical setting, these findings are often not used to identify the type of lung cancer because the radiological evidence may be nonspecific, so clinicians usually opt for histopathological studies. The model I developed was able to distinguish normal CT scans from CT scans with tumors, but its ability to differentiate the specific histopathological subtypes of lung cancer was limited. Even so, the model in its current state can still help identify potential cases for further investigation and follow-up. Incorporating 3D modeling by aggregating multiple slices of each chest CT scan could improve the accuracy of the model: by considering spatial attributes and tumor dimensions in a three-dimensional context, the model may be better equipped to capture features associated with the different histopathological subtypes. Increasing the size of the training dataset or using a more advanced model architecture could also improve the accuracy metrics.
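The idea of aggregating slices into a three-dimensional view can be illustrated with a small plain-Python sketch that stacks per-slice binary masks and measures the tumor's extent along each axis. This is only a sketch under assumptions: the project does not describe a segmentation step, so the masks, the `tumor_extent_mm` helper, and the voxel spacing are all hypothetical.

```python
def tumor_extent_mm(mask_slices, spacing_mm=(1.0, 1.0, 1.0)):
    """Bounding-box size of the tumor as (depth, height, width) in millimetres.

    mask_slices[z][y][x] is 1 where a voxel is marked as tumor, 0 otherwise.
    spacing_mm gives the voxel size along (z, y, x). Returns None if no
    voxel is marked.
    """
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask_slices)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    extent = []
    for axis in range(3):
        values = [c[axis] for c in coords]
        extent.append((max(values) - min(values) + 1) * spacing_mm[axis])
    return tuple(extent)


# Three hypothetical 3x3 slices; the marked region spans the first two slices.
demo_masks = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
# Assumed spacing: 2.5 mm between slices, 0.7 mm in-plane pixels.
demo_extent = tumor_extent_mm(demo_masks, spacing_mm=(2.5, 0.7, 0.7))
```

Measurements like these are only available once slices are considered together, which is the kind of information a 2D per-slice classifier cannot see.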