A new disease has recently been discovered by Dr. Smith, in England. You have been brought in to investigate. The disease has already affected more than 5000 people, with no apparent connection between them.
The most common symptoms include fever and tiredness, but some infected people are asymptomatic. Regardless, the disease is associated with post-infection conditions such as loss of speech, confusion, chest pain and shortness of breath.
How the disease is transmitted is still unknown, and it is not yet clear what determines whether a patient develops it. Nonetheless, some groups of people seem more prone to infection by the parasite than others.
In this challenge, your goal is to build a predictive model that answers the question, “Who are the people more likely to suffer from the Smith Parasite?” To that end, you have access to a small amount of sociodemographic, health, and behavioral information obtained from the patients.
As data scientists, your team is asked to analyze and transform the available data as needed and to apply different models to answer this question as accurately as possible. Can you build a model that predicts whether or not a patient will suffer from the Smith Disease?
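The task described above is a binary classification problem. As a rough illustration only, the sketch below fits a baseline classifier with scikit-learn on synthetic data. Nothing in it comes from the project itself: the generated features stand in for the patient table, logistic regression is an arbitrary baseline, and the F1 score is an assumed metric, since this README does not state how the competition scores submissions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient table (NOT the project data):
# 500 rows, 8 numeric features, binary target (1 = has the disease).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the rows for validation, keeping the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline model; any other classifier could be swapped in here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the held-out rows. F1 is an assumed metric, not the official one.
val_pred = model.predict(X_val)
val_f1 = f1_score(y_val, val_pred)
print(f"validation F1: {val_f1:.3f}")
```

A held-out validation split like this gives a local estimate of performance before submitting predictions for the unlabeled test files.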
Project for Machine Learning Subject
Master's Degree in Data Science and Advanced Analytics
Universidade Nova de Lisboa
The data folder contains six Excel files: three for training and three for testing. The test files do not include the outcome; predictions on them are evaluated through a Kaggle competition.
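This README does not say how the three files in each set relate to one another. Assuming each holds one kind of information (sociodemographic, health, behavioral) and that they share a patient identifier, they could be combined into one table roughly as sketched below. The column name PatientID, the feature columns, and the helper merge_tables are all hypothetical, and the small in-memory tables stand in for what pd.read_excel would load from the real files.

```python
from functools import reduce

import pandas as pd

# Tiny in-memory stand-ins for the three training tables.
# In the project these would be loaded with pd.read_excel(<path>);
# every column name below is hypothetical.
demo = pd.DataFrame({"PatientID": [1, 2, 3], "Age": [34, 51, 27]})
health = pd.DataFrame({"PatientID": [1, 2, 3], "Height": [170, 162, 181]})
habits = pd.DataFrame({"PatientID": [1, 2, 3], "Smoker": [False, True, False]})


def merge_tables(tables, key="PatientID"):
    """Inner-join a list of DataFrames on a shared key column."""
    return reduce(
        lambda left, right: left.merge(right, on=key, how="inner"), tables
    )


train = merge_tables([demo, health, habits])
print(train.shape)
```

An inner join keeps only patients present in all three tables; if some patients are missing from one file, a left join on the main table may be a better fit.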
The YAML file defines the conda environment used to develop the project. It may contain unnecessary packages; nevertheless, using the environment is recommended so that the plots render correctly. If you decide not to use the conda environment, make sure that holoviews, hvplot and panel are installed.
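If you skip the conda environment, a quick way to confirm that the three plotting packages are available is to look them up with the standard library, as in the sketch below. The helper name missing_packages is made up for this example; it only checks that each package can be found, not that its version matches the one used in the project.

```python
from importlib.util import find_spec

# Plotting packages named in this README.
REQUIRED = ("holoviews", "hvplot", "panel")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing plotting packages:", ", ".join(missing))
else:
    print("All plotting packages are installed.")
```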