CanceRater

CanceRater predicts the probability that a patient has cancer. Its main components are Akka HTTP and Spark ML (MLlib).

Logistic regression is used to predict a given patient's cancer probability (as a percentage) from a model pre-trained on a given corpus.
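As a rough illustration of how logistic regression turns features into a probability, here is a minimal sketch in plain Scala. It is not the project's code: the `LogisticSketch` object, the `weights` and the `intercept` are hypothetical stand-ins for whatever the trained Spark model holds.

```scala
object LogisticSketch {
  // Logistic (sigmoid) function: maps any real-valued score into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of the positive class for one feature vector.
  // `weights` and `intercept` are hypothetical; a trained model would supply them.
  def probability(features: Array[Double], weights: Array[Double], intercept: Double): Double = {
    require(features.length == weights.length, "feature/weight length mismatch")
    val z = features.zip(weights).map { case (x, w) => x * w }.sum + intercept
    sigmoid(z)
  }
}
```

Multiplying the result by 100 gives the probability as a percentage.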

Logistic regression was chosen for its simplicity and its performance on this kind of task. Support Vector Machines are another option, but SVMs are advised when the feature set is large compared to the sample size.

Tree-based models (e.g. decision trees or Random Forest) can be used when only the predicted class is needed and a probability percentage is not the aim. Our problem definition requires the probability, so we eliminated that option.

The pipeline approach provided by the Spark ML library is used. The categorical features are the Gender and Job fields: Gender is converted at the enumeration level, and Job is converted with StringIndexer. Once all features are numeric, StandardScaler is applied to scale them, which improves the performance of the regression. Because the test corpus is too small, 10-fold cross-validation (with a 0.8 training / 0.2 test split) is applied to find the best model.
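The steps above could be wired together roughly as follows. This is a sketch under assumptions, not the project's actual code. The column names (`gender`, `age`, `weight`, `height`, `job`, `label`), the `VectorAssembler` stage, the `regParam` grid values and the `PipelineSketch` object are illustrative. It also assumes `gender` has already been converted to a number.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StandardScaler, StringIndexer, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

object PipelineSketch {
  // Assumed columns: "gender" already numeric, "job" a string, "label" 0.0 or 1.0.
  def buildCrossValidator(): CrossValidator = {
    // Map each distinct job string to a numeric index.
    val jobIndexer = new StringIndexer().setInputCol("job").setOutputCol("jobIndex")

    // Collect the numeric columns into a single feature vector (assumed stage).
    val assembler = new VectorAssembler()
      .setInputCols(Array("gender", "age", "weight", "height", "jobIndex"))
      .setOutputCol("rawFeatures")

    // Scale the assembled features before the regression.
    val scaler = new StandardScaler().setInputCol("rawFeatures").setOutputCol("features")

    val lr = new LogisticRegression().setFeaturesCol("features").setLabelCol("label")

    val pipeline = new Pipeline()
      .setStages(Array[PipelineStage](jobIndexer, assembler, scaler, lr))

    // Hypothetical hyperparameter grid; the values are placeholders.
    val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.01, 0.1)).build()

    new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
      .setEstimatorParamMaps(grid)
      .setNumFolds(10)
  }
}
```

Given a DataFrame `data` with those columns, `data.randomSplit(Array(0.8, 0.2))` would produce the training and test sets, and calling `fit` on the training set would return the best model found across the folds.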

This is an sbt project, so after cloning the repository you can run the system with sbt commands from the root directory:

  > sbt
  > clean
  > compile
  > test
  > run

The REST layer of the system provides two endpoints:

GET /cancerater/cm => returns the confusion matrix of the current model on the test data.

POST /cancerater/check => takes a patient info JSON and returns the patient's cancer probability.

input:

{
    "gender": "Male",
    "age": 18,
    "weight": 70,
    "height": 180,
    "job": "Student"
}

output:
```
{
    "score": 0.12
}
```
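A minimal client-side sketch for calling the check endpoint, using the JDK's `java.net.http` client (JDK 11+). The base URL (for example `http://localhost:8080`) is an assumption, since the host and port are not stated here. `CheckClientSketch` and its members are hypothetical names.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object CheckClientSketch {
  // Patient payload in the shape shown above.
  val patientJson: String =
    """{"gender": "Male", "age": 18, "weight": 70, "height": 180, "job": "Student"}"""

  // Builds the POST request; `baseUrl` is an assumption, e.g. "http://localhost:8080".
  def buildCheckRequest(baseUrl: String, json: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$baseUrl/cancerater/check"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()

  // Sends the request and returns the raw response body as a string.
  def check(baseUrl: String, json: String): String =
    HttpClient.newHttpClient()
      .send(buildCheckRequest(baseUrl, json), HttpResponse.BodyHandlers.ofString())
      .body()
}
```

With the server running, `CheckClientSketch.check(baseUrl, CheckClientSketch.patientJson)` should return a JSON body containing a `score` field, as in the output above.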

Here are some reference pages to read:
