Detecting coding / noncoding DNA sequences ( after convert to CGR (Chaos_game_representation) images) using four different Machine learning algorithms
Datasets: Training data from 10 mammalian species(40 coding sequence and 40 non coding intron parts), gene names: CSN1S1, IL2, LCE6A and SMCP Testing data from 10 mammalian species(20 coding sequence and 20 non coding intron parts), gene names: STK11 and TP53
So in this project we try use four algorithms and then choose best one : Naive Bayes algorithm Logistic Regression algorithm K-Nearest Neighbor algorithm Perceptron algorithm
In conclusion, from point 1 and 2 , from Sensitivity and cross validation the Perceptron is the best, and from Accuracy the Logistic regression is the best.
So, In this project we conclude Perceptron and logistic regression ( supervised learning, linear clasifier ) are the best models for detecting coding DNA sequences from noncoding DNA sequences using chaos game represntation approach. may be because the logistic regression & perceptron is an algorithms for learning a binary classifier, and gradient descent to reach best decision boundary.
BY: Mohamed Ahmed Mohamed Emam - Amna Ali Shaheen Mohamed