This is the code repository for the Machine Learning module (COMP3009).
This is the first coursework of Machine Learning COMP3009, on the topic of Artificial Neural Networks (ANN). We applied standard ANN methods to the Beijing PM2.5 dataset and the Iris dataset to solve a regression problem and a classification problem respectively. To investigate how different training settings affect model performance, we adopted a “control variables” methodology in our experiments, varying one setting at a time while keeping the others fixed. Based on those experiments, we then chose a set of suitable hyperparameter combinations and used cross-validation to select the best-performing one. For evaluation, we used 10-fold cross-validation to compute the average Root Mean Square Error (RMSE) for the regression problem and the average accuracy for the classification problem.
The report briefly describes the datasets in the first section, followed by the data pre-processing. It then details our experiments, including the experiment configurations, training process and evaluation, and discusses several ways to avoid overfitting. It closes with a summary of our work.
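The 10-fold evaluation described for the first coursework can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the coursework: the function names (`k_fold_indices`, `cross_validated_rmse`, `fit_mean`, `predict_mean`) are hypothetical, and a trivial mean predictor stands in for the ANN so that the example stays self-contained.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples to build k non-empty folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cross_validated_rmse(xs, ys, fit, predict, k=10, seed=0):
    """Average RMSE over k folds; every fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return sum(scores) / len(scores)


# Hypothetical stand-in model (assumption): always predicts the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Placeholder data, invented for illustration only.
xs = list(range(50))
ys = [float(x % 7) for x in xs]
average_rmse = cross_validated_rmse(xs, ys, fit_mean, predict_mean, k=10)
```

For the classification problem, the same loop would apply with `rmse` swapped for an accuracy function.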
This is the second coursework of Machine Learning COMP3009, on the topic of Decision Trees (DT). We applied the standard ID3 algorithm to the Iris dataset and the Beijing PM2.5 dataset to solve a classification problem and a regression problem respectively. For evaluation, we used 10-fold cross-validation to compute the average Root Mean Square Error (RMSE) for the regression problem and the F1-score for the classification problem.
The report briefly describes the original datasets in the first section, followed by the data pre-processing. It then details our experiments, including the experiment configurations and evaluation. Lastly, it presents answers to the questions on the coursework sheet.
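The core of ID3, choosing the attribute with the highest information gain at each node, can be sketched as below. This is a hedged illustration rather than the coursework implementation: it assumes categorical attributes stored in dictionaries (continuous features such as the Iris measurements would first need discretising or threshold splits), and the function names and toy rows are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 split criterion: pick the attribute with the largest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy rows, invented for illustration only (not taken from the Iris data).
rows = [
    {"petal": "short", "noise": "x"},
    {"petal": "short", "noise": "y"},
    {"petal": "long", "noise": "x"},
    {"petal": "long", "noise": "y"},
]
labels = ["setosa", "setosa", "virginica", "virginica"]
chosen = best_attribute(rows, labels, ["petal", "noise"])
```

In this toy case `petal` separates the two classes perfectly while `noise` carries no information, so `petal` is selected. A full tree would apply the same choice recursively to each partition.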
This is the third coursework of Machine Learning COMP3009, on the topic of Support Vector Machines (SVM). We trained, evaluated and optimised SVM models on the Iris dataset and the Beijing PM2.5 dataset. In addition, the performance of the Artificial Neural Network (ANN), the Decision Tree (DT) and SVMs with three kernel functions was compared using a statistical approach.
The report presents the details of the comparison, including the experiment configurations and the analysis of the results. All experiment results are given in the appendix.
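The text above does not name the statistical test used for the comparison. As one possible illustration only, a paired t-test on per-fold scores is a common way to compare two models evaluated on the same cross-validation folds. The sketch below assumes that test, and the function name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two models' per-fold scores.

    Assumption: both lists hold scores from the same folds, in the same order.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the per-fold differences (n - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(variance / n), n - 1
```

With 10 folds there are 9 degrees of freedom, so a two-sided test at the 5% level would compare the absolute value of the statistic against a critical value of about 2.262. Because cross-validation folds share training data, the scores are not fully independent and the result should be read as approximate.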