JackxTong/Methods_for_Data_Science_Courseworks

Methods_for_Data_Science_Courseworks

Imperial College MATH60026 Methods for Data Science is a coursework-only module consisting of two courseworks.

The first coursework has two tasks. The first task is to implement a decision tree, a random forest, and an MLP (all using only NumPy) on the nanoelectrodes dataset. The second task is to implement kNN and logistic regression on the brain cancer dataset, which is imbalanced (class 2 is underrepresented). To handle the imbalance, we explored methods like weighted kNN, 2-step kNN, and kernelised logistic regression.
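As an illustration of how a weighted kNN can counteract class imbalance, here is a minimal NumPy sketch. It is not the coursework solution: the function name `weighted_knn_predict`, the Euclidean metric, and the inverse-class-frequency vote weighting are all assumptions made for this example, and other weightings (for instance by neighbour distance) would be equally valid readings of "weighted kNN".

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_test, k=5):
    """Predict labels with kNN, scaling each neighbour's vote by the
    inverse frequency of its class (an assumed weighting scheme)."""
    classes, counts = np.unique(y_train, return_counts=True)
    # Rarer classes get larger weights, so their votes count for more.
    class_weight = len(y_train) / (len(classes) * counts)

    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn_idx = np.argsort(d2, axis=1)[:, :k]

    # Position of each neighbour's label within `classes`, shape (n_test, k).
    nn_cls = np.searchsorted(classes, y_train[nn_idx])

    # Accumulate weighted votes per class.
    votes = np.zeros((X_test.shape[0], len(classes)))
    for c in range(len(classes)):
        votes[:, c] = class_weight[c] * (nn_cls == c).sum(axis=1)
    return classes[np.argmax(votes, axis=1)]
```

With this weighting, a test point whose k nearest neighbours are mostly from the majority class can still be assigned to the minority class when enough of its neighbours belong to it, which is the behaviour one usually wants when a class such as class 2 is underrepresented.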

I got 87/100 for this coursework.

The second coursework deals with the star images dataset. Each star has two representations: a greyscale image stored as a (32, 32) array, and an embedding stored as a (180,) array taken from the output of a CNN. Both are used to train against the star's label. The first task involves implementing a CNN (PyTorch is allowed). The second task involves exploring unsupervised methods on the embeddings: PCA, building a kNN graph using cosine distance and exploring its normalised Laplacian, and using an ISOMAP-like algorithm to explore the resistance distance and the centred distance matrix.
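The graph-based steps can be sketched in NumPy as follows. This is a hedged illustration rather than the coursework code: the function names, the choice to symmetrise the kNN graph with a logical "or", the unweighted edges, and the use of the Laplacian pseudo-inverse for the resistance distance are assumptions for this example. It also assumes no embedding is the zero vector and, for the resistance distance, that the graph is connected.

```python
import numpy as np

def knn_graph_cosine(E, k=5):
    """Symmetric, unweighted kNN adjacency matrix under cosine distance."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T           # cosine distance, shape (n, n)
    np.fill_diagonal(dist, np.inf)       # a node is not its own neighbour
    nn_idx = np.argsort(dist, axis=1)[:, :k]

    n = E.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn_idx.ravel()] = 1.0
    return np.maximum(A, A.T)            # edge if either node lists the other

def normalised_laplacian(A):
    """Symmetric normalised Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)      # assumes every node has degree > 0
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def resistance_distance(A):
    """Resistance distance R_ij = L+_ii + L+_jj - 2 L+_ij, where L+ is the
    pseudo-inverse of the combinatorial Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    L_pinv = np.linalg.pinv(L)
    d = np.diag(L_pinv)
    return d[:, None] + d[None, :] - 2.0 * L_pinv
```

For the star embeddings, `E` would be the array of shape (n_stars, 180). The eigenvalues of the normalised Laplacian lie in [0, 2], with one zero eigenvalue per connected component, which makes its spectrum a convenient way to inspect cluster structure in the graph.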

About

Imperial College MATH60026 coursework (23-24)
