Imperial College MATH60026 Methods for Data Science is a coursework-only module consisting of two courseworks.
The first coursework covers implementing a decision tree, a random forest and an MLP (all using only NumPy) on the nanoelectrodes dataset. Its second task is to implement kNN and logistic regression on the brain cancer dataset. This dataset is imbalanced (class 2 is underrepresented), so we explored methods such as weighted kNN, two-step kNN and kernelised logistic regression.
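This README does not say how the kNN votes were weighted. As an illustration only, here is a minimal NumPy sketch of one common choice for imbalanced data, assuming each neighbour's vote is scaled by the inverse frequency of its class and by the inverse of its distance. The function name `weighted_knn_predict` and the toy data are hypothetical and not taken from the coursework.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_test, k=5, eps=1e-12):
    """Predict labels by a weighted vote over the k nearest neighbours.

    Assumption (illustrative only): each neighbour's vote is weighted by
    1 / distance and by the inverse frequency of its class in y_train,
    so that a minority class is not simply outvoted.
    """
    classes, counts = np.unique(y_train, return_counts=True)  # classes are sorted
    class_weight = 1.0 / counts  # inverse class frequency, one entry per class
    # Pairwise Euclidean distances, shape (n_test, n_train)
    d = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1))
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest training points
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, idx in enumerate(nn):
        scores = np.zeros(len(classes))
        for j in idx:
            c = np.searchsorted(classes, y_train[j])  # position of this label in `classes`
            scores[c] += class_weight[c] / (d[i, j] + eps)
        preds[i] = classes[np.argmax(scores)]
    return preds

# Toy example: three class-0 points near the origin, one class-2 point far away
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
y_train = np.array([0, 0, 0, 2])
X_test = np.array([[4.5, 4.5], [0.2, 0.2]])
preds = weighted_knn_predict(X_train, y_train, X_test, k=3)
print(preds)  # -> [2 0]
```

With k = 3, a plain majority vote would label the first test point 0, since two of its three neighbours are class 0; the weighting lets the single nearby minority point win.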
I got 87/100 for this coursework.
The second coursework deals with the star images dataset. Each star comes with two inputs, a greyscale image as a (32, 32) array and an embedding as a (180,) array taken from the output of a CNN, together with its star label as the training target. The first task involves implementing a CNN (PyTorch is allowed). The second task explores unsupervised methods on the embeddings: PCA, building a kNN graph with cosine distance and studying its normalised Laplacian, and using an ISOMAP-like algorithm to explore the resistance distance and the centred distance matrix.
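A minimal NumPy sketch of the graph pipeline described above, under stated assumptions: the kNN graph is unweighted and symmetrised with an "either is a neighbour of the other" rule, embeddings are assumed nonzero, the resistance distance uses the pseudoinverse of the combinatorial Laplacian L = D - A (meaningful only for a connected graph), and the centring is the classical-MDS double centring used in ISOMAP. All function names are hypothetical, and random vectors stand in for the star embeddings.

```python
import numpy as np

def cosine_knn_graph(E, k=10):
    """Symmetric, unweighted kNN adjacency matrix built from cosine distance."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalise each embedding
    D = 1.0 - En @ En.T                                  # cosine distance matrix
    np.fill_diagonal(D, np.inf)                          # exclude self-neighbours
    nn = np.argsort(D, axis=1)[:, :k]                    # k nearest neighbours of each node
    n = E.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                            # symmetrise the adjacency

def normalised_laplacian(A):
    """L_sym = I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def resistance_distance(A):
    """R_ij = L+_ii + L+_jj - 2 L+_ij with L = D - A (graph assumed connected)."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    diag = np.diag(Lp)
    return diag[:, None] + diag[None, :] - 2.0 * Lp

def double_centre(M):
    """Classical-MDS centring B = -1/2 J M J with J = I - 11^T / n."""
    n = len(M)
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ M @ J

rng = np.random.default_rng(0)
E = rng.normal(size=(40, 180))                          # stand-in for (n, 180) embeddings
A = cosine_knn_graph(E, k=5)
evals = np.linalg.eigvalsh(normalised_laplacian(A))     # spectrum lies in [0, 2]
R = resistance_distance(A)
B = double_centre(R ** 2)                               # ISOMAP-style: centre squared distances
w, V = np.linalg.eigh(B)                                # eigenvalues in ascending order
coords = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))     # 2-D embedding from top two eigenpairs
```

Because the resistance distance is already a squared Euclidean distance, centring R itself (without squaring) returns L⁺ exactly for a connected graph; squaring it, as above, follows the ISOMAP convention, and which variant the coursework used is not stated here.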