Correlated rank similarity metric #59

gmingas · 2021-04-06T09:36:12Z

This PR adds support for Jenning's and Sebastian's correlated rank similarity metric.

Changes:

Adds three new methods to the RankingSimilarity class of rbo.py. These implement the correlated rank metric, its extrapolated version, and the LP solver.
Modifies feature_importance.py to calculate the metric, and adds a more complete RBO calculation (all types of RBO apart from uneven extrapolation) when comparing orig vs. rlds, orig vs. rand, and orig vs. lower.
Adds calculation of the correlation matrix in feature_importance.py.
Adds pulp to the required libraries, which will require a rebuild of the Docker image.
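
For readers unfamiliar with RBO, here is a minimal sketch of the plain truncated form (no extrapolation), using the standard definition (1 - p) * sum over depths d of p^(d-1) * (overlap at depth d) / d. It is not the code in rbo.py: the function name `rbo_truncated`, the default `p=0.9`, and the feature names in the usage lines are assumptions made for illustration, and it assumes neither ranking contains duplicates.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap, evaluated to the shorter ranking's length.

    Illustrative sketch only; assumes no item repeats within a ranking.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")

    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0   # size of the intersection of the two prefixes so far
    score = 0.0

    for d in range(1, depth + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        if item_a == item_b:
            # Same new item enters both prefixes at this depth.
            overlap += 1
        else:
            # Each new item counts if the other prefix already holds it.
            if item_a in seen_b:
                overlap += 1
            if item_b in seen_a:
                overlap += 1
        seen_a.add(item_a)
        seen_b.add(item_b)
        # Agreement at depth d, weighted geometrically by p.
        score += p ** (d - 1) * overlap / d

    return (1 - p) * score


# Hypothetical feature-importance rankings, most important first.
orig = ["age", "sysBP", "glucose"]
rlds = ["sysBP", "age", "glucose"]
similarity = rbo_truncated(orig, rlds, p=0.5)
```

Because the sum is truncated, two identical rankings of length k score 1 - p^k rather than 1; the extrapolated variants mentioned above exist to correct for that.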

~~WIP:~~

~~The solver is too slow at the moment when testing it on the framingham dataset and I am trying to figure out why. It is possible that the LP problem has too many variables.~~

gmingas · 2021-04-08T17:20:29Z

@OscartGiles This PR adds pulp to the required libraries list. Can you check if I have made all the necessary changes in the code for that to work? And is it easy to update the environment we use in Azure to support this?

OscartGiles · 2021-04-12T08:31:05Z

@gmingas - Just fixing the tests now, but it looks good. To get pulp onto the VMs, I can either redeploy the VM(s) or we can just install it manually for now. It should be added the next time a VM is deployed.

gmingas · 2021-04-12T08:48:47Z

Thanks! Yes, I did add it manually to do the weekend runs. No need to redeploy now, we can wait until the next time they are deployed.

OscartGiles · 2021-04-12T09:28:52Z

The Test pipeline run now fails because it tries to run the household_poverty stuff but doesn't have the data.
Can we either grab the data in the makefile or tell it not to run the household poverty stuff when it runs the pipeline?

gmingas · 2021-04-12T09:33:52Z

We can grab the data in the makefile using the Kaggle API, but that would require adding an authentication token to the repo. I don't know if it is possible to do this in a secure way. From a quick search, probably not, but maybe someone has done this before?

The other option is to remove the household cleaning code from the makefile and run it manually whenever we need it, once the data have also been added manually.

OscartGiles · 2021-04-12T09:49:34Z

We could set it as an environment variable (it can be saved as a secret on GitHub for use in the CI pipeline). But then we also need to make sure it is set as an environment variable on all our VMs, and make it clear in the README that you need a Kaggle API token.
It would save you manually downloading the data on the VMs, though.

For ref https://github.com/Kaggle/kaggle-api#api-credentials
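
A rough sketch of how the two options discussed above could be combined: use the data if they are already on disk, download only when credentials are present, and otherwise skip the household step so the test pipeline does not fail. `KAGGLE_USERNAME` and `KAGGLE_KEY` are the environment variables described in the Kaggle API README linked above; the function name `household_step_plan` and its return labels are hypothetical and not part of this repo.

```python
import os


def household_step_plan(data_present, env=None):
    """Decide what to do about the household poverty data (hypothetical helper).

    Returns "use_local" if the data are already on disk, "download" if Kaggle
    credentials are available in the environment, and "skip" otherwise.
    """
    env = os.environ if env is None else env

    if data_present:
        return "use_local"

    # The Kaggle API can read credentials from these two variables
    # instead of a kaggle.json file.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "download"

    # No data and no credentials: skip rather than fail the pipeline.
    return "skip"


# Example with an explicit environment mapping (placeholder values only).
plan = household_step_plan(False, {"KAGGLE_USERNAME": "user", "KAGGLE_KEY": "token"})
```

With this shape, the token never needs to be committed: the CI pipeline would supply it from a stored secret, each VM would set it locally, and a machine with neither simply skips the step.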

gmingas added 3 commits April 8, 2021 14:38

Add correlation rank similarity metric

89a01c4

Clean up code, move correlation computation outside of compare_featur…

71e4c6e

…es(), silence pulp messages

Remove extrapolated v1 metric calculations

b04dfae

gmingas force-pushed the feature/correlation-similarity branch from aae547f to b04dfae Compare April 8, 2021 16:15

gmingas changed the title ~~WIP: Correlated rank similarity metric~~ Correlated rank similarity metric Apr 8, 2021

Add pulp to environment

69e301d

gmingas and others added 3 commits April 8, 2021 23:07

Make modifications to ensemble files including creating new versions …

2777bb8

…of adult and household datasets for new runs

Fix bug in json creation for small household dataset

01e8029

Merge branch 'develop-paper' into feature/correlation-similarity

cafe35a

Merge pull request #60 from alan-turing-institute/feature/dataset-mod…

c1b9b1e

…ifications Feature/dataset modifications
