Sarcos data (44976) contains duplicates #61

Open
sebffischer opened this issue Dec 5, 2023 · 3 comments

Comments

@sebffischer

Unfortunately (my mistake), there are two versions of the sarcos data on OpenML: https://www.openml.org/search?type=data&status=active&id=44976 and https://www.openml.org/search?type=data&status=active&id=43873.

The first (44976) contains the duplicates from the test set, while the second (43873) does not. The first is also the one that was accidentally used by the CTR-23 suite.
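For readers who want to check a copy of the data for this kind of problem, here is a minimal sketch of an exact-duplicate check in plain Python. It assumes the dataset has already been loaded into a list of rows; the helper names `find_duplicate_rows` and `drop_duplicate_rows` and the toy values are made up for illustration and are not part of OpenML or taken from the sarcos data.

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up toy rows (hypothetical values, not actual sarcos measurements).
toy_rows = [(0.1, 0.2, 1.0), (0.3, 0.4, 2.0), (0.1, 0.2, 1.0)]

duplicates = find_duplicate_rows(toy_rows)
deduplicated = drop_duplicate_rows(toy_rows)
```

The comparison is exact, so it only catches rows that were copied verbatim, which is what one would expect if test-set rows were appended to the data unchanged.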

@PGijsbers

PGijsbers commented Dec 5, 2023

What do you suggest we do? You own 44976, so you could choose to deactivate it and link 43873 to CTR-23 instead.

@sebffischer
Author

No, I think the suite should stay as it is. Matthias suggested that I add an issue here. When we do a new version of CTR-23, this should just be corrected, I guess. Do you think I should close this issue then?

@PGijsbers

You could also choose to deactivate the dataset but keep it in the suite. Direct downloading is unaffected by deactivation, but it should function as a clear signal to others not to use it, and it won't show up when listing datasets with an active filter. If you do experience issues, deactivating a dataset can be reversed (by administrators, but you know how to reach us :)).
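The behaviour described above can be pictured with a small toy model. This is only a sketch of the semantics as stated in this comment, not the OpenML API: `ToyCatalog`, `DatasetEntry`, and the status strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    data_id: int
    name: str  # placeholder name, for illustration only
    status: str  # "active" or "deactivated" in this toy model


class ToyCatalog:
    """In-memory stand-in for a dataset index (hypothetical, not OpenML)."""

    def __init__(self, entries):
        self._by_id = {entry.data_id: entry for entry in entries}

    def get(self, data_id):
        # Direct access by ID ignores the status flag.
        return self._by_id[data_id]

    def list(self, status=None):
        # Listing with a status filter hides entries with another status.
        return [
            entry
            for entry in self._by_id.values()
            if status is None or entry.status == status
        ]

    def set_status(self, data_id, status):
        # Deactivation is just a flag change, so it can be reversed.
        self._by_id[data_id].status = status


catalog = ToyCatalog(
    [
        DatasetEntry(44976, "sarcos", "active"),
        DatasetEntry(43873, "sarcos", "active"),
    ]
)

catalog.set_status(44976, "deactivated")
active_ids = [entry.data_id for entry in catalog.list(status="active")]
still_reachable = catalog.get(44976)
```

In this model a suite that refers to 44976 by ID keeps working after deactivation, while anyone browsing only active datasets sees 43873 alone.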

