-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PublicBenchmarkDataset
& SecretDataset
#747
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. Also seems like some of the unit tests are failing.
isinstance(value, (int, float)) | ||
for value in values_to_merge | ||
): | ||
# alculate the mean for all attributes except charge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calculate*
assert pulses[0] == input_features.shape[0] | ||
# Merging window (2 ns) is large enough to merge two of the pulses | ||
assert pulses[1] == (input_features.shape[0] - 1) | ||
# Merging window (2 ns) is large enough to merge four of the pulses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be 8 ns? Also maybe the test should check that grouping more than 2 of the pulses in one is handled correctly.
graph_definition: GraphDefinition, | ||
download_dir: str, | ||
backend: str = "parquet", | ||
mode: str = "train", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not quite understand why the naming of the different modes are "train/test/test-no-noise".
os.path.join( | ||
self.dataset_dir, | ||
"selections", | ||
"test_noise_selection.parquet", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this one "test_noise" instead of calling the no noise mode "test-no-noise" after having specified "no-noise" throughout the code is a bit confusing
@Aske-Rosted thanks for taking a look. Looks like I by mistake managed to merge another branch into this one, causing the checks to fail. I think your comments on the toggles between "test", "train" and "no-noise" is fair - and is granted quite specific to what I intend to use it for. I'll close this PR and make a new one in the future. |
This PR adds extensions of
ERDAHostedDataset
that allows us to build and share public benchmarking datasets, and secret ones! It also introduces functionality toParquetDataset
that removes chunk ids fromselection
that doesn't exist.Below is an example of the syntax of
SecretDataset
- a way for us to share datasets using ERDA sharelinks:The idea here is that we can distribute datasets "secretly" to colleagues, and once the data is ready to be made public, the data can be made available through the
PublicBenchmarkDataset
by subclassing, providing a similar syntax: