Reproduction and fixes to the issue #6 #7
Closed
Hello, thank you and kudos for the awesome library and paper.
The code below can reproduce issue #6, which seems to stem from two causes:
First, in my work (and Joseph's pufferlib), we run a seeding function at the start of each training run, which also seems to affect CARBS' random sampling. As a result, about 9 of my 10 random samples were repeats. Second, the same suggestion is then fed into the observations with different costs, and as Joseph said, "there can be no groups for which the mean is less than the quantile."
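To illustrate the seeding interaction in isolation, here is a minimal sketch; it is not CARBS' actual code. It assumes a sampler that draws from Python's global `random` state, and `seed_everything` is a hypothetical stand-in for the per-run seeding function. Giving the sampler its own `random.Random` instance is shown only as one possible way to decouple it from the global seed.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical stand-in for a per-run seeding function."""
    random.seed(seed)


def collect(sampler, n_runs: int = 10, seed: int = 42) -> list:
    """Reseed the global RNG before every run, then draw one sample."""
    samples = []
    for _ in range(n_runs):
        seed_everything(seed)  # training code resets the global RNG each run
        samples.append(sampler())
    return samples


# Sampler tied to the global RNG: every draw follows the same reseed,
# so every "random" sample comes out identical.
global_samples = collect(lambda: random.uniform(0.0, 1.0))

# Sampler with its own generator (an assumption for illustration):
# reseeding the global RNG no longer affects it.
private_rng = random.Random(0)
private_samples = collect(lambda: private_rng.uniform(0.0, 1.0))

n_unique_global = len(set(global_samples))
n_unique_private = len(set(private_samples))
print("unique samples, global RNG: ", n_unique_global)
print("unique samples, private RNG:", n_unique_private)
```

In this toy setup every draw from the global RNG repeats, which is the same kind of duplication described above, while the private generator keeps producing distinct samples.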
My fix addresses both issues:
_get_mask_for_invalid_points_in_basic(), which is NOT used for resampling.
observations_below_min_threshold when the group-based result is empty.

Please let me know if you need anything else.