
Dataset.createTransformers fix for DatasetView/TransformTrainer #364

Merged
5 commits merged on Apr 30, 2024


@Craigacp (Member) commented on Apr 3, 2024

Description

Dataset.createTransformers incorrectly iterated over Dataset.data rather than using the dataset iterator. DatasetView overrides the iterator but leaves its data array empty, so the transformations were fit incorrectly and the feature values were corrupted.
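The failure mode can be reproduced in miniature. The sketch below is a hypothetical, heavily simplified model, not Tribuo's actual `Dataset`/`DatasetView` classes: `SimpleDataset`, `SimpleDatasetView`, `meanFromData` and `meanFromIterator` are illustrative names, and fitting a mean stands in for fitting a transformation. A method that reads the backing list directly sees the view's empty list, while one that goes through the overridable `size()` and `iterator()` sees the rows the view actually exposes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ViewMeanSketch {

    /** Hypothetical stand-in for a dataset that owns a backing list of values. */
    static class SimpleDataset implements Iterable<Double> {
        protected final List<Double> data = new ArrayList<>();

        void add(double v) { data.add(v); }

        public int size() { return data.size(); }

        @Override
        public Iterator<Double> iterator() { return data.iterator(); }

        /** Buggy pattern: reads the backing list directly, ignoring any overridden iterator(). */
        double meanFromData() {
            double sum = 0.0;
            for (double v : data) { sum += v; }
            return sum / data.size(); // 0.0 / 0 is NaN when the backing list is empty
        }

        /** Fixed pattern: goes through the overridable size() and iterator(). */
        double meanFromIterator() {
            double sum = 0.0;
            for (double v : this) { sum += v; } // dispatches to the subclass iterator()
            return sum / size();
        }
    }

    /** Hypothetical view: its own backing list stays empty; it exposes selected rows of a parent. */
    static class SimpleDatasetView extends SimpleDataset {
        private final SimpleDataset parent;
        private final int[] indices;

        SimpleDatasetView(SimpleDataset parent, int[] indices) {
            this.parent = parent;
            this.indices = indices.clone();
        }

        @Override
        public int size() { return indices.length; }

        @Override
        public Iterator<Double> iterator() {
            List<Double> selected = new ArrayList<>();
            for (int i : indices) { selected.add(parent.data.get(i)); }
            return selected.iterator();
        }
    }

    public static void main(String[] args) {
        SimpleDataset full = new SimpleDataset();
        for (double v : new double[] {1.0, 2.0, 3.0, 10.0}) { full.add(v); }
        SimpleDatasetView view = new SimpleDatasetView(full, new int[] {0, 1, 2});
        System.out.println("buggy mean: " + view.meanFromData());     // NaN: the view's own list is empty
        System.out.println("fixed mean: " + view.meanFromIterator()); // 2.0: the mean of 1, 2 and 3
    }
}
```

In this toy the buggy path produces NaN rather than silently wrong numbers, but the underlying mistake is the same one described above: bypassing the methods a subclass is expected to override.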

The fix makes Dataset.createTransformers use Dataset.size() and Dataset.iterator(), both of which can be overridden. The PR also includes two additional fixes for DatasetView behaviour: shuffling was incorrect because it could return data points that weren't in the view, and the provenance recorded the wrong indices (it tracked the shuffle indices rather than the indices selected for the view). I think this covers all direct uses of Dataset.data, so they are now routed through the proper methods.
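The two DatasetView fixes can be illustrated with another hypothetical sketch. It assumes, for illustration only, that the shuffle bug took the form of using shuffled positions `0..size-1` directly as parent row indices; the PR text doesn't say exactly how the bug arose, and `IndexView`, `shuffleBuggy`, `shuffleFixed` and `provenanceIndices` are invented names, not Tribuo API. The fixed shuffle permutes positions within the view and maps each one through the selected indices, and provenance reports the selected indices rather than the current shuffle order.

```java
import java.util.Arrays;
import java.util.Random;

public class ViewShuffleSketch {

    /** Fisher-Yates shuffle producing a random permutation of 0..n-1. */
    static int[] permutation(int n, Random rng) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) { perm[i] = i; }
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
        }
        return perm;
    }

    /** Hypothetical view exposing only selectedIndices from some larger parent dataset. */
    static class IndexView {
        private final int[] selectedIndices; // parent rows chosen for the view
        private int[] order;                 // parent rows in current iteration order

        IndexView(int[] selectedIndices) {
            this.selectedIndices = selectedIndices.clone();
            this.order = selectedIndices.clone();
        }

        /** Assumed buggy shuffle: treats shuffled positions as parent row indices. */
        void shuffleBuggy(Random rng) {
            order = permutation(selectedIndices.length, rng);
        }

        /** Fixed shuffle: shuffles positions, then maps each one back to a selected parent row. */
        void shuffleFixed(Random rng) {
            int[] perm = permutation(selectedIndices.length, rng);
            int[] newOrder = new int[perm.length];
            for (int i = 0; i < perm.length; i++) { newOrder[i] = selectedIndices[perm[i]]; }
            order = newOrder;
        }

        int[] currentOrder() { return order.clone(); }

        /** Provenance records the rows selected for the view, not the shuffle order. */
        int[] provenanceIndices() { return selectedIndices.clone(); }
    }

    public static void main(String[] args) {
        IndexView view = new IndexView(new int[] {5, 7, 9});
        view.shuffleBuggy(new Random(42));
        System.out.println(Arrays.toString(view.currentOrder()));      // some order of 0, 1, 2: rows outside the view
        view.shuffleFixed(new Random(42));
        System.out.println(Arrays.toString(view.currentOrder()));      // some order of 5, 7, 9
        System.out.println(Arrays.toString(view.provenanceIndices())); // [5, 7, 9]
    }
}
```

Keeping the selected indices separate from the mutable iteration order is what lets provenance stay stable no matter how many times the view is shuffled.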

Motivation

This interaction causes poor performance when using TransformTrainer with CrossValidation; on an MNIST test I ran, performance was no better than random. We originally found the problem in a different internal use case.

@Craigacp added the "Oracle employee" label (This PR is from an Oracle employee) on Apr 3, 2024
The oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Apr 3, 2024
@JackSullivan (Member) left a comment


Looks good to me.

@Craigacp merged commit 83e197f into oracle:main on Apr 30, 2024 (14 checks passed)
@Craigacp deleted the transformation-fix branch on April 30, 2024 at 18:09

2 participants