Dataset.createTransformers fix for DatasetView/TransformTrainer #364
Description
`Dataset.createTransformers` incorrectly iterated `Dataset.data` rather than using the dataset iterator. Because `DatasetView` overrides the dataset iterator but leaves the data array empty, the transformation was fit incorrectly and the feature values were corrupted.

The fix causes `Dataset.createTransformers` to use `Dataset.size()` and `Dataset.iterator()`, both of which can be overridden. The PR also includes two additional fixes for `DatasetView` behaviour: shuffling was incorrect because it could return data points that weren't in the view, and the provenance recorded the wrong indices (it was tracking the shuffle indices, not the indices selected for the view). I think this covers all direct uses of `Dataset.data`, so they are now routed through the proper methods.

Motivation
This interaction causes poor performance when using `TransformTrainer` and `CrossValidation`, leading to random performance on the MNIST test I ran. We originally found it in a different internal use case.
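
For reviewers who want to see the failure mode from the Description in isolation, here is a minimal sketch. The classes `SimpleDataset` and `SimpleView` and the methods `meanViaField` and `meanViaIterator` are hypothetical stand-ins written for this illustration; they are not Tribuo's actual classes or the code changed in this PR. The sketch only assumes what the description states: the view overrides `size()` and `iterator()` while its own backing list stays empty.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-ins for illustration; not Tribuo's real classes.
class ViewIterationSketch {

    /** A minimal dataset that stores its values in a backing list. */
    static class SimpleDataset implements Iterable<Double> {
        protected final List<Double> data = new ArrayList<>();

        void add(double value) { data.add(value); }

        int size() { return data.size(); }

        @Override
        public Iterator<Double> iterator() { return data.iterator(); }

        /** Buggy pattern: reads the backing list directly, bypassing overrides. */
        double meanViaField() {
            double sum = 0.0;
            for (double v : data) { sum += v; }
            return sum / data.size();
        }

        /** Fixed pattern: uses size() and iterator(), which subclasses can override. */
        double meanViaIterator() {
            double sum = 0.0;
            for (double v : this) { sum += v; }
            return sum / size();
        }
    }

    /** A view over selected indices of a plain parent; its own list stays empty. */
    static class SimpleView extends SimpleDataset {
        private final SimpleDataset parent;
        private final int[] indices;

        SimpleView(SimpleDataset parent, int[] indices) {
            this.parent = parent;
            this.indices = indices.clone();
        }

        @Override
        int size() { return indices.length; }

        @Override
        public Iterator<Double> iterator() {
            return new Iterator<Double>() {
                private int pos = 0;

                @Override
                public boolean hasNext() { return pos < indices.length; }

                @Override
                public Double next() {
                    if (!hasNext()) { throw new NoSuchElementException(); }
                    return parent.data.get(indices[pos++]);
                }
            };
        }
    }

    /** Returns {mean via direct field access, mean via iterator} for a two-element view. */
    static double[] demo() {
        SimpleDataset full = new SimpleDataset();
        full.add(1.0);
        full.add(2.0);
        full.add(3.0);
        full.add(4.0);
        SimpleView view = new SimpleView(full, new int[] {0, 2});
        return new double[] {view.meanViaField(), view.meanViaIterator()};
    }

    public static void main(String[] args) {
        double[] means = demo();
        System.out.println("field access: " + means[0]);
        System.out.println("iterator:     " + means[1]);
    }
}
```

In this sketch the field-access version divides zero by zero and returns `NaN`, because the view's own list is empty, while the iterator version returns `2.0`, the mean of the two selected values. A statistic fit that way would corrupt every feature it is later applied to, which is consistent with the symptom described above.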