[GSProcessing] Add pre-computed categorical transformation loading (#870)

*Issue #, if available:*

*Description of changes:*

* Follow-up to #857
* Allow us to re-apply a previously saved categorical transformation to new data. See below for design details.

To be able to re-apply the categorical transformations that we create using the code in #857, we first create a mapping from each original string to its one-hot representation, which we read from the saved JSON file, and then use a UDF to apply the mapping(s) to the column(s).

The `DistributedTransformation` class, from which all transformation implementations inherit, gains a new function, `apply_precomputed_transformation`. When a pre-computed transformation JSON file exists in the input, and the feature is one of those listed in that file, we use this function to re-apply the existing transformation instead of creating a new one.

The default implementation of `apply_precomputed_transformation` logs a warning and applies a new transformation.

When we implement a pre-computed transform for a new transformation (e.g. numerical) we need to:

* Ensure the transformation's `self.json_representation` is populated during the call to `apply()`. This ensures the transformation info will be saved in the output JSON.
* Override the `apply_precomputed_transformation` function (as we did for `DistCategoryTransformation` here), so that it uses the dict loaded from the JSON file to re-apply the transformation to the new data.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
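The design described in this commit message can be illustrated with a minimal, Spark-free sketch. This is not the GSProcessing code: the class names `TransformationSketch` and `CategorySketch`, the helper `transform_feature`, the `"mappings"` JSON key, and the use of plain lists of dicts in place of Spark DataFrames and UDFs are all assumptions made for illustration. Only the names `apply`, `apply_precomputed_transformation`, and `json_representation` come from the description above. Mapping unseen values to an all-zero vector is also an assumption.

```python
import json
import logging

logger = logging.getLogger(__name__)


class TransformationSketch:
    """Hypothetical stand-in for the DistributedTransformation base class."""

    def __init__(self, cols):
        self.cols = list(cols)
        self.json_representation = {}

    def apply(self, rows):
        raise NotImplementedError

    def apply_precomputed_transformation(self, rows, precomputed):
        # Default behavior as described: warn, then create a new transformation.
        logger.warning("No pre-computed support for %s; applying a new "
                       "transformation.", type(self).__name__)
        return self.apply(rows)


class CategorySketch(TransformationSketch):
    """Hypothetical stand-in for a categorical (one-hot) transformation."""

    def apply(self, rows):
        # Build a string -> index mapping per column and record it so it
        # can be saved to the output JSON.
        mappings = {}
        for col in self.cols:
            values = sorted({str(row[col]) for row in rows})
            mappings[col] = {value: idx for idx, value in enumerate(values)}
        self.json_representation = {"mappings": mappings}  # assumed layout
        return self._encode(rows, mappings)

    def apply_precomputed_transformation(self, rows, precomputed):
        # Re-use the saved mapping instead of fitting a new one.
        self.json_representation = precomputed
        return self._encode(rows, precomputed["mappings"])

    def _encode(self, rows, mappings):
        encoded = []
        for row in rows:
            new_row = dict(row)
            for col in self.cols:
                mapping = mappings[col]
                vec = [0] * len(mapping)
                idx = mapping.get(str(row[col]))
                if idx is not None:  # assumption: unseen values stay all-zero
                    vec[idx] = 1
                new_row[col] = vec
            encoded.append(new_row)
        return encoded


def transform_feature(transformation, rows, precomputed_by_feature, name):
    """Use the saved transformation if the feature is listed, else fit anew."""
    precomputed = precomputed_by_feature.get(name)
    if precomputed is not None:
        return transformation.apply_precomputed_transformation(rows, precomputed)
    return transformation.apply(rows)


# First run: fit on the original data and "save" the representation as JSON.
first = CategorySketch(["color"])
first.apply([{"color": "red"}, {"color": "blue"}])
saved = json.loads(json.dumps({"color": first.json_representation}))

# Second run: re-apply the saved mapping to new data.
second = CategorySketch(["color"])
result = transform_feature(
    second, [{"color": "blue"}, {"color": "green"}], saved, "color"
)
```

In this sketch the second run keeps the vector layout from the first run, so `"blue"` gets the same position it had originally even though `"red"` is absent from the new data.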
Showing 12 changed files with 414 additions and 98 deletions.