If the holdout data contains categories that were not present in the training data, the OrdinalEncoder will fail unless handle_unknown and unknown_value are set appropriately. This is a problem for the initial integration of the OrdinalEncoder into AutoMLSearch, because handle_unknown defaults to "error".
This can also be a problem for the Ordinal logical type, which sets its order according to the categories that are present. If we try to set an already instantiated Ordinal logical type on holdout data with different categories, Woodwork may raise an error saying that the data contains values that are not present in the order values provided. We should investigate exactly when this Woodwork error is triggered. I've opened an issue in Woodwork to consider ways to handle this case (alteryx/woodwork#1598).
We should look into how we can handle this. We have several options:
- Handle this in AutoMLSearch when the OrdinalEncoder is instantiated, by setting its parameters so that unknown values are handled gracefully. I think this may make the most sense, and it could give users further control over how unknown values are handled.
- Wait to set the encoder's categories until transform, or allow the categories to be updated at transform. I think deferring the categories entirely until transform puts too much logic into transform, and it could create the reverse problem of missing categories from the training data. More likely, we will want to consider allowing users to expand the categories if needed.
- Change the default value of handle_unknown so that it no longer errors, perhaps by encoding unknown values as NaN?
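To make the first and third options concrete, here is a minimal sketch using scikit-learn's OrdinalEncoder, which has parameters named handle_unknown and unknown_value. Whether the EvalML component behaves identically is an assumption, and the column values and order below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: the holdout set contains "very high", unseen in training.
train = np.array([["low"], ["medium"], ["high"]], dtype=object)
holdout = np.array([["medium"], ["very high"]], dtype=object)
order = [["low", "medium", "high"]]

# Default behavior: handle_unknown="error", so transform raises on unseen values.
strict = OrdinalEncoder(categories=order)
strict.fit(train)
try:
    strict.transform(holdout)
    strict_failed = False
except ValueError:
    strict_failed = True

# Graceful behavior: unseen categories are encoded as NaN instead of raising.
lenient = OrdinalEncoder(
    categories=order,
    handle_unknown="use_encoded_value",
    unknown_value=np.nan,
)
lenient.fit(train)
encoded = lenient.transform(holdout)
```

With these settings the known value keeps its position in the order, and the unseen value is encoded as NaN, so a downstream imputer or estimator has to be able to cope with missing values.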