-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to OrdinalEncoder, OneHotEncoder, NaiveBayes, LogisticRegression #293
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Perhaps we just need to remove the commented out code. :)
Yeah, agreed. :) I would honestly replace the following line https://github.com/elixir-nx/scholar/blob/main/lib/scholar/preprocessing.ex#L179 tensor
|> Nx.new_axis(1)
|> Nx.broadcast({num_samples, num_classes})
|> Nx.equal(Nx.iota({num_samples, num_categories}, axis: 1)) i.e. remove the ordinal encoding that is performed as part of |
I think the preprocessing function should be the same as the module that we invoke (we may remove the functions altogether). But I agree we should probably remove the ordinal encoding from one hot encoding preprocessor module. |
Very well, we can have a separate pull request for that. |
Definitely, so it all looks good to me! |
Changelog:
OrdinalEncoder
struct; it now contains only:categories
field.OrdinalEncoder.fit/2
and now it works with tensors of size 1.opts
as 2nd argument fromOrdinalEncoder.fit_transform
as it is not needed.OneHotEncoder
struct; it now contains only:ordinal_encoder
field.:num_classes
option to:num_categories
as these encoders can be used to encode features, not only target variables.NaiveBayes.Complement
andLogisticRegression
.Fixes #290.