Categorical Variables Not Working #260
Comments
Both ordered and unordered Categorical variables are supported when using Tables compatible inputs. I noticed however in your example
I thought so too, but I kept trying to reproduce it using a plain matrix of floats and it worked just fine. The bug only comes up when I try to pass a DataFrame with categorical variables. But it does seem to be an MLJ issue, since using the internal API works just fine.
@ParadaCarleton Could we please have a minimum working example, and enough detail to reproduce? For starters, can we get rid of the
Here's an example with dummy data; sorry about the delay in getting an MWE. I've been trying to put one together since yesterday, but I couldn't reproduce it. Eventually I figured out that it only seems to happen when I use categorical arrays with more than 2 categories, so my examples using Here's an example:
julia> using DataFrames, CategoricalArrays, MLJ, EvoTrees
julia> x = hcat(DataFrame(randn(100, 10), :auto), DataFrame(CategoricalArray.(eachcol(rand(["1", "2", "3", "4"], 100, 5))), :auto); makeunique=true)
julia> config = EvoTreeRegressor()
julia> tuned_machine = machine(config, x[:, Not(1)], x[:, 1]) |> fit!
OK, I see. Currently the EvoTrees MLJ wrapper still only supports the original Matrix-based API (https://github.com/Evovest/EvoTrees.jl/#matrix-features-input), for which categorical support does not exist. The test with Bool only worked because a conversion to a numeric matrix was possible. I'm not yet clear on how to update the MLJ wrapper to support the Tables-based API. On one hand, it should be natural, since MLJ works with Tables. However, the input logic differs between the EvoTrees and MLJ APIs: EvoTrees expects data to be passed as a single Table with an additional kwarg specifying the target variable (and, optionally, the variables to be considered as features), whereas MLJ, AFAIK, assumes a features Table alongside a target vector.
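The mismatch described above (a features table plus a target vector on the MLJ side, a single table plus a target name on the EvoTrees side) could be bridged roughly as follows. This is only a sketch under stated assumptions: plain NamedTuples of vectors stand in for Tables.jl column tables, and `combine_features_target` is a hypothetical helper invented for illustration, not a function from EvoTrees or MLJ.

```julia
# Sketch only: NamedTuples of vectors stand in for Tables.jl column tables.
# `combine_features_target` is a hypothetical helper, not part of EvoTrees or MLJ.

"""
Fold an MLJ-style `(X, y)` pair into a single column table, and return the
table together with the target column name and the feature column names.
"""
function combine_features_target(X::NamedTuple, y::AbstractVector;
                                 target_name::Symbol = :target)
    # Refuse to silently overwrite a feature that shares the target's name.
    haskey(X, target_name) &&
        throw(ArgumentError("feature table already has a column named $target_name"))
    # Every feature column must have one entry per observation in `y`.
    all(col -> length(col) == length(y), values(X)) ||
        throw(DimensionMismatch("feature columns and target differ in length"))
    # Append the target as one more column; feature columns keep their order.
    table = merge(X, NamedTuple{(target_name,)}((y,)))
    return table, target_name, keys(X)
end

# MLJ-style inputs: a features table and a separate target vector.
X = (x1 = [0.1, 0.2, 0.3], x2 = ["a", "b", "c"])
y = [1.0, 2.0, 3.0]

table, target, features = combine_features_target(X, y)
```

The returned table, target name, and feature names are the three pieces the comment above says the Tables-based entry point needs. How they are passed on (the exact keyword names) is not shown in this thread and is left out here.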
@ParadaCarleton Thanks for teasing out a MWE. I realize this can be some work 🙏. According to the MLJ doc string for this model, the data provided is unacceptable: You can have inputs with This is fixed as follows:
using MLJ # or MLJBase or ScientificTypes
x = coerce(x, Multiclass => OrderedFactor)
(Or, one can use That said, the
@jeremiedb This is all correct, but presumably already addressed in the existing interface. I think what is needed is some extra logic and preprocessing in the
@ablaom I think the referenced PR should resolve the mentioned issues (the provided MWE now trains successfully).
It looks like ordered categorical variables error:
Unordered categorical variables seem to error as well, but I'm not sure if that's because they're not supported or because of a bug.