Feature request: Support for tables #154

Closed
ParadaCarleton opened this issue Oct 14, 2023 · 3 comments
ParadaCarleton commented Oct 14, 2023

For example:

julia> using DataFrames, CategoricalArrays

julia> x = hcat(DataFrame(randn(10, 5), :auto), DataFrame(CategoricalArray.(eachcol(rand(["1", "2", "3", "4"], 10, 5))), :auto); makeunique=true)
10×10 DataFrame
 Row │ x1          x2          x3          x4          x5          x6          x7
     │ Float64     Float64     Float64     Float64     Float64     Float64     Float64
─────┼────────────────────────────────────────────────────────────────────────────────────
   1 │  0.0076124   2.21962    -1.89752     0.24856    -0.185547   -1.18036     0.177196
   2 │ -0.425131   -1.95286    -2.08625     0.480588    1.72549    -0.28748    -0.711898
   3 │ -0.678378    0.956257   -0.426269   -0.740123    1.94817     0.0582993   0.814919
   4 │  0.815181    0.882876   -0.527539   -0.769075    0.401716   -1.25234     0.216388
   5 │ -0.430265    0.74117     0.0932157  -0.80661     1.83201    -1.00751     0.0808424
   6 │ -0.953549    0.921323   -0.192622   -0.152674   -0.829379    0.629351    0.719016
   7 │ -1.55753     0.580445    0.428604   -0.423595    1.187      -0.730763   -1.19092
   8 │ -1.93545     0.120406    0.898218    0.629203   -0.164727    0.121863   -0.46737
   9 │ -3.16131    -2.60021     0.0405212  -0.635231    1.09621     0.09391     2.50053
  10 │ -1.43519     0.240422   -0.0817438  -0.0991257  -0.122359   -0.243555    1.09018

julia> config = LinearRegressor()
LinearRegressor(
  fit_intercept = true, 
  solver = nothing)

julia> tuned_machine = machine(config, x[:, Not(1)], x[:, 1]) |> fit!
ERROR: MethodError: no method matching fit(::GeneralizedLinearRegression{L2Loss, NoPenalty}, ::Matrix{Any}, ::Vector{Float64}; solver::Analytical)

Closest candidates are:
  fit(::GeneralizedLinearRegression, ::AbstractMatrix{<:Real}, ::AbstractVector{<:Real}; data, solver)
   @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/default.jl:36
  fit(::GeneralizedLinearRegression; kwargs...)
   @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/default.jl:50

Stacktrace:
  [1] fit(m::LinearRegressor, verb::Int64, X::DataFrame, y::Vector{Float64})
    @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/mlj/interface.jl:29
  [2] fit_only!(mach::Machine{LinearRegressor, true}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)
    @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:680
  [3] fit_only!
    @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:606 [inlined]
  [4] #fit!#63
    @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:777 [inlined]
  [5] fit!
    @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:774 [inlined]
  [6] |>(x::Machine{LinearRegressor, true}, f::typeof(fit!))
    @ Base ./operators.jl:917
  [7] top-level scope
    @ REPL[24]:1

Similar issue in EvoTrees.jl.

tlienart (Collaborator) commented Oct 14, 2023

This is unrelated to MLJLinearModels. Please ask on Discourse for help, or possibly open an issue in MLJ directly. The data that gets passed through is not properly typed. You can see that here:

julia> tuned_machine = machine(config, x[:, Not(1)], x[:, 1]) |> fit!
ERROR: MethodError: no method matching fit(::GeneralizedLinearRegression{L2Loss, NoPenalty}, ::Matrix{Any}, ::Vector{Float64}; solver::Analytical)

It should be a Matrix{<:Real}; the Matrix{Any} in the error suggests that you might have missed an encoding step.

ParadaCarleton (Author) commented

> The data that gets passed through is not properly typed. You can see that here:

Right, sorry, I was under the impression that MLJ models were expected to accept arbitrary tables as inputs, rather than just accepting Matrix{<:Real}. I'll edit this issue, then.

ParadaCarleton changed the title from "Table inputs are broken" to "Feature request: Support for tables" on Oct 16, 2023
tlienart (Collaborator) commented

The issue name is incorrect: MLJ handles tables just fine, and MLJLM handles matrices as it should. The interface between the two is handled by MLJ. The issue here is that you did not encode the categorical features.

julia> using DataFrames, CategoricalArrays, ScientificTypes, MLJModelInterface, MLJBase

julia> X = hcat(DataFrame(randn(10, 5), :auto), DataFrame(CategoricalArray.(eachcol(rand(["1", "2", "3", "4"], 10, 5))), :auto); makeunique=true);

julia> schema(X)
┌───────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes      │ types                            │
├───────┼───────────────┼──────────────────────────────────┤
│ x1    │ Continuous    │ Float64                          │
│ x2    │ Continuous    │ Float64                          │
│ x3    │ Continuous    │ Float64                          │
│ x4    │ Continuous    │ Float64                          │
│ x5    │ Continuous    │ Float64                          │
│ x1_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x2_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x3_1  │ Multiclass{3} │ CategoricalValue{String, UInt32} │
│ x4_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x5_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
└───────┴───────────────┴──────────────────────────────────┘

julia> typeof(MLJModelInterface.matrix(X))
Matrix{Any} (alias for Array{Any, 2})

MLJModelInterface.matrix(X) is how MLJ takes training data and passes it over to MLJLinearModels. As you can see, the output is an untyped matrix (Matrix{Any}) because the table has categorical columns holding strings such as "1", "2", etc.

TL;DR: use an encoder, then pass the result to the linear regressor.
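As a rough sketch of what "use an encoder" could look like, the snippet below chains MLJ's ContinuousEncoder with the regressor in a pipeline. It assumes MLJ, MLJLinearModels, DataFrames and CategoricalArrays are installed. The row count n = 100, the random target y, and the choice of drop_last=true are illustrative assumptions, not something taken from the original report.

```julia
using MLJ, DataFrames, CategoricalArrays

# Assumes MLJLinearModels is available in the active environment.
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

# Same table layout as in the issue, but with more rows (an assumption)
# so that the one-hot encoded design matrix is not underdetermined.
n = 100
X = hcat(DataFrame(randn(n, 5), :auto),
         DataFrame(CategoricalArray.(eachcol(rand(["1", "2", "3", "4"], n, 5))), :auto);
         makeunique=true)
y = randn(n)  # synthetic target, for illustration only

# ContinuousEncoder one-hot encodes the Multiclass columns and leaves the
# Continuous ones unchanged; drop_last=true drops one level per feature
# to avoid collinearity with the intercept.
pipe = ContinuousEncoder(drop_last=true) |> LinearRegressor()

mach = machine(pipe, X, y)
fit!(mach, verbosity=0)
yhat = predict(mach, X)
```

Because the encoder sits in front of the regressor, the matrix that reaches MLJLinearModels should contain only real numbers. OneHotEncoder could probably be used in place of ContinuousEncoder here, since every non-categorical column is already Continuous.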
