This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

Unable to fit Earth() over multiple X variables #225

Closed
kchoX14 opened this issue Feb 9, 2023 · 7 comments

Comments

@kchoX14

kchoX14 commented Feb 9, 2023

I'm using Python v3.7.5 on Windows 10 in a Jupyter notebook.

  • I have an X array of shape (m, n), where n is the number of columns (variables) from my dataframe and m is the number of rows (data points) for each variable. Each element inside the array is also a pandas.core.series.Series instance of (m,) shape.
  • My target variable Y is another pandas Series instance of shape (m,).
  • I need to compare MARS modeling via pyearth against ordinary least squares fitting from statsmodels.api, where Y is linearly regressed against all n X variables.
  • All entries in my dataframe are of the numpy.float64 data type.
  • Now, when I attempt:
    from pyearth import Earth

    model = Earth()
    results = model.fit(X, Y)  # fit() returns the fitted model itself
    print(results)

It raises the following error from line 424 of earth.py:
ValueError: Wrong number of columns in X. Reshape your data

I've even tried transposing the arrays X and Y. Nothing works, and the error is the same. The traceback also highlights these internal frames:

  1. --> 611 X, y, sample_weight, output_weight, missing)
  2. --> 465 X, missing = self._scrub_x(X, missing, **kwargs)

Question:
What is going on here? Can't pyearth accept multiple variables? MARS stands for "multivariate adaptive regression splines", after all. Or am I doing something wrong?

@kevin-dietz

@kchoX14 If you are refitting your model multiple times you will run into this error -- see issue #198. This project is largely abandoned, so I wouldn't expect any resolution to your question. My advice would be to try a different algorithm to model your data or use the R package, which is still supported.

@kchoX14
Author

kchoX14 commented Feb 9, 2023

> @kchoX14 If you are refitting your model multiple times you will run into this error -- see issue #198. This project is largely abandoned, so I wouldn't expect any resolution to your question. My advice would be to try a different algorithm to model your data or use the R package, which is still supported.

That is indeed very sad... MARS is powerful for predicting nonlinearity. It's a shame that the code isn't capable of dealing with multiple linear regressions (the foundation of MARS is multivariate). Sigh!

@kevin-dietz

@kchoX14 I don't understand your comment. By its very nature it is multivariate. However, if you are trying to refit the model multiple times, you will need to subset the dataset each time.

@kchoX14
Author

kchoX14 commented Feb 9, 2023

> @kchoX14 I don't understand your comment. By its very nature it is multivariate. However, if you are trying to refit the model multiple times, you will need to subset the dataset each time.

Each row from 1 to m represents a regression in multiple variables (1 to n). So, the ideal requirement from pyearth is that it fits for every such candidate basis and then it gets passed to, I don't know, say a repeated cross-validation against the model. Right now, this only works when n=1. n > 1 fails terribly, and that is the precise requirement to check against multiple linear regression, not its simple sibling.
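Editorial sketch: the comparison described above (cross-validating a model against multiple linear regression with n > 1 columns) could look roughly like the following. It uses only NumPy for the least-squares baseline; the helper names `kfold_rmse` and `ols_fit_predict` and the synthetic data are hypothetical and not part of pyearth or statsmodels. A new fit happens inside every fold, so nothing is carried over between fits.

```python
import numpy as np


def kfold_rmse(fit_predict, X, y, k=5, seed=0):
    """Return the RMSE of each of k folds; fit_predict runs fresh per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.sqrt(np.mean((pred - y[test_idx]) ** 2))))
    return scores


def ols_fit_predict(X_train, y_train, X_test):
    """Multiple linear regression with an intercept, via least squares."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


# Synthetic data (hypothetical): m = 60 rows, n = 3 columns, exactly linear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25])

scores = kfold_rmse(ols_fit_predict, X, y)
print(len(scores), max(scores) < 1e-8)
```

Assuming pyearth is installed, a MARS counterpart could presumably be passed in the same way, for example `lambda Xtr, ytr, Xte: Earth().fit(Xtr, ytr).predict(Xte)`, which also creates a new `Earth` instance for every fold.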

kchoX14 closed this as not planned on Feb 10, 2023
@kchoX14
Author

kchoX14 commented Feb 10, 2023

This project needs further work in order to provide a fully functional MARS implementation for Python.

@bmreiniger

It should work just fine with multiple columns in X. Can you provide a minimal reproducible example?

This part confuses me:

> Each element inside the array is also a pandas.core.series.Series instance of (m,) shape.

Do you mean your data is really 3D? That is probably not supported; what should the MARS algorithm be doing on such data?
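Editorial sketch: one way to answer this question is to inspect the shape and dtype of what is actually passed to `fit`. The example below is only an illustration with a made-up dataframe (the column names `x1`, `x2`, `x3`, `y` are hypothetical); it contrasts an array built from a list of Series with a plain 2D float matrix selected from the dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical data: m = 5 rows, n = 3 predictor columns.
df = pd.DataFrame({
    "x1": [0.1, 0.4, 0.7, 1.0, 1.3],
    "x2": [2.0, 1.5, 1.0, 0.5, 0.0],
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "y": [1.0, 1.2, 1.9, 2.4, 3.1],
})

# A list of Series holds one entry per variable, so converting it
# puts the variables along the rows: shape (n, m), not (m, n).
stacked = np.asarray([df["x1"], df["x2"], df["x3"]])

# Selecting the columns directly gives an (m, n) float matrix, the
# layout that scikit-learn-style estimators expect.
X = df[["x1", "x2", "x3"]].to_numpy(dtype=float)
y = df["y"].to_numpy(dtype=float)

print(stacked.shape)
print(X.shape, X.ndim, X.dtype, y.shape)
```

If `X.ndim` is not 2, or `X.dtype` is `object`, the input is probably not the plain numeric matrix the estimator expects.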

@bmreiniger

Reading through the observed error lines, I don't see how this could happen unless as @kevin-dietz says, you fitted the same instance of Earth twice on different data; just make a new instance in that case.
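Editorial sketch: the "make a new instance" advice can be illustrated without pyearth. `ColumnCheckingModel` below is a hypothetical stand-in, not pyearth code; it only mimics the behaviour reported in this thread by remembering the column count from its first fit. `fit_fresh` is likewise a made-up helper name.

```python
import numpy as np


class ColumnCheckingModel:
    """Hypothetical stand-in, NOT pyearth: it remembers the column count
    seen in its first fit and rejects a different count afterwards."""

    def __init__(self):
        self.n_columns_ = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        if self.n_columns_ is not None and X.shape[1] != self.n_columns_:
            raise ValueError("Wrong number of columns in X")
        self.n_columns_ = X.shape[1]
        return self


def fit_fresh(make_model, X, y):
    """Build a new estimator for every fit so no state carries over."""
    return make_model().fit(X, y)


rng = np.random.default_rng(0)
X_one = rng.normal(size=(20, 1))
X_many = rng.normal(size=(20, 4))
y = rng.normal(size=20)

# Reusing one instance on data with a different column count fails.
reused = ColumnCheckingModel().fit(X_one, y)
try:
    reused.fit(X_many, y)
    reuse_failed = False
except ValueError:
    reuse_failed = True

# A fresh instance fits the 4-column data without complaint.
fresh = fit_fresh(ColumnCheckingModel, X_many, y)
print(reuse_failed, fresh.n_columns_)
```

With pyearth installed, the same pattern would presumably be `fit_fresh(Earth, X, Y)` after `from pyearth import Earth`, so that each fit starts from a new `Earth()` object.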
