Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability to handle categorical covariates #29

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

melhemr
Copy link
Contributor

@melhemr melhemr commented Nov 11, 2022

No description provided.

@melhemr melhemr marked this pull request as draft November 11, 2022 20:31
Copy link
Contributor

@AbdulkadirA AbdulkadirA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melhemr This PR allows using categorical variables but it is not clear how it behaves and when it will fail. Can you elaborate a bit on the scope of the PR. Sepecifically,

  • In the case of a binary categorical variable; what is the difference to manually coding it 0/1 and treat it as numerical?
  • What happens when the categories (count and levels) don't match between learn and apply?
  • What if in the data set in the apply function contains only a subset of categories?
  • What if the same categories are supplied, but the variable type is e.g. string and not categorical? Will it work correctly? What if the ordering is different?

If these issues are not carefully considered, I don't recommend to merge this PR as it may lead to unexpected behavior, incorrect results, and other issues.

@rpomponio
Copy link
Owner

@AbdulkadirA 's comments are all very relevant. I'm guessing some internal data structure will need to keep track of the original category levels and the encoding (i.e., the way values were mapped to numbers). Then, the apply stage will need to compare the new data to the original data and reconcile differences. The Sci-Kit Learn method preprocessing.OneHotEncoder could be useful here.

Sorry I'm just getting around to seeing this. I greatly appreciate your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants