
One-hot encoding for new data: what to do if the test data contains categories that were not seen in the training set #39

Open
Sandy4321 opened this issue Feb 26, 2021 · 1 comment

Comments

@Sandy4321

Great code, thanks.
Practical-Deep-Learning-for-Coders-2.0/Tabular Notebooks/02_Bayesian_Optimization.ipynb

Could you clarify what happens with tabular data embeddings when the test data contains categories that were not seen in the training data?

This is a very practical case: new values often appear in the data used for prediction.

For example, the feature month has only January and February in the training data,
but March and June appear in the test data.

There is no one-hot representation for these new values.
Does that mean the new values will be converted to all 0s, or will the code just crash?
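
To make the question concrete, here is a minimal sketch in plain Python of one common way to avoid both outcomes: reserve an extra "unknown" slot when building the vocabulary from the training data. This is not the notebook's code or fastai's implementation; the helper names (`build_vocab`, `encode`, `one_hot`) and the `<unknown>` placeholder are hypothetical and chosen only for illustration.

```python
UNKNOWN = "<unknown>"  # hypothetical placeholder name, an assumption of this sketch


def build_vocab(train_values):
    """Map each category seen in training to an index; index 0 is reserved for unknowns."""
    vocab = {UNKNOWN: 0}
    for value in train_values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def encode(value, vocab):
    """Return the index of a category, falling back to the reserved index 0."""
    return vocab.get(value, vocab[UNKNOWN])


def one_hot(index, size):
    """Build a one-hot vector of the given size with a 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec


train_months = ["January", "February", "January"]
test_months = ["February", "March", "June"]

vocab = build_vocab(train_months)            # {"<unknown>": 0, "January": 1, "February": 2}
codes = [encode(m, vocab) for m in test_months]
vectors = [one_hot(c, len(vocab)) for c in codes]

print(codes)    # [2, 0, 0]
print(vectors)  # [[0, 0, 1], [1, 0, 0], [1, 0, 0]]
```

With this scheme March and June neither crash nor become all 0s; they share the reserved slot, so the model treats them as one "unseen" category.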

@Sandy4321
Author

Maybe it is handled automatically, as written here:

the extra dimension is used when encountering a previously unseen value

https://medium.com/codon-consulting/using-entity-embeddings-with-fastai-v1-and-v2-fa4ba0d80105
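
The "extra dimension" idea quoted above can be sketched as an embedding table with one more row than the number of training categories. This is only an illustration in plain Python under stated assumptions, not fastai's actual code: `make_embedding` and `lookup` are hypothetical helpers, and placing the extra row at index 0 is a choice made for this sketch.

```python
import random


def make_embedding(n_categories, dim, seed=0):
    """Create a table with n_categories + 1 rows; row 0 is the extra row for unseen values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(n_categories + 1)]


def lookup(table, vocab, value):
    """Return the embedding row for a category, or row 0 if it was not seen in training."""
    return table[vocab.get(value, 0)]


# Categories seen in training start at index 1; index 0 is kept free for unknowns.
vocab = {"January": 1, "February": 2}
table = make_embedding(len(vocab), dim=3)

january = lookup(table, vocab, "January")
march = lookup(table, vocab, "March")   # unseen in training
june = lookup(table, vocab, "June")     # unseen in training

print(len(table))     # 3 rows for 2 known categories
print(march == june)  # True: both map to the shared extra row
```

Every unseen value maps to the same row, so the prediction does not fail. How useful that row is depends on whether it received any training signal, for example from missing values in the training data.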
