One-hot encoding for new data: what to do if the test data contains categories unseen in the training set #39

Open

Sandy4321 opened this issue Feb 26, 2021 · 1 comment

Sandy4321 commented Feb 26, 2021

Great code, thanks:
Practical-Deep-Learning-for-Coders-2.0/Tabular Notebooks/02_Bayesian_Optimization.ipynb

Could you clarify what happens with tabular data embeddings when the test data contains categories that were not seen in the training set?

This is a very practical case: new values often show up in the data used for prediction.

For example, the feature month has only January and February in the training data,
but the test data contains March and June.

There is no one-hot representation for these new values.
Does that mean they will be converted to all 0s, or will the code just crash?
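
To make the two possible outcomes concrete, here is a minimal sketch in plain Python. It is not the notebook's code: `fit_categories`, `one_hot`, and the `on_unknown` switch are hypothetical names used only for illustration. It shows a one-hot encoder that either returns all zeros for an unseen category or raises an error.

```python
def fit_categories(train_values):
    # Sorted unique categories seen during training.
    return sorted(set(train_values))


def one_hot(value, categories, on_unknown="zeros"):
    # One-hot encode `value` against the training categories.
    # An unseen value yields all zeros, or raises if on_unknown == "error".
    if value not in categories:
        if on_unknown == "error":
            raise ValueError(f"unseen category: {value!r}")
        return [0] * len(categories)
    return [1 if value == c else 0 for c in categories]


train_months = ["January", "February", "January"]
test_months = ["February", "March", "June"]

categories = fit_categories(train_months)  # ["February", "January"]
encoded = [one_hot(m, categories) for m in test_months]
print(encoded)
```

With `on_unknown="zeros"`, March and June both become `[0, 0]`, so the model cannot tell them apart from each other; with `on_unknown="error"`, prediction stops at the first unseen month.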

Author

Sandy4321 commented Feb 26, 2021

Maybe it is handled automatically, as written here:

the extra dimension is used when encountering a previously unseen value

https://medium.com/codon-consulting/using-entity-embeddings-with-fastai-v1-and-v2-fa4ba0d80105
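
As I understand the quoted idea, the category vocabulary gets one extra slot, and every value not seen during training is mapped to it, so the embedding lookup never fails. The sketch below is plain Python and does not call fastai; `build_vocab`, `encode`, and the `#na#` placeholder token are assumptions for illustration only.

```python
UNKNOWN = "#na#"  # placeholder for missing/unseen values (illustrative name)


def build_vocab(train_values):
    # Index 0 is reserved for the unknown slot; real categories start at 1.
    return [UNKNOWN] + sorted(set(train_values))


def encode(values, vocab):
    # Map each value to its index; anything not in the vocab falls back to 0.
    lookup = {v: i for i, v in enumerate(vocab)}
    return [lookup.get(v, 0) for v in values]


vocab = build_vocab(["January", "February", "January"])
codes = encode(["February", "March", "June"], vocab)

# The embedding table needs one row per vocab entry, including the extra slot.
n_embedding_rows = len(vocab)
print(vocab, codes, n_embedding_rows)
```

Under this scheme March and June share the same embedding row (index 0). If the training data never contained a missing or unknown value, that row is never updated during training and stays at its initial values, so predictions for unseen categories should be treated with caution.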
