Avoid model overfitting #31
After fixing some data-processing issues, here is my current situation: the model has 116,531,713 trainable parameters, so I thought maybe the network is big enough to memorize even 120,000 training examples. However, ROCStories has fewer than 2,000 examples and doesn't overfit, so I don't know why my own data does.
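For reference, the trainable-parameter count quoted above can be reproduced with a short PyTorch snippet; `model` here is a stand-in for whatever `nn.Module` is being fine-tuned, not a name from this repo:

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the element counts of all parameters that receive gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Usage: print(count_trainable_params(model))  # e.g. 116,531,713 in this setup
```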
Had the same issue with IMDB sentiment analysis. Would appreciate some pointers here...
@rodgzilla OK, will do that. What about increasing the dropout probability in the classification head? Would that help?
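A minimal sketch of what a higher-dropout classification head could look like in PyTorch; the names `n_embd`, `n_class`, and `clf_pdrop` are illustrative, and the default dropout value is just an example, not necessarily what this repo uses:

```python
import torch
import torch.nn as nn

class ClfHead(nn.Module):
    """Classification head with a tunable dropout probability."""
    def __init__(self, n_embd: int = 768, n_class: int = 2, clf_pdrop: float = 0.3):
        super().__init__()
        # Raising clf_pdrop (e.g. from 0.1 to 0.3) is one knob against overfitting.
        self.dropout = nn.Dropout(clf_pdrop)
        self.linear = nn.Linear(n_embd, n_class)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: hidden state for the [classify] position, shape (batch, n_embd).
        return self.linear(self.dropout(h))
```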
Have you found any good way to regularize the network?
I have a similar issue. I am training a DistilBERT model, and after cleaning the ISOT fake news dataset I get 99% validation accuracy after one epoch, yet it predicts wrong labels on unseen data. I guess the model is just remembering the input sequences and is clearly overfitting. So how can I regularize it?
Add label smoothing and dropout
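A minimal sketch of both suggestions in PyTorch; the `label_smoothing` argument on `nn.CrossEntropyLoss` requires PyTorch 1.10+, and the 0.1 and 0.3 values are illustrative starting points, not tuned recommendations:

```python
import torch.nn as nn

# Label smoothing: soften the one-hot targets so the model cannot drive
# its confidence to 1.0 on memorized training examples (PyTorch >= 1.10).
loss_fct = nn.CrossEntropyLoss(label_smoothing=0.1)

# Dropout: randomly zero activations during training so the network
# cannot rely on any single feature; apply inside the model's head.
dropout = nn.Dropout(p=0.3)
```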
Any answer on this? How do you avoid overfitting with fewer data points? Is
I adapted this model to a text classification problem, where my text is concatenated as:
[start] text1 [delimiter] text2 [delimiter] text3 [classify]
and it is just a binary classification problem, so I use F.softmax on the model output and BCE loss.
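For binary classification, a numerically stable alternative to softmax followed by BCE is a single-logit head with `BCEWithLogitsLoss`, which fuses the sigmoid into the loss. A minimal sketch with hypothetical shapes, not the author's actual setup:

```python
import torch
import torch.nn as nn

# Hypothetical: a batch of 8 examples, one raw score (logit) each.
logits = torch.randn(8, 1)
labels = torch.randint(0, 2, (8, 1)).float()

# BCEWithLogitsLoss applies the sigmoid internally, which avoids the
# numerical issues of sigmoid/softmax followed by a separate BCELoss.
loss_fct = nn.BCEWithLogitsLoss()
loss = loss_fct(logits, labels)
```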
I have 120,000 training examples and 10,000 evaluation examples. n_ctx is set to 500. One epoch takes about 7 hours on a single GPU.
When I use lm_coef = 0.5, the accuracy on my training dataset is 0.9, but dev accuracy is only 0.66, and more epochs don't improve accuracy on the evaluation dataset.
So this is clearly overfitting. I am looking for suggestions on what I can tune to prevent it, in either the model or the training settings.
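For context, `lm_coef` typically weights an auxiliary language-modeling loss against the classification loss, which itself acts as a regularizer by anchoring the model to its pretraining task. A minimal sketch of that combination, with the two loss tensors standing in for whatever the training loop actually computes:

```python
import torch

# Stand-ins for the two losses the training loop produces each step.
clf_loss = torch.tensor(0.7, requires_grad=True)  # classification loss
lm_loss = torch.tensor(2.3, requires_grad=True)   # auxiliary LM loss

# Lowering lm_coef shifts capacity toward the classifier; raising it
# keeps the model closer to its language-modeling pretraining.
lm_coef = 0.5
total_loss = clf_loss + lm_coef * lm_loss
total_loss.backward()
```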