Fixing Imbalanced Data #1
In the case of the Hindi data, there are certainly many 'O' entries. Fixing this completely is not feasible, since we would have to go through the entire dataset or create a new one (an extreme task). We can only apply some heuristics, such as keeping only sentences that contain at least a certain number of named entities, or only sentences with max_len <= threshold, etc.
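The filtering heuristics mentioned above could be sketched as follows. This is a hypothetical helper, not code from the repository; it assumes sentences are stored as lists of `(token, tag)` pairs:

```python
# Keep only sentences with at least `min_entities` non-'O' tags
# and at most `max_len` tokens (illustrative heuristic, not the repo's code).
def filter_sentences(sentences, min_entities=1, max_len=50):
    kept = []
    for sent in sentences:
        tags = [tag for _, tag in sent]
        n_entities = sum(1 for t in tags if t != 'O')
        if n_entities >= min_entities and len(sent) <= max_len:
            kept.append(sent)
    return kept

corpus = [
    [("John", "B-PER"), ("lives", "O"), ("here", "O")],
    [("the", "O"), ("cat", "O"), ("sat", "O")],  # no entities -> dropped
]
print(len(filter_sentences(corpus)))  # 1
```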
That image suggests you are doing something wrong.
I tried to run the script with the default settings, as found in english_NER.ipynb.
Sorry for the late reply.
The problem is the version numbers. I should have included a requirements.txt.
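A pinned requirements.txt would look something like this. The thread does not say which releases the notebook was written against, so the versions below are placeholders, not the actual fix:

```
# Placeholder versions — replace with the releases the notebook was tested on.
tensorflow==<tested tensorflow release>
keras==<matching keras release>
numpy==<compatible numpy release>
```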
I ran into a similar issue where everything is predicted as 'O' for the English dataset, but mine is even worse: the losses are nan from the very beginning. I will try to match the versions of Keras and TensorFlow. Do you have any other advice on this issue? Thanks.
As a follow-up: matching the versions of TensorFlow and Keras does not seem to solve my loss: nan issue. I am wondering whether this is due to GPU vs. CPU?
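When the loss is nan from the very first step, a common culprit is the loss computation itself rather than divergence, e.g. categorical cross-entropy taking log of an exact zero probability. A minimal NumPy illustration of the failure mode and the usual epsilon-clipping workaround (this is a generic sketch, not the notebook's code):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=0.0):
    # Clip predictions away from 0 so log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0)
    with np.errstate(divide='ignore', invalid='ignore'):
        return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.0, 0.9, 0.1])   # an exact zero probability

print(cross_entropy(y_true, y_pred))             # nan (0 * log(0))
print(cross_entropy(y_true, y_pred, eps=1e-7))   # ~0.105, finite
```

If the notebook's labels or padding indices ever fall outside the valid class range, that can produce the same symptom; worth checking before blaming GPU vs. CPU.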
It turns out that I now get the same result as @ArmandGiraud. @pandeydivesh15, what accuracy did you get?
Still having this issue.
The NER corpus includes many more 'O' labels than any entity labels.
How can we fix this using Keras?
I tried sample_weight to adjust the loss function during training, but it does not appear to fix the problem fully. What would you suggest?
Thx
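One common way to build the sample_weight array is to weight each token inversely to its label's frequency, so the dominant 'O' class contributes less to the loss. A minimal sketch with illustrative data (Keras expects such a per-token 2-D array when the model is compiled with sample_weight_mode="temporal" for sequence labelling):

```python
import numpy as np

# Illustrative tag ids: 0 = 'O', 1.. = entity tags.
# y is (n_sentences, max_len) of integer tag ids.
y = np.array([
    [1, 0, 0, 2, 0],
    [0, 0, 0, 0, 0],
])

# Inverse-frequency class weights: rare entity tags get larger weights.
classes, counts = np.unique(y, return_counts=True)
class_weight = {c: y.size / (len(classes) * n) for c, n in zip(classes, counts)}

# Per-token weights, same shape as y — pass as sample_weight to model.fit().
sample_weight = np.vectorize(class_weight.get)(y)
print(sample_weight)
```

Fully fixing the imbalance usually also needs the data-side filtering discussed earlier in the thread, since weighting alone cannot add entity examples.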