
Fixing Imbalanced Data #1

Open
ArmandGiraud opened this issue May 25, 2017 · 12 comments


@ArmandGiraud

The NER corpus includes many more 'O' labels than entity labels.
How can we fix this with Keras?
I tried sample_weight to adjust the loss function during training, but it does not appear to fix the problem fully. What would you suggest?
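For reference, this is roughly what I tried (a minimal sketch in Keras-2 style, assuming a padded sequence-labelling setup; the shapes and the assumption that tag id 0 is 'O' are mine, not from the repo):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

vocab_size, n_tags, max_len = 5000, 10, 50
X = np.random.randint(1, vocab_size, size=(32, max_len))  # padded token ids
y = np.random.randint(0, n_tags, size=(32, max_len))      # one tag id per token
y_onehot = np.eye(n_tags)[y]                               # (32, max_len, n_tags)

model = Sequential([
    Embedding(vocab_size, 64, input_length=max_len),
    LSTM(64, return_sequences=True),
    TimeDistributed(Dense(n_tags, activation='softmax')),
])
# 'temporal' makes fit() accept one weight per token instead of one per sentence.
model.compile(optimizer='adam', loss='categorical_crossentropy',
              sample_weight_mode='temporal')

# Down-weight 'O' tokens (here assumed to be tag id 0) so entity errors dominate.
weights = np.where(y == 0, 0.1, 1.0)
model.fit(X, y_onehot, sample_weight=weights, epochs=1)
```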
Thx

@pandeydivesh15
Owner

In the case of the Hindi data, there are certainly many 'O' entries. Fixing this completely is not really possible, as we would have to go through the entire dataset, or create a new one (an extreme task). We can only apply some heuristics, like using only sentences that contain a certain minimum number of named entities, or only sentences with max_len <= threshold, etc.
I don't understand what you mean by fixing this with Keras. Can you explain more?
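Something like this rough sketch is what I mean (here `sentences` is assumed to be a list of (token, tag) pair lists, which is not exactly how the repo stores the data):

```python
MIN_ENTITIES = 2   # keep only sentences with at least this many non-'O' tags
MAX_LEN = 60       # drop very long sentences

def keep(sentence):
    """sentence: a list of (token, tag) pairs."""
    n_entities = sum(1 for _, tag in sentence if tag != 'O')
    return n_entities >= MIN_ENTITIES and len(sentence) <= MAX_LEN

filtered = [s for s in sentences if keep(s)]
```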

@ArmandGiraud
Author

Actually, I was unclear: when I try to train the model on the English CoNLL dataset, the classifier only predicts the 'O' label, which yields a high accuracy (around 97%).
[screenshot: class imbalance evidence]

Maybe I'm just doing something wrong, but I don't see what.
I have already encountered class imbalance in other ML problems, but I'm wondering if there is a preferred solution for NER.
There are many ways of addressing it, such as oversampling, undersampling, or SMOTE, or some options within Keras, such as setting class weights in the loss function.
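For the class-weight route, something like this is what I have in mind (plain numpy, assuming `y` is an integer (n_sentences, max_len) tag-id array; the inverse-frequency formula mirrors sklearn's 'balanced' heuristic):

```python
import numpy as np

def temporal_weights(y, smooth=1.0):
    """One weight per token, inversely proportional to its tag's frequency."""
    counts = np.bincount(y.ravel()) + smooth          # occurrences of each tag id
    class_w = counts.sum() / (len(counts) * counts)   # rare tags -> large weights
    return class_w[y]                                 # same shape as y

# weights = temporal_weights(y)
# model.fit(X, y_onehot, sample_weight=weights)  # needs sample_weight_mode='temporal'
```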

@pandeydivesh15
Owner

That image suggests that you are surely doing something wrong.
What did the output look like while you were training the model with Keras? Was the validation accuracy increasing steadily (at a reasonable rate) and the loss decreasing at a good rate?
For handling class imbalance, you can do something like I described in my previous comment.

@ArmandGiraud
Author

I tried to run the script with the default settings, as found in english_NER.ipynb.
The accuracy (and log loss) is stuck at 97.3% from the first epoch.
I'm trying to figure out what is going wrong.
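One sanity check I'm running (a sketch with sklearn; `y_true`/`y_pred` are flattened tag ids with padding stripped, which may not match the notebook exactly): if the accuracy equals the fraction of 'O' tokens, the model is just predicting the majority class.

```python
import numpy as np
from sklearn.metrics import classification_report

# y_true, y_pred: 1-D arrays of integer tag ids, padding already removed.
# Assumes tag id 0 is 'O'.
print("fraction of 'O' tokens:", np.mean(y_true == 0))   # ~0.97 here
print(classification_report(y_true, y_pred))              # per-tag precision/recall/F1
```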

@pandeydivesh15
Owner

Sorry for the late reply.
Were you also getting a very low loss (on the order of negative powers of 10) and NaN values during training?

@ArmandGiraud
Author

Hello Divesh,
I get a very low loss from the first epoch; I've attached a capture of the training logs:
[screenshot: training logs from ner_with_deep_learning]

The only thing I changed was adding a few parentheses to the print functions, since I'm running your scripts with Python 3. Maybe I'm also using different versions of Keras/TensorFlow: I have keras 2.0.0 and tensorflow 1.0.1 installed on Windows 64-bit. Which versions did you use initially?
Thanks for helping

@pandeydivesh15
Owner

The problem is the version numbers. I should have added a requirements.txt.
I used Keras==1.2.1 and tensorflow-gpu==0.12.1. Though I had TensorFlow with GPU support, you can avoid that by installing just tensorflow==0.12.1. Try this in a new env and let me know.
About using Python 3: some problems can occur while handling unicode, but in our case the chances are low.
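For the record, the pins would look like this (reconstructed from memory, so treat it as an assumption rather than a tested requirements.txt):

```
# requirements.txt
Keras==1.2.1
tensorflow==0.12.1
```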

@jenniferzhu

I got a similar issue where everything is predicted as 'O' for the English dataset, but my case is even worse: the losses are all NaN from the beginning. I will try to match the versions of Keras and TensorFlow. Do you have any other advice on this issue? Thanks.

@jenniferzhu

A follow-up on that: matching the versions of TensorFlow and Keras does not seem to solve my loss: nan issue. I am wondering if this is due to GPU vs CPU?

@jenniferzhu

It turns out that I now get the same result as @ArmandGiraud. @pandeydivesh15, what was your accuracy?

@pandeydivesh15
Owner

I trained one model just now. Output in my case:
[screenshot: training output]

@sayantanbbb

sayantanbbb commented Jan 7, 2022

Still having this issue.
