
get_denoised adds spaces to its output, so only half of each sentence appears #17

Open
kbrajwani opened this issue Oct 24, 2019 · 25 comments

Comments

@kbrajwani

[AM] Torserane TVcom in Bankiner wo Finnnce
[D ] T o r e v e n g e T o m i n B a

As you can see, [AM] is a full sentence, but [D ] cuts the sentence short. Where can I change the code to get the correct output?

@jonomon
Contributor

jonomon commented Oct 24, 2019

Hi kbrajwani,
Is [AM] the correct sentence?

@kbrajwani
Author

[AM] is not the correct sentence, but my concern is not the accuracy of the prediction; I want to know why [D] adds spaces between characters.
Sometimes [D] does give a correct prediction.

@jonomon
Contributor

jonomon commented Oct 24, 2019

Which code are you running and on what dataset?

@kbrajwani
Author

The latest code from your repository, with the pretrained models.

@kbrajwani
Author

Testing on IAM dataset images.

@jonomon
Contributor

jonomon commented Oct 24, 2019

Are all predictions like this, or only this particular one?

@kbrajwani
Author

No, only the first and last lines of the prediction come out like that.

@jonomon
Contributor

jonomon commented Oct 24, 2019

Did you make any changes?
I do not see the lines you mentioned in https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/0_handwriting_ocr.ipynb.

@kbrajwani
Author

Yes, I made a lot of changes to your code.
The following code generates the output I am talking about:
decoded_line_denoiser = get_denoised(line_character_probs, ctc_bs=False)
print("[D ]",decoded_line_denoiser)

@jonomon
Contributor

jonomon commented Oct 24, 2019

This file shows how the denoiser was trained. The training data was modelled on the noise produced by the earlier stages of our model.

Judging from your example output "Torserane TVcom in Bankiner wo Finnnce", the noise seems quite different from the noise our model produces, and the words are not really recognisable. Most likely the pretrained denoiser won't work well.

I found that the better the handwriting recognition, the better the denoiser. It might be beneficial for you to first improve the handwriting recognition (i.e., the [AM] output) and then work on the denoiser.

Please note that the output for our handwriting recognition is "Can't go lighting bonfites on this bus," and the denoiser only changed bonfites to bonfires.

@kbrajwani
Author

Screenshot (7)

As you can see in this image, the line print(generator.generate_sequences(inputs, states, sentence)) generates output like this:
This sentnce has an eror
Choice
T h i s s e n t e n c e

The last line of the output puts a space after every character, and this is your own notebook file.
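To check quickly whether a given [D ] line shows the character-spaced symptom described in this thread, a small heuristic like the one below could be run over the denoiser output. This is a hypothetical helper (`is_character_spaced` and its `threshold` parameter are assumptions, not part of the repository); it only flags the symptom and does not address its cause.

```python
def is_character_spaced(text: str, threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: return True when most whitespace-separated
    tokens in `text` are single characters, e.g. "T h i s s e n t e n c e"."""
    tokens = text.split()
    if not tokens:
        return False  # empty output: nothing to flag
    single_char_tokens = sum(1 for token in tokens if len(token) == 1)
    return single_char_tokens / len(tokens) >= threshold


# Outputs quoted in this thread
print(is_character_spaced("This sentnce has an eror"))  # False
print(is_character_spaced("T h i s s e n t e n c e"))   # True
```

A ratio threshold is used rather than "every token is one character" so that a line with a few stray multi-character tokens is still flagged.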

@jonomon
Contributor

jonomon commented Nov 12, 2019

Did you edit any of the functions (denoiser.encode, generator.generate_sequences etc)?

@devbaseh

devbaseh commented Jan 7, 2020

Same issue for me: the final prediction has spaces between the characters and gives only half of the sentence, like this:

"This sentnce has an eror
Choice
T h i s s e n t e n c e "

The only change I made was from
"ctx_nlp = mx.gpu(3)" to "ctx_nlp = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()"
I couldn't figure out why this happens...

@raghav-menon

raghav-menon commented Mar 12, 2020

Hi, I am having the same problem. Does this require more than one GPU? If I am not wrong, the command ctx_nlp = mx.gpu(3) assigns the variable to the 4th GPU, assuming there are 4 GPUs. If the line is left as is, it raises an error saying that the device index must be less than the number of GPU devices. Since I have only one GPU, I set ctx_nlp = mx.gpu(0). But that in turn cuts out half of the sentence and adds a space between the letters, as a few people here have already mentioned. I'm not sure whether it is actually a GPU problem. I was also wondering whether this can be run on a CPU instead. Any help is appreciated!
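The constraint described above (a GPU index must be lower than the number of visible GPUs) can be handled with a small fallback instead of hard-coding `mx.gpu(3)`. The sketch below keeps the selection logic in plain Python so it can be checked without MXNet; `choose_gpu_index` is a hypothetical helper, and the MXNet usage in the comment assumes the `mx.context.num_gpus()`, `mx.gpu()` and `mx.cpu()` calls already quoted in this thread.

```python
from typing import Optional


def choose_gpu_index(requested: int, num_gpus: int) -> Optional[int]:
    """Hypothetical helper: return a usable GPU index, or None for CPU.

    Valid GPU indices run from 0 to num_gpus - 1, so mx.gpu(3) only works
    when at least 4 GPUs are visible.
    """
    if num_gpus <= 0:
        return None  # no GPU available: caller should fall back to the CPU
    if 0 <= requested < num_gpus:
        return requested  # requested device exists
    return 0  # requested index out of range: use the first GPU


# Assumed MXNet usage (not executed here):
#   import mxnet as mx
#   idx = choose_gpu_index(3, mx.context.num_gpus())
#   ctx_nlp = mx.cpu() if idx is None else mx.gpu(idx)
print(choose_gpu_index(3, 4))  # 3
print(choose_gpu_index(3, 1))  # 0
print(choose_gpu_index(3, 0))  # None
```

This only avoids the out-of-range device error; as the replies in this thread point out, the problem persists on both GPU and CPU, so the device choice is not expected to change the denoiser output.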

@jonomon
Contributor

jonomon commented Mar 12, 2020

@raghav-menon
It shouldn't matter whether it's on GPU or CPU.
Are you using the IAM dataset?

@devbaseh
Which notebook are you running?

@raghav-menon


Thank you for the quick reply. I am using the IAM dataset. I have tried both CPU and GPU, and the problem persists. The demo output only gives partial sentences, with a space between the characters. The system I am using is on AWS, with 64 GB of RAM and a Tesla K80 attached.

@jonomon
Contributor

jonomon commented Mar 13, 2020

@ThomasDelteil ?

@raghav-menon

raghav-menon commented Mar 13, 2020

I would be grateful if you could advise me on how to solve it. Thanks!

@ThomasDelteil
Contributor

I'll try to have a look this weekend, have you tried retraining from the denoising notebook?

@raghav-menon


Thanks, Thomas. I have only used the provided pretrained model and run the code; I have not tried retraining.

@mahin003

If anybody has run it on Google Colab, please share the edited iam_dataset.py with me: [email protected]

@yangyingxiang

Same issue here. I didn't modify any code besides changing the GPU index from 3 to 0:

"This sentnce has an eror
Choice
T h i s s e n t e n c e "

@mahin003

mahin003 commented Oct 4, 2020 via email

@jalvathi
Contributor

@jonomon @ThomasDelteil Is there any update on the space issue we are seeing in the denoiser? An update from you would make my day.

Thanks for this wonderful repository. 👍

@jalvathi
Contributor

jalvathi commented Jul 24, 2021

@mahin003 @yangyingxiang @raghav-menon @devbaseh @kbrajwani

I have opened a pull request here

I hope this solves your issues as well; it just solved the problem in my code. :)

cc: @jonomon @ThomasDelteil

Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants