
Can you please add some more information/guidelines about how to integrate LM for decoding #9

Open
dsohum opened this issue Feb 16, 2018 · 16 comments


@dsohum

dsohum commented Feb 16, 2018

I am a newbie exploring attention-based models, and your work has been a great help in understanding some existing architectures. I would be grateful if you could put up some information/guidelines on how to use an LM with a TF model in your code (e.g. what input format is expected, or how to use make_fst to construct one).
Thanks
Best Regards

@vagrawal
Owner

I know the documentation for constructing the FST is missing, partly because it sometimes requires special processing beyond the standard commands, and both Kaldi and OpenFst are needed to create one. The make_fst file is intended to be read and run line by line.

If you have successfully installed Kaldi and OpenFst, let me know if you still run into any difficulty. Also, if you want the FST for the CMU Sphinx LM, I will upload it here.
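For anyone reading along, the general shape of the usual ARPA-LM-to-LG.fst route looks roughly like the sketch below. These are not necessarily the exact commands in make_fst, and the file names (words.txt, L_disambig.fst, lm.arpa) are placeholders for whatever your setup produces; it just assumes working Kaldi and OpenFst installs on PATH.

```python
# Rough sketch only; the exact steps and flags in make_fst may differ.
# Assumes the Kaldi and OpenFst binaries are on PATH, and that a word
# symbol table (words.txt) and lexicon FST (L_disambig.fst) already exist.
import subprocess

def run(cmd):
    print('+', cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Convert the ARPA language model into a grammar FST (G.fst).
run("arpa2fst --disambig-symbol='#0' --read-symbol-table=words.txt "
    "lm.arpa G.fst")

# 2. Compose the lexicon with the grammar and optimize into LG.fst.
run("fstcompose L_disambig.fst G.fst | "
    "fstdeterminizestar --use-log=true | "
    "fstminimizeencoded | "
    "fstarcsort --sort_type=ilabel > LG.fst")
```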

@joshinh

joshinh commented Feb 16, 2018

I have run the make_fst script and generated the LG.fst file as suggested, but I'm having some trouble incorporating it into the system. The read_data_thread function in data.py checks whether in_fst(fst, text) returns True when the use_train_lm flag is set to True. Unfortunately, that function is returning False for all of my transcription texts. Any idea why this might be happening? (The program works fine on the same transcription file without the language model.)

@vagrawal
Owner

It happens when the LM does not match the transcription, i.e. there is at least one word in the transcription that is not among the unigrams of the LM. There is nothing that can be done about this during training; you can use the LM only for inference to work around it.

Also, this is the setting I have seen in most research papers, and I get about the same error rate, but with much faster training, when the LM is not used during training.

@joshinh

joshinh commented Feb 17, 2018

Thanks a lot for helping!
I have actually written a small Python script to test whether some common words are in the LM. I can see the corresponding 1-grams in en-70k-0.2-pruned.lm, but in_fst still returns False for the generated LG.fst. Are there any more specific steps I am missing?
(One change I have made is in vocab.py, to incorporate lower-case letters. Could that be the problem?)
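For reference, a minimal version of that kind of sanity check against the ARPA file itself could look like the sketch below. These are hypothetical helpers, not the repo's in_fst; note that the FST check can still fail for other reasons (e.g. a case mismatch, or the trailing-space issue that comes up further down).

```python
# Hypothetical sanity check: list which words of a transcription are
# missing from the \1-grams: section of an ARPA-format LM.
def arpa_unigrams(arpa_path):
    unigrams, in_section = set(), False
    with open(arpa_path, encoding='utf-8', errors='replace') as f:
        for line in f:
            line = line.rstrip('\n')
            if line.strip() == '\\1-grams:':
                in_section = True
                continue
            if in_section:
                if line.startswith('\\'):       # start of \2-grams: etc.
                    break
                fields = line.split()
                if len(fields) >= 2:
                    unigrams.add(fields[1])     # line format: logprob word [backoff]
    return unigrams

def missing_words(text, unigrams):
    return [w for w in text.split() if w not in unigrams]

# Example:
# vocab = arpa_unigrams('en-70k-0.2-pruned.lm')
# print(missing_words('HELLO WORLD', vocab))   # CMU Sphinx LMs are upper-case
```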

@dsohum
Author

dsohum commented Feb 17, 2018

Thanks for your time. I used the openly available LibriSpeech 3-gram LM and followed make_fst to construct the FST, but the LM scores are mostly -50 (the default value). I was not sure how to handle the <unk> token, so I have ignored it for now. What am I missing? Can you please help here?
It would be great if you could share the FST for the CMU Sphinx LM (along with the vocab.py used to create it).
Thanks

@vagrawal
Owner

I think the problem might be that a space is expected at the end of each sentence in the transcript. That was my fault: I had tested it only on my own system and failed to notice that it was a requirement. I will add a space whenever a sentence ends without one in data.py, if that fixes your problem.
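Until that change lands, a workaround on the data side is simply to make sure every transcription ends with a space before the in_fst check runs. A minimal sketch of the idea (not the exact data.py code):

```python
# Minimal workaround sketch: the FST expects a trailing space as the word
# separator, so append one if the sentence lacks it.
def normalize_transcription(text):
    text = text.rstrip('\n')
    if not text.endswith(' '):
        text += ' '
    return text

# e.g. inside the transcript-reading loop:
# for line in open(transcript_path):
#     text = normalize_transcription(line)
#     if in_fst(fst, text):
#         ...
```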

Thanks for reporting the bug.

@joshinh

joshinh commented Feb 17, 2018

Indeed that was the problem! It is working now. Thanks a lot for helping out.

vagrawal added a commit that referenced this issue Feb 17, 2018
@vagrawal
Owner

Thank you both for reporting it. Sorry that it caused you so much trouble. I have changed the transcript-reading code, since not adding a trailing space is the obvious thing for users to do.

@dsohum Please find my compiled librispeech FST at https://drive.google.com/file/d/1dkExo1bm3fFFl9TBjPEg850zVdiII5tB/view?usp=sharing

@dsohum
Author

dsohum commented Mar 1, 2018

I am confused about the LM integration. Doesn't it involve modifying the BeamSearchDecoder? The modified log-probability scores from the LMCellWrapper get normalized again by the softmax in the BeamSearchDecoder. Ideally, the BeamSearchDecoder should use the cell_output from the LMCellWrapper directly rather than log_softmax(cell_output).
Can you please help me understand how this works out?
Thanks!

@vagrawal
Owner

vagrawal commented Mar 1, 2018

The log_softmax function just shifts the vector by a scalar. I think applying it at the end will give almost the same result, if not a slight improvement, since the beam search score then becomes the log probability of the sequence so far.
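For the record, the identity being discussed is log_softmax(x) = x - logsumexp(x), i.e. a scalar shift of the scores at each step. A quick numpy check, just for illustration:

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5, 3.0])        # e.g. LM-adjusted scores at one decoding step
log_sm = x - np.log(np.sum(np.exp(x)))     # log_softmax(x) = x - logsumexp(x)

shift = x - log_sm
print(np.allclose(shift, shift[0]))        # True: every entry is shifted by the same scalar
print(np.argsort(x), np.argsort(log_sm))   # identical ordering within this step
```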

I get what you are saying. To match the more common setting from several papers, we would have to change the BeamSearchDecoder to remove the log_softmax call. If you think that approach will work better, a full run with both methods is the only way to find out.

vagrawal reopened this Mar 1, 2018
@dsohum
Author

dsohum commented Apr 26, 2018

I was trying to reproduce the experiment on the WSJ dataset, but I am only getting 40% WER using this repo as-is. Am I missing something?
I am not using an LM, though. Can you please specify the hyperparameter settings and the training schedule that were used to get 15% WER?
Thanks

@vagrawal
Owner

Sorry for the late reply. 40% WER seems a little too high; I get around 17-21% WER using the code as-is.

Can you post your tensorboard output from:

tensorboard --logdir <checkpoint-path>

@dsohum
Author

dsohum commented Apr 28, 2018

Thanks for replying! I ran the code for 16 epochs as specified by the repo (the validation loss seemed to saturate). Is any pretraining etc. required?
I did change the code to compute WER as total-num-corrections / total-num-words-in-transcript, as is usual for ASR. (No other changes.)
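For clarity, that is the usual edit-distance convention, WER = (substitutions + deletions + insertions) / number of reference words. A minimal stand-alone sketch (hypothetical helper, not the repo's code):

```python
# Word error rate via word-level edit distance (Levenshtein).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# wer("the cat sat", "the cat sat down")  -> 0.333...
```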
[TensorBoard screenshot: deepsphinx-as-is-2018]
Thanks!

@vagrawal
Owner

I changed that in my branch too, but that is not the problem; in fact, it reduces the WER by 0.5% or so. The validation loss should go below 0.2 (at least in my case), so I think it's possible that the loss has not converged yet. You can try continuing from the last checkpoint by using --checkpoint-path (just use the basename without the extension) and running at least 5-10 more epochs, to see whether the loss has really converged.

@dsohum
Author

dsohum commented May 1, 2018

The validation loss seems to converge to ~0.25 after 41 epochs, and I'm getting 31% WER. Should I run it for more epochs? Should I change the batch size for training?
[TensorBoard screenshot: deepsphinx-42-epoch-as-is-2018]
Thanks!

@vagrawal
Owner

vagrawal commented May 1, 2018

It is hard for me to tell what the problem is. While 31% is much better than before, it is still too high even for the base system. Ideally you should get a loss of around 0.15 and a WER of around 20% (without the LM). The way I usually train is to set the lr decay to 1.0 (no decay) and manually decrease the lr by a factor of 10 once the loss converges. I don't know if this is the reason for the disparity. While the code is a little different for this, here is one of my runs, which reached a validation loss of 0.142 and a best WER of 17.74%:

[TensorBoard screenshot of the run]

I have not pushed my current code as it has become completely unorganized, and it will take a lot of work to make it usable for others. Still, I remember getting around 20% WER when I made this repo, using the code in this branch.
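For what it's worth, the "no decay, drop the learning rate by 10x once it converges" schedule described above amounts to something like the sketch below. The helpers and numbers are placeholders, not the repo's actual flags or code.

```python
# Hypothetical sketch of the manual schedule: lr decay is fixed at 1.0
# (no automatic decay) and the lr is divided by 10 whenever the
# validation loss stops improving.
def train_one_epoch(learning_rate):
    ...  # placeholder: run one epoch of training at this learning rate

def evaluate_on_dev():
    ...  # placeholder: return the current validation loss
    return float('inf')

max_epochs = 50               # assumed
patience = 3                  # assumed: epochs to wait before decaying
lr = 1e-3                     # assumed starting learning rate
best_val_loss = float('inf')
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(learning_rate=lr)
    val_loss = evaluate_on_dev()

    if val_loss < best_val_loss - 1e-3:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            lr /= 10.0        # manual "decrease lr by 10"
            epochs_without_improvement = 0
```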
