
Can you please add some more information/guidelines about how to integrate LM for decoding #9

Open
dsohum opened this issue Feb 16, 2018 · 16 comments


@dsohum

dsohum commented Feb 16, 2018

I am a newbie exploring attention-based models, and your work has been a great help in understanding some existing architectures. I would be grateful if you could put up some information/guidelines on how to use an LM with a TF model in your code (e.g. what input format is expected, or how to use make_fst to construct one).
Thanks
Best Regards

@vagrawal
Owner

I know the documentation for constructing the FST is missing, partly because it sometimes requires special processing beyond the standard commands, and both Kaldi and OpenFst are needed to create one. The make_fst file is intended to be read and run line by line.

If you have successfully installed Kaldi and OpenFst, let me know if you still run into any difficulty. Also, if you want the FST for the CMU Sphinx LM, I will upload it here.
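For anyone reading along, the general shape of the usual ARPA-LM-to-LG.fst route looks roughly like the sketch below. These are not necessarily the exact commands in make_fst, and the file names (words.txt, L_disambig.fst, lm.arpa) are placeholders for whatever your setup produces; it just assumes working Kaldi and OpenFst installs on PATH.

```python
# Rough sketch only; the exact steps and flags in make_fst may differ.
# Assumes the Kaldi and OpenFst binaries are on PATH, and that a word
# symbol table (words.txt) and lexicon FST (L_disambig.fst) already exist.
import subprocess

def run(cmd):
    print('+', cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Convert the ARPA language model into a grammar FST (G.fst).
run("arpa2fst --disambig-symbol='#0' --read-symbol-table=words.txt "
    "lm.arpa G.fst")

# 2. Compose the lexicon with the grammar and optimize into LG.fst.
run("fstcompose L_disambig.fst G.fst | "
    "fstdeterminizestar --use-log=true | "
    "fstminimizeencoded | "
    "fstarcsort --sort_type=ilabel > LG.fst")
```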

@joshinh

joshinh commented Feb 16, 2018

I have run the make_fst script and generated the LG.fst file as suggested, but I'm having some trouble incorporating it into the system. The read_data_thread function in data.py checks whether in_fst(fst, text) returns True when the use_train_lm flag is set to True. Unfortunately, that function is returning False for all of my transcription texts. Any idea why this might be happening? (The program works fine on the same transcription file without the language model.)

@vagrawal
Owner

It happens when the LM does not match the transcription, i.e. there is at least one word in the transcription that is not among the unigrams of the LM. There is nothing that can be done about this during training; you can use the LM only for inference to work around it.

Also, this is the setting I have seen in most research papers, and I get about the same error rate, but with much faster training, when the LM is not used during training.

@joshinh

joshinh commented Feb 17, 2018

Thanks a lot for helping!
I have actually written a small Python script to test whether some common words are in the LM. I can see the corresponding 1-grams in en-70k-0.2-pruned.lm, but in_fst still returns False for the generated LG.fst. Are there any more specific steps I am missing?
(One change I have made is in vocab.py, to incorporate lower-case letters. Could that be the problem?)
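For reference, a minimal version of that kind of sanity check against the ARPA file itself could look like the sketch below. These are hypothetical helpers, not the repo's in_fst; note that the FST check can still fail for other reasons (e.g. a case mismatch, or the trailing-space issue that comes up further down).

```python
# Hypothetical sanity check: list which words of a transcription are
# missing from the \1-grams: section of an ARPA-format LM.
def arpa_unigrams(arpa_path):
    unigrams, in_section = set(), False
    with open(arpa_path, encoding='utf-8', errors='replace') as f:
        for line in f:
            line = line.rstrip('\n')
            if line.strip() == '\\1-grams:':
                in_section = True
                continue
            if in_section:
                if line.startswith('\\'):       # start of \2-grams: etc.
                    break
                fields = line.split()
                if len(fields) >= 2:
                    unigrams.add(fields[1])     # line format: logprob word [backoff]
    return unigrams

def missing_words(text, unigrams):
    return [w for w in text.split() if w not in unigrams]

# Example:
# vocab = arpa_unigrams('en-70k-0.2-pruned.lm')
# print(missing_words('HELLO WORLD', vocab))   # CMU Sphinx LMs are upper-case
```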

@dsohum
Author

dsohum commented Feb 17, 2018

Thanks for your time. I used the openly available LibriSpeech 3-gram LM and followed make_fst to construct the FST, but the LM scores are mostly -50 (the default value). I was not sure how to handle the <unk> token, so I have ignored it for now. What am I missing? Can you please help here?
It would be great if you could share the FST for the CMU Sphinx LM (along with the vocab.py used to create it).
Thanks

@vagrawal
Owner

I think the problem might be that a space is expected at the end of each sentence in the transcript. That was my fault: I had tested it only on my own system and failed to notice that it was a requirement. I will add a space whenever a sentence ends without one in data.py, if that fixes your problem.
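Until that change lands, a workaround on the data side is simply to make sure every transcription ends with a space before the in_fst check runs. A minimal sketch of the idea (not the exact data.py code):

```python
# Minimal workaround sketch: the FST expects a trailing space as the word
# separator, so append one if the sentence lacks it.
def normalize_transcription(text):
    text = text.rstrip('\n')
    if not text.endswith(' '):
        text += ' '
    return text

# e.g. inside the transcript-reading loop:
# for line in open(transcript_path):
#     text = normalize_transcription(line)
#     if in_fst(fst, text):
#         ...
```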

Thanks for reporting the bug.

@joshinh

joshinh commented Feb 17, 2018

Indeed that was the problem! It is working now. Thanks a lot for helping out.

vagrawal added a commit that referenced this issue Feb 17, 2018
@vagrawal
Owner

Thank you both for reporting it. Sorry that it caused you so much trouble. I have changed the transcript-reading code, since not adding a trailing space is the obvious thing for users to do.

@dsohum Please find my compiled librispeech FST at https://drive.google.com/file/d/1dkExo1bm3fFFl9TBjPEg850zVdiII5tB/view?usp=sharing

@dsohum
Author

dsohum commented Mar 1, 2018

I am confused about the LM integration. Doesn't it involve modifying the BeamSearchDecoder? The modified log-probability scores from the LMCellWrapper get normalized again by the softmax in the BeamSearchDecoder. Ideally, the BeamSearchDecoder should use the cell_output from the LMCellWrapper directly rather than log_softmax(cell_output).
Can you please help me understand how this works out?
Thanks!

@vagrawal
Owner

vagrawal commented Mar 1, 2018

The log_softmax function just shifts the vector by a scalar. I think applying it at the end will give almost the same result, if not a slight improvement, since the beam search score then becomes the log probability of the sequence so far.
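For the record, the identity being discussed is log_softmax(x) = x - logsumexp(x), i.e. a scalar shift of the scores at each step. A quick numpy check, just for illustration:

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5, 3.0])        # e.g. LM-adjusted scores at one decoding step
log_sm = x - np.log(np.sum(np.exp(x)))     # log_softmax(x) = x - logsumexp(x)

shift = x - log_sm
print(np.allclose(shift, shift[0]))        # True: every entry is shifted by the same scalar
print(np.argsort(x), np.argsort(log_sm))   # identical ordering within this step
```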

I get what you are saying. To match the more common setting from several papers, we would have to change the BeamSearchDecoder to remove the log_softmax call. If you think that approach will work better, a full run with both methods is the only way to find out.

vagrawal reopened this Mar 1, 2018
@dsohum
Author

dsohum commented Apr 26, 2018

I was trying to reproduce the experiment on the WSJ dataset, but I am only getting 40% WER using this repo as-is. Am I missing something?
I am not using an LM, though. Can you please specify the hyperparameter settings and the training schedule that were used to get 15% WER?
Thanks

@vagrawal
Owner

Sorry for the late reply. 40% WER seems a little too high; I get around 17-21% WER using the code as-is.

Can you post your tensorboard output from:

tensorboard --logdir <checkpoint-path>

@dsohum
Author

dsohum commented Apr 28, 2018

Thanks for replying! I ran the code for 16 epochs as specified by the repo (the validation loss seemed to saturate). Is any pretraining etc. required?
I did change the code to compute WER as total-num-corrections / total-num-words-in-transcript, as is usual for ASR. (No other changes.)
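For clarity, that is the usual edit-distance convention, WER = (substitutions + deletions + insertions) / number of reference words. A minimal stand-alone sketch (hypothetical helper, not the repo's code):

```python
# Word error rate via word-level edit distance (Levenshtein).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# wer("the cat sat", "the cat sat down")  -> 0.333...
```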
[TensorBoard screenshot: deepsphinx-as-is-2018]
Thanks!

@vagrawal
Owner

I changed that in my branch too, but that is not the problem; in fact, it reduces the WER by 0.5% or so. The validation loss should go below 0.2 (at least in my case), so I think it's possible that the loss has not converged yet. You can try continuing from the last checkpoint by using --checkpoint-path (just use the basename without the extension) and running at least 5-10 more epochs, to see whether the loss has really converged.

@dsohum
Author

dsohum commented May 1, 2018

The validation loss seems to converge to ~0.25 after 41 epochs, and I'm getting 31% WER. Should I run it for more epochs? Should I change the batch size for training?
[TensorBoard screenshot: deepsphinx-42-epoch-as-is-2018]
Thanks!

@vagrawal
Owner

vagrawal commented May 1, 2018

It is hard for me to tell what the problem is. While 31% is much better than before, it is still too high even for the base system. Ideally you should get a loss of around 0.15 and a WER of around 20% (without the LM). The way I usually train is to set the lr decay to 1.0 (no decay) and manually decrease the lr by a factor of 10 once the loss converges. I don't know if this is the reason for the disparity. While the code is a little different for this, here is one of my runs, which reached a validation loss of 0.142 and a best WER of 17.74%:

[TensorBoard screenshot of the run]

I have not pushed my current code as it has become completely unorganized, and it will take a lot of work to make it usable for others. Still, I remember getting around 20% WER when I made this repo, using the code in this branch.
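For what it's worth, the "no decay, drop the learning rate by 10x once it converges" schedule described above amounts to something like the sketch below. The helpers and numbers are placeholders, not the repo's actual flags or code.

```python
# Hypothetical sketch of the manual schedule: lr decay is fixed at 1.0
# (no automatic decay) and the lr is divided by 10 whenever the
# validation loss stops improving.
def train_one_epoch(learning_rate):
    ...  # placeholder: run one epoch of training at this learning rate

def evaluate_on_dev():
    ...  # placeholder: return the current validation loss
    return float('inf')

max_epochs = 50               # assumed
patience = 3                  # assumed: epochs to wait before decaying
lr = 1e-3                     # assumed starting learning rate
best_val_loss = float('inf')
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(learning_rate=lr)
    val_loss = evaluate_on_dev()

    if val_loss < best_val_loss - 1e-3:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            lr /= 10.0        # manual "decrease lr by 10"
            epochs_without_improvement = 0
```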
