Can you please add some more information/guidelines about how to integrate LM for decoding #9
I know the documentation for constructing the FST is missing, partly because it sometimes requires special processing beyond the standard commands, and both Kaldi and OpenFst are needed to create one. If you have successfully installed Kaldi and OpenFst, let me know if you still run into any difficulty. Also, if you want an FST for the CMU Sphinx LM, I will upload it here.
I have run the make_fst script and generated the LG.fst file as suggested, but I'm having some trouble incorporating it into the system. The read_data_thread function in data.py checks whether in_fst(fst, text) returns true when the use_train_lm flag is set to True. Unfortunately, that function returns false for all of my transcription texts. Any idea why this might be happening? (The program works fine on the same transcription file without the language model.)
It happens when the LM does not match the transcription, i.e. the transcription contains at least one word that is not among the LM's unigrams. Nothing can be done about this during training; to work around it, you can use the LM for inference only. This is also the setting I have seen in most research papers, and I get the same error but much faster training without using the LM in the training period.
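For anyone debugging this, here is a minimal sketch of how such a coverage check could be written with pywrapfst. The function name and the assumption that word symbols sit on the output side of LG.fst are illustrative, not taken from this repo's code:

```python
import pywrapfst as fst

def transcript_in_fst(fst_path, text):
    """Return True only if every word of `text` appears in the FST's
    word symbol table, i.e. the LM can score the transcript."""
    lg = fst.Fst.read(fst_path)
    # Assumption: in LG.fst the words are on the output side
    # (the lexicon L maps graphemes/phones to words).
    words = lg.output_symbols()
    return all(words.member(w) for w in text.split())

# Transcripts containing any out-of-vocabulary word fail this check,
# which is why in_fst returns false when use_train_lm is enabled.
print(transcript_in_fst("LG.fst", "hello world"))
```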
Thanks a lot for helping!
Thanks for your time. I used the openly available librispeech 3-gram LM and followed
I think the problem might be that a space is expected after each sentence in the transcript. That was my fault: I had tested it only on my own system and failed to notice that it was a requirement. I will add a space when a sentence ends without one, here in data.py, if that fixes your problem. Thanks for reporting the bug.
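As a stopgap before that fix landed, a reader could apply the same normalization themselves. A minimal sketch, with an illustrative function name:

```python
def normalize_transcript(line):
    """Ensure a transcript line ends with a single trailing space,
    which the FST lookup described above expects."""
    line = line.rstrip()  # drop the newline and any stray whitespace
    return line + " "

assert normalize_transcript("hello world") == "hello world "
assert normalize_transcript("hello world   \n") == "hello world "
```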
Indeed, that was the problem! It is working now. Thanks a lot for helping out.
Thank you both for reporting it. Sorry that it caused you so much trouble. I have changed the code for reading the transcript, since not requiring the trailing space is the obvious behavior. @dsohum Please find my compiled librispeech FST at https://drive.google.com/file/d/1dkExo1bm3fFFl9TBjPEg850zVdiII5tB/view?usp=sharing
I am confused about the LM integration. Does it not involve modifying the BeamSearchDecoder? The modified log-probability scores from the LMCellWrapper get normalized again by the softmax inside the BeamSearchDecoder. Ideally, the BeamSearchDecoder should use cell_output directly from the LMCellWrapper and not log_softmax(cell_output).
I get what you are saying. To match the more common setting from several papers, we have to change
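For context, the behavior being discussed here is often called shallow fusion in the literature: the combined score is log p_AM plus a weighted log p_LM, and it should not be renormalized after the LM term is added. A minimal NumPy sketch under that assumption, with all names hypothetical:

```python
import numpy as np

def fuse_scores(am_logits, lm_log_probs, lm_weight=0.5):
    """Shallow fusion of acoustic-model and LM scores per token."""
    # Normalize the acoustic-model logits exactly once.
    am_log_probs = am_logits - np.logaddexp.reduce(
        am_logits, axis=-1, keepdims=True)
    # Add the weighted LM scores WITHOUT renormalizing; applying
    # another softmax here would distort the mixture, which is the
    # concern raised about log_softmax(cell_output) above.
    return am_log_probs + lm_weight * lm_log_probs
```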
I was trying to reproduce the experiment on the WSJ dataset, but I am only getting 40% WER using this repo as-is. Am I missing something?
Sorry for the late reply. 40% WER seems a little too high; I get around 17-21% WER using the code as-is. Can you post your TensorBoard output from:
I changed that in my branch too, but that is not the problem; in fact, it reduces the WER by only about 0.5%. The validation loss should go below 0.2 (at least in my case). I think it's possible that the loss has not converged. You can try continuing from the last checkpoint by using
It is very hard for me to tell what the problem is. While 31% is much better than before, it is still too high even for the base system. Ideally you should get a loss around 0.15 and a WER around 20% (without the LM). The way I usually train is to set the lr decay to 1.0 (no decay) and manually divide the lr by 10 once the loss converges; I don't know if this is the reason for the disparity. While the code is a little different, here is one of my runs, which reached a validation loss of 0.142 and a best WER of 17.74%: I have not pushed my current code, as it has become completely unorganized and it would take a lot of work to make it usable for others. Still, I remember getting around 20% WER when I made this repo, using the code in this branch.
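A rough sketch of the manual schedule described above: keep the configured lr decay at 1.0 and divide the learning rate by 10 each time the validation loss stops improving. All names, stubs, and the patience value are illustrative, not taken from this repo:

```python
def train_one_epoch(lr):   # placeholder for the actual training loop
    pass

def validation_loss():     # placeholder for the actual evaluation
    return 0.2

lr = 1e-3
patience, stale = 3, 0     # epochs to wait before decaying
best = float("inf")
for epoch in range(100):
    train_one_epoch(lr)
    loss = validation_loss()
    if loss < best:
        best, stale = loss, 0
    else:
        stale += 1
    if stale >= patience:  # loss has plateaued: manual decay step
        lr /= 10.0
        stale = 0
```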
I am a newbie exploring attention-based models, and your work has been a great help in understanding some existing architectures. I would be grateful if you could put up some information/guidelines on how to use an LM with a TF model in your code (like what input format is expected, or how to use make_fst to construct one). Thanks.
Best Regards
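For reference, scripts like make_fst typically follow the standard Kaldi/OpenFst recipe below. This is a sketch of that general recipe, not this repo's exact script; the flags and file names here may differ from what make_fst actually uses:

```sh
# 1. Compile the ARPA language model into a grammar FST G.
arpa2fst --disambig-symbol='#0' --read-symbol-table=words.txt \
    lm.arpa G.fst
# 2. Compose the lexicon FST L with G, then optimize the result.
fsttablecompose L_disambig.fst G.fst | \
    fstdeterminizestar --use-log=true | \
    fstminimizeencoded > LG.fst
```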