DeepSpeech full explanation #606

JRMeyer · 2021-03-08T03:04:42Z

JRMeyer
Mar 8, 2021
Maintainer

>>> sanjay.pandey
[April 25, 2019, 5:50am]

Can you give me any source, paper, or link that explains Mozilla
DeepSpeech fully? I have already gone through your WER < 10 blog post, but I
want more detail on how the acoustic model and language model work.

I want to build speech recognition for the restaurant domain. Mainly,
I need my model to understand every menu item, whether Indian,
continental, or any other dish, and also to understand phone
numbers. Our customers will mostly have Indian accents.

Things I have done so far:

1. Further trained DeepSpeech 0.4.1 on the Mozilla Common Voice English
train-valid dataset. The final loss after 35 epochs of training
was 0.8. When I ran inference with a language model containing only
the vocabulary of the train-valid dataset and spoke
words that were in that language model, it gave excellent
results, even in noise. So I tried making a custom language model in
which I included various food items and also the numbers
'zero' to 'nine', one per line.
When I ran inference with that, the results were not good. For example,
instead of 'three cheers chocolate', which I included in the language
model, it output 'three cold' when I spoke it, and the 'cold' comes
from 'cold coffee', which I also included in the language model. I even
increased lm_alpha, lm_beta, and the beam width, but there was no change.

So I am thinking of training on Indian-accented recordings of those words,
with around 300 hours of data, and then including the same words in the
language model.
Will that improve inference? Or is there another
way? Or do I need to go more in depth to understand it better?
If there is another way to improve inference, do tell me.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/deepspeech-full-explaination]

JRMeyer · 2021-03-08T03:04:45Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> kdavis
[April 25, 2019, 7:15am]

The best place to start reading is likely the original paper from Baidu,
'Deep Speech: Scaling up end-to-end speech recognition'. The 'core' of
our model is similar to what's described there.
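For context, the Deep Speech paper decodes by searching for the character sequence c that maximizes a combined score over the acoustic model output and an N-gram language model. The formula below is the objective as written in that paper. Reading its α and β as the lm_alpha and lm_beta parameters discussed in this thread is an assumption based on the matching names, not something stated in the posts here.

```latex
Q(c) = \log P(c \mid x) + \alpha \, \log P_{\mathrm{lm}}(c) + \beta \, \mathrm{word\_count}(c)
```

Under this reading, a larger α makes the decoder trust the language model more relative to the acoustic model, and a larger β rewards hypotheses that contain more words, which offsets the language model's tendency to favor short outputs.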

What might be more effective for lm_alpha and lm_beta, instead of just
making them larger, is to do a 'grid search' on their possible values
to find which pair of values gives the best results in your use case.
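A grid search of this kind could be sketched roughly as follows. This is a minimal illustration, not DeepSpeech code: `evaluate_wer` is a hypothetical callback that would decode a held-out set with the given lm_alpha and lm_beta and return the word error rate, and `toy_wer` is a synthetic stand-in used only so the example runs on its own. The candidate values are arbitrary.

```python
from itertools import product


def grid_search(evaluate_wer, alphas, betas):
    """Try every (alpha, beta) pair and return (best_wer, alpha, beta)."""
    best = None
    for alpha, beta in product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best


def toy_wer(alpha, beta):
    # Hypothetical stand-in for a real evaluation run: a smooth bowl
    # whose minimum sits at alpha=0.9, beta=1.5. Replace with a function
    # that transcribes a dev set and measures WER.
    return 0.10 + (alpha - 0.9) ** 2 + 0.05 * (beta - 1.5) ** 2


alphas = [0.65, 0.70, 0.75, 0.80, 0.85]  # illustrative candidates
betas = [1.0, 1.5, 2.0, 2.5]             # illustrative candidates

best_wer, best_alpha, best_beta = grid_search(toy_wer, alphas, betas)
print(f"best WER {best_wer:.4f} at lm_alpha={best_alpha}, lm_beta={best_beta}")
```

With the toy function, the best alpha lands on the largest candidate, which hints that the grid should be extended upward. Each call to a real `evaluate_wer` is a full decoding pass over the dev set, so the grid size directly sets the cost.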

Generally training, or fine tuning, on more data that's similar to the
end use case will improve results.

However, the question is which is easier: tuning lm_alpha and lm_beta, or
training on new data. Given the choice, I'd try to tune lm_alpha and
lm_beta first, as that's usually easier.

[Archived Post]


JRMeyer · 2021-03-08T03:04:47Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> alchemi5t
[July 19, 2019, 4:06am]

> 'grid search' on their possible values

What possible values should we try in the grid search? From what I
understand, alpha is the LM weighting parameter, so maybe a grid
with a 0.05 step, e.g., [0.65, 0.70, 0.75, 0.80, 0.85], or something like this.
And then, beta is the reward parameter? I don't completely understand this.
What kind of range should you try for beta? Also, what is the expected
behavior for a given beta value? I mean, what should the model do if
beta is higher than the default (1.85), at the default, or lower?

[Archived Post]


JRMeyer · 2021-03-08T03:04:50Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> kdavis
[July 19, 2019, 6:03am]

I'd maybe start with the default values, double them for an upper bound,
halve them for a lower bound, then do the search.

If you find a minimum in that range, you're done. If not, continue the
search in whichever direction the previous experiment indicated the
search should continue, i.e. in whichever direction the WER decreased
but didn't reach a minimum.
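One way to sketch this procedure: build the initial range from half to double each default, run the grid, and widen the range on whichever side the best point touched an edge. Everything here is illustrative. `evaluate_wer` is a hypothetical callback returning WER for a pair of values, `toy_wer` is a synthetic stand-in, the default of 0.75 for lm_alpha is an assumption (1.85 for lm_beta is the default mentioned earlier in the thread), and widening by another factor of two is one choice among many.

```python
from itertools import product


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def widen(lo, hi, idx, n):
    """Extend the range toward the edge where the best value sat."""
    if idx == 0:
        return lo / 2, hi
    if idx == n - 1:
        return lo, hi * 2
    return lo, hi


def expanding_grid_search(evaluate_wer, default_alpha, default_beta,
                          n=5, max_rounds=5):
    # Initial bounds: half the default below, double the default above.
    a_lo, a_hi = default_alpha / 2, default_alpha * 2
    b_lo, b_hi = default_beta / 2, default_beta * 2
    for _ in range(max_rounds):
        alphas, betas = linspace(a_lo, a_hi, n), linspace(b_lo, b_hi, n)
        wer, i, j = min((evaluate_wer(alphas[i], betas[j]), i, j)
                        for i, j in product(range(n), repeat=2))
        if 0 < i < n - 1 and 0 < j < n - 1:
            break  # best point is interior: a minimum inside the range
        a_lo, a_hi = widen(a_lo, a_hi, i, n)
        b_lo, b_hi = widen(b_lo, b_hi, j, n)
    return wer, alphas[i], betas[j]


def toy_wer(alpha, beta):
    # Hypothetical stand-in with its minimum at alpha=2.5, beta=1.85,
    # i.e. outside the initial alpha range of [0.375, 1.5].
    return 0.10 + (alpha - 2.5) ** 2 + (beta - 1.85) ** 2


wer, alpha, beta = expanding_grid_search(toy_wer, 0.75, 1.85)
print(f"WER {wer:.4f} at lm_alpha={alpha}, lm_beta={beta}")
```

Because the number of grid points stays fixed while the range widens, the grid gets coarser each round. A finer search around the returned pair would be a natural follow-up.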

[Archived Post]

