Hi, thanks for this great work. I have a question about section 3.8, "Using Additional Training Data", from your paper "Visual Speech Recognition for Multiple Languages in the Wild".
For example, for LRS3 the best WER of 32.1 is achieved by combining the datasets LRW + LRS2 + AVSpeech + LRS3. I was just wondering how they are combined during training. Which of the following scenarios is correct?
Scenario A:
1. Pretrain using the LRW + LRS2 + AVSpeech datasets.
2. Initialise from step 1, then train on the LRS3 dataset only.
Scenario B:
1. Pretrain using the LRW + LRS2 + AVSpeech datasets.
2. Initialise from step 1, then train on the LRW + LRS2 + AVSpeech + LRS3 datasets.
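To make the distinction concrete, here is a minimal sketch of the two curricula (purely illustrative, not the authors' code; the dataset names stand in for the actual data loaders):

```python
def build_curriculum(scenario):
    """Return the dataset mix used in each training stage.

    Dataset names are placeholders for the real loaders; only the
    composition of the second (fine-tuning) stage differs.
    """
    pretrain = ["LRW", "LRS2", "AVSpeech"]
    if scenario == "A":
        finetune = ["LRS3"]                     # fine-tune on the target data only
    elif scenario == "B":
        finetune = pretrain + ["LRS3"]          # keep the pretraining data in the mix
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return [("pretrain", pretrain), ("finetune", finetune)]
```

In both cases the second stage is initialised from the checkpoint produced by the first; the only difference is whether the pretraining datasets stay in the fine-tuning mix.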
Would there be a performance difference between these two scenarios?
Thanks