This is the repository for Visual Speech Recognition for Multiple Languages, the successor to End-to-End Audio-Visual Speech Recognition with Conformers. It is built mainly on ESPnet and provides state-of-the-art algorithms for end-to-end visual speech recognition in the wild.