
'Voice Conversion' paper candidate 2412.02612 #668

Open
github-actions bot opened this issue Dec 4, 2024 · 0 comments

Comments

github-actions bot (Contributor) commented Dec 4, 2024

Please check whether this paper is about 'Voice Conversion' or not.

Article info:

  • title: GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

  • summary: We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken
    chatbot. It supports both Chinese and English, engages in real-time voice
    conversations, and varies vocal nuances such as emotion, intonation, speech
    rate, and dialect according to user instructions. GLM-4-Voice uses an ultra-low
    bitrate (175bps), single-codebook speech tokenizer with 12.5Hz frame rate
    derived from an automatic speech recognition (ASR) model by incorporating a
    vector-quantized bottleneck into the encoder. To efficiently transfer knowledge
    from text to speech modalities, we synthesize speech-text interleaved data from
    existing text pre-training corpora using a text-to-token model. We continue
    pre-training from the pre-trained text language model GLM-4-9B with a
    combination of unsupervised speech data, interleaved speech-text data, and
    supervised speech-text data, scaling up to 1 trillion tokens, achieving
    state-of-the-art performance in both speech language modeling and spoken
    question answering. We then fine-tune the pre-trained model with high-quality
    conversational speech data, achieving superior performance compared to existing
    baselines in both conversational ability and speech quality. The open models
    can be accessed through https://github.com/THUDM/GLM-4-Voice and
    https://huggingface.co/THUDM/glm-4-voice-9b.

  • id: http://arxiv.org/abs/2412.02612v1
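As a quick sanity check on the abstract's numbers, 175 bps at a 12.5 Hz frame rate implies 14 bits per frame, i.e. a single codebook of 2^14 = 16384 entries. The sketch below shows that arithmetic plus a minimal nearest-neighbour single-codebook quantizer; the code dimension (64) and the random codebook are illustrative assumptions, not details from the paper:

```python
import numpy as np

# Bitrate arithmetic implied by the abstract.
FRAME_RATE_HZ = 12.5                           # frame rate from the abstract
BITRATE_BPS = 175                              # bitrate from the abstract
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ   # 175 / 12.5 = 14 bits/frame
codebook_size = int(2 ** bits_per_frame)       # 2^14 = 16384 entries

# Hypothetical single codebook: 16384 entries of 64-dim vectors
# (the dimension is an assumption for illustration only).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(codebook_size, 64))

def quantize(frames: np.ndarray) -> np.ndarray:
    """Map each encoder frame to the index of its nearest codebook entry."""
    # Squared L2 distance between every frame and every codebook vector.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

frames = rng.normal(size=(5, 64))  # 5 frames = 0.4 s of audio at 12.5 Hz
ids = quantize(frames)
print(bits_per_frame, codebook_size, ids.shape)
```

Each frame of audio is thus reduced to one integer token, which is what lets the speech stream be modeled by a text-style language model.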

Judge

Write [vclab::confirmed] or [vclab::excluded] in a comment.
