'Voice Conversion' paper candidate 2411.19770 #667

Open
github-actions bot opened this issue Dec 2, 2024 · 0 comments
Comments


github-actions bot commented Dec 2, 2024

Please check whether this paper is about 'Voice Conversion' or not.

Article info

  • title: Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

  • summary: One-shot voice conversion (VC) aims to alter the timbre of speech from a
    source speaker to match that of a target speaker using just a single reference
    speech from the target, while preserving the semantic content of the original
    source speech. Despite advancements in one-shot VC, its effectiveness decreases
    in real-world scenarios where reference speeches, often sourced from the
    internet, contain various disturbances like background noise. To address this
    issue, we introduce Noro, a Noise Robust One-shot VC system. Noro features
    innovative components tailored for VC using noisy reference speeches, including
    a dual-branch reference encoding module and a noise-agnostic contrastive
    speaker loss. Experimental results demonstrate that Noro outperforms our
    baseline system in both clean and noisy scenarios, highlighting its efficacy
    for real-world applications. Additionally, we investigate the hidden speaker
    representation capabilities of our baseline system by repurposing its reference
    encoder as a speaker encoder. The results show that it is competitive with
    several advanced self-supervised learning models for speaker representation
    under the SUPERB settings, highlighting the potential for advancing speaker
    representation learning through the one-shot VC task.

  • id: http://arxiv.org/abs/2411.19770v1
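The abstract mentions a noise-agnostic contrastive speaker loss that pulls together embeddings of the same speaker regardless of noise. The paper's exact formulation is not given here; below is a generic InfoNCE-style sketch of such a loss, assuming a batch where `clean_emb[i]` and `noisy_emb[i]` come from the same speaker (all names and the temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def contrastive_speaker_loss(clean_emb, noisy_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (generic sketch, not Noro's exact loss).

    For each speaker i in the batch, the noisy-reference embedding
    noisy_emb[i] should match the clean embedding clean_emb[i] more
    closely than the clean embeddings of the other speakers.
    clean_emb, noisy_emb: arrays of shape (batch, dim).
    """
    # L2-normalise so similarities are cosine similarities
    c = clean_emb / np.linalg.norm(clean_emb, axis=1, keepdims=True)
    n = noisy_emb / np.linalg.norm(noisy_emb, axis=1, keepdims=True)
    logits = n @ c.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: same speaker, clean vs. noisy reference
    return -np.mean(np.diag(log_prob))
```

Training with such a loss encourages the reference encoder to produce the same speaker embedding for clean and noisy versions of a reference, which is the "noise-agnostic" property the abstract describes.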

Judge

Write [vclab::confirmed] or [vclab::excluded] in a comment.
