Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
summary: One-shot voice conversion (VC) aims to alter the timbre of speech from a
source speaker to match that of a target speaker using just a single reference
speech from the target, while preserving the semantic content of the original
source speech. Despite advancements in one-shot VC, its effectiveness decreases
in real-world scenarios where reference speeches, often sourced from the
internet, contain various disturbances like background noise. To address this
issue, we introduce Noro, a Noise Robust One-shot VC system. Noro features
innovative components tailored for VC using noisy reference speeches, including
a dual-branch reference encoding module and a noise-agnostic contrastive
speaker loss. Experimental results demonstrate that Noro outperforms our
baseline system in both clean and noisy scenarios, highlighting its efficacy
for real-world applications. Additionally, we investigate the hidden speaker
representation capabilities of our baseline system by repurposing its reference
encoder as a speaker encoder. The results show that it is competitive with
several advanced self-supervised learning models for speaker representation
under the SUPERB settings, highlighting the potential for advancing speaker
representation learning through the one-shot VC task.
id: http://arxiv.org/abs/2411.19770v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.