Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
summary: The increasing use of cloud-based speech assistants has heightened the need
for effective speech anonymization, which aims to obscure a speaker's identity
while retaining critical information for subsequent tasks. One approach to
achieving this is through voice conversion. While existing methods often
emphasize complex architectures and training techniques, our research
underscores the importance of loss functions inspired by the human auditory
system. Our proposed loss functions are model-agnostic, incorporating
handcrafted and deep learning-based features to effectively capture quality
representations. Through objective and subjective evaluations, we demonstrate
that a VQVAE-based model, enhanced with our perception-driven losses, surpasses
the vanilla model in terms of naturalness, intelligibility, and prosody while
maintaining speaker anonymity. These improvements are consistently observed
across various datasets, languages, target speakers, and genders.
id: http://arxiv.org/abs/2410.15499v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.