'Voice Conversion' paper candidate 2409.09272 #646

Open
github-actions bot opened this issue Sep 17, 2024 · 0 comments

Comments

@github-actions
Contributor

Please check whether this paper is about 'Voice Conversion' or not.

article info.

  • title: SafeEar: Content Privacy-Preserving Audio Deepfake Detection

  • summary: Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited
    remarkable performance in generating realistic and natural audio. However,
    their dark side, audio deepfakes, poses a significant threat to both society
    and individuals. Existing countermeasures largely determine the genuineness
    of speech from complete original audio recordings, which often contain
    private content. This oversight may keep deepfake detection out of many
    applications, particularly scenarios involving sensitive information such as
    business secrets. In this paper, we propose SafeEar, a novel framework that
    detects deepfake audio without access to the speech content within. Our key
    idea is to adapt a neural audio codec into a novel decoupling model that
    cleanly separates the semantic and acoustic information in audio samples,
    and to use only the acoustic information (e.g., prosody and timbre) for
    deepfake detection. In this way, no semantic content is exposed to the
    detector. To overcome the challenge of identifying diverse deepfake audio
    without semantic clues, we enhance our deepfake detector with real-world
    codec augmentation. Extensive experiments on four benchmark datasets
    demonstrate SafeEar's effectiveness in detecting various deepfake
    techniques, with an equal error rate (EER) as low as 2.02%. At the same
    time, it shields speech content in five languages from being deciphered by
    both machine and human auditory analysis, as demonstrated by word error
    rates (WERs) all above 93.93% and by our user study. Furthermore, our
    benchmark for anti-deepfake and anti-content-recovery evaluation provides a
    basis for future research on audio privacy preservation and deepfake
    detection.

  • id: http://arxiv.org/abs/2409.09272v1
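
To make the abstract's core idea concrete, here is a minimal illustrative sketch (not the actual SafeEar implementation; every name and the scoring rule below are hypothetical): an audio representation is split into a semantic part and an acoustic part, and only the acoustic part is ever passed to the detector, so no speech content reaches it.

```python
# Illustrative sketch of the decoupling-then-detect pipeline described in the
# abstract. All names and logic here are hypothetical stand-ins: SafeEar uses
# a neural audio codec for decoupling and a learned deepfake detector.

from dataclasses import dataclass
from typing import List

@dataclass
class DecoupledFeatures:
    semantic: List[float]   # content-bearing features (never shown to detector)
    acoustic: List[float]   # prosody/timbre features (the only detector input)

def decouple(frame: List[float]) -> DecoupledFeatures:
    """Stand-in for the codec's decoupling model: here we simply split the
    vector in half to mark which part goes where."""
    mid = len(frame) // 2
    return DecoupledFeatures(semantic=frame[:mid], acoustic=frame[mid:])

def detect_deepfake(acoustic: List[float], threshold: float = 0.5) -> bool:
    """Hypothetical detector: scores acoustic features only, so semantic
    content is structurally excluded from its input."""
    score = sum(abs(x) for x in acoustic) / max(len(acoustic), 1)
    return score > threshold

frame = [0.1, 0.9, -0.3, 0.7, 0.8, -0.9]
features = decouple(frame)
# The detector sees features.acoustic only; features.semantic stays private.
print(detect_deepfake(features.acoustic))
```

The privacy argument in the abstract is architectural rather than statistical: because the detector's interface accepts only the acoustic half, the semantic half cannot leak through it regardless of what the detector learns.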

judge

Write [vclab::confirmed] or [vclab::excluded] in a comment.
