Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: SafeEar: Content Privacy-Preserving Audio Deepfake Detection
summary: Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited
remarkable performance in generating realistic and natural audio. However,
their dark side, audio deepfakes, poses a significant threat to both society
and individuals. Existing countermeasures largely determine the genuineness of
speech from complete original audio recordings, which, however, often contain
private content. This oversight may bar deepfake detection from many
applications, particularly in scenarios involving sensitive information such
as business secrets. In this paper, we propose SafeEar, a novel framework that
detects deepfake audio without relying on access to the speech content within.
Our key idea is to turn a neural audio codec into a novel decoupling model
that separates the semantic and acoustic information in audio samples, and to
use only the acoustic information (e.g., prosody and timbre) for deepfake
detection. In this way, no semantic content is exposed to the detector. To
overcome the challenge of identifying diverse deepfake audio without semantic
clues, we enhance our deepfake detector with real-world codec augmentation.
Extensive experiments conducted on four benchmark datasets demonstrate
SafeEar's effectiveness in detecting various deepfake techniques with an equal
error rate (EER) as low as 2.02%. Simultaneously, it shields speech content in
five languages from being deciphered by both machine and human auditory
analysis, as demonstrated by word error rates (WERs) all above 93.93% and our
user study. Furthermore, our benchmark, constructed for anti-deepfake and
anti-content-recovery evaluation, provides a basis for future research in the
realms of audio privacy preservation and deepfake detection.
id: http://arxiv.org/abs/2409.09272v1
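For context when judging, here is a minimal sketch of the idea the abstract describes: a codec-style model splits each utterance into semantic and acoustic tokens, and only the acoustic branch ever reaches the detector. All class names, tensor shapes, and layer choices below are hypothetical illustrations of that idea, not SafeEar's actual architecture or API.

```python
# Minimal sketch of content-free deepfake detection, as described in the
# abstract: decouple semantic vs. acoustic information, then classify on
# the acoustic tokens only. Everything here is a hypothetical stand-in.
import torch
import torch.nn as nn

class HypotheticalDecouplingCodec(nn.Module):
    """Stand-in for a neural codec that splits an utterance into semantic
    tokens (linguistic content) and acoustic tokens (prosody/timbre).
    Both branches are plain linear projections for illustration."""
    def __init__(self, n_mels=80, dim=128):
        super().__init__()
        self.semantic_head = nn.Linear(n_mels, dim)   # discarded downstream
        self.acoustic_head = nn.Linear(n_mels, dim)   # kept for detection

    def forward(self, mel):                  # mel: (batch, frames, n_mels)
        return self.semantic_head(mel), self.acoustic_head(mel)

class HypotheticalAcousticDetector(nn.Module):
    """Binary real/fake classifier that only ever sees acoustic tokens,
    so no semantic content reaches the detector."""
    def __init__(self, dim=128):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, acoustic):             # acoustic: (batch, frames, dim)
        pooled = acoustic.mean(dim=1)        # average over time frames
        return self.classifier(pooled).squeeze(-1)  # real/fake logit

codec = HypotheticalDecouplingCodec()
detector = HypotheticalAcousticDetector()
mel = torch.randn(2, 200, 80)                # two dummy utterances
_semantic, acoustic = codec(mel)             # semantic tokens never forwarded
print(detector(acoustic).shape)              # torch.Size([2])
```

The design point is that dropping the semantic branch is what preserves content privacy; detection quality then hinges on how much deepfake evidence survives in prosody and timbre alone, which is why the paper adds codec augmentation.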
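The 2.02% figure is an equal error rate (EER), the operating point where the false positive and false negative rates coincide. A common way to compute it from detector scores is shown below with made-up toy data; this is a generic recipe, not necessarily the paper's exact evaluation protocol.

```python
# Conventional EER computation from an ROC curve: find the threshold
# where the false positive rate (FPR) equals the false negative rate (FNR).
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([0, 0, 0, 1, 1, 1])                # 1 = bona fide, 0 = spoof
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])   # toy detector scores

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]           # point where FPR ~= FNR
print(f"EER = {eer:.2%}")
```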
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.