Publication: Deepfake Audio Detection: A Deep Learning Based Solution for Group Conversations
Type:
Article
Date
2020-12-10
Publisher
2020 2nd International Conference on Advancements in Computing (ICAC), SLIIT
Abstract
Recent advancements in deep learning and related technologies have led to improvements in areas such as computer vision, bio-informatics, and speech recognition. This research focuses on a problem involving synthetic speech and speaker diarization. Developments in audio have resulted in deep learning models capable of replicating natural-sounding voices, also known as text-to-speech (TTS) systems. This technology could be exploited for malicious purposes such as deepfakes, impersonation, or spoofing attacks. We propose a system capable of distinguishing between real and synthetic speech in group conversations. We built Deep Neural Network models and integrated them into a single solution using different datasets, including but not limited to UrbanSound8K (5.6 GB), Conversational (12.2 GB), AMI-Corpus (5 GB), and FakeOrReal (4 GB). Our proposed approach consists of four main components. The speech-denoising component cleans and preprocesses the audio using Multilayer Perceptron and Convolutional Neural Network (CNN) architectures, with 93% and 94% accuracy, respectively. Speaker diarization was implemented using two different approaches: Natural Language Processing for text conversion, with 93% accuracy, and a Recurrent Neural Network model for speaker labeling, with 80% accuracy and a Diarization Error Rate of 0.52. The final component distinguishes between real and fake audio using a CNN architecture with 94% accuracy. With these findings, this research contributes to the domain of speech analysis.
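
For readers unfamiliar with the Diarization Error Rate quoted above, the following minimal sketch shows how the metric is conventionally defined: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The function name and the durations are hypothetical, chosen only so that the result matches the reported 0.52. They are not the authors' data or implementation, and the abstract does not state how the authors computed the figure.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER = (missed + false_alarm + confusion) / total_speech.

    All arguments are durations in seconds. `total_speech` is the total
    duration of speech in the reference (ground-truth) annotation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations (assumed for illustration only, not from the paper).
der = diarization_error_rate(
    missed=30.0,        # reference speech the system failed to detect
    false_alarm=12.0,   # non-speech the system labeled as speech
    confusion=62.0,     # speech attributed to the wrong speaker
    total_speech=200.0,
)
print(f"DER = {der:.2f}")
```

Under this definition, lower values are better, and a value of 0.52 means the error time amounts to about half of the reference speech time.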
Keywords
Deep Neural Networks, Natural Language Processing, Speaker Diarization, Deepfake, Deep Learning
