Research Publications Authored by SLIIT Staff

Permanent URI for this community: https://rda.sliit.lk/handle/123456789/4195

This collection includes all SLIIT staff publications presented at external conferences and published in external journals. The materials are organized by faculty to facilitate easy retrieval.

  • Publication (Embargo)
    Deepfake Audio Detection: A Deep Learning Based Solution for Group Conversations
    (IEEE, 2020-12-10) Wijethunga, R. L. M. A. P. C; Matheesha, D. M. K; Noman, A. A; De Silva, K. H. V. T. A; Tissera, M; Rupasinghe, L
    Recent advances in deep learning and related technologies have driven improvements in areas such as computer vision, bioinformatics, and speech recognition. This research focuses on two problems: synthetic speech and speaker diarization. Advances in audio modelling have produced deep learning models capable of replicating natural-sounding voices, known as text-to-speech (TTS) systems. This technology can be manipulated for malicious purposes such as deepfakes, impersonation, or spoofing attacks. We propose a system capable of distinguishing between real and synthetic speech in group conversations. We built deep neural network models and integrated them into a single solution using several datasets, including Urban-Sound8K (5.6 GB), Conversational (12.2 GB), AMI-Corpus (5 GB), and FakeOrReal (4 GB). Our proposed approach consists of four main components. The speech-denoising component cleans and preprocesses the audio using Multilayer Perceptron and Convolutional Neural Network architectures, with 93% and 94% accuracy respectively. Speaker diarization was implemented using two approaches: Natural Language Processing for text conversion, with 93% accuracy, and a Recurrent Neural Network model for speaker labelling, with 80% accuracy and a 0.52 Diarization Error Rate. The final component distinguishes between real and fake audio using a CNN architecture with 94% accuracy. With these findings, this research contributes to the domain of speech analysis.
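
    The abstract reports a 0.52 Diarization Error Rate (DER) for the speaker-labelling component. As a rough illustration of how that metric is computed (a simplified frame-level sketch, not the paper's implementation), assuming per-frame speaker labels with `None` marking non-speech:

    ```python
    def frame_der(reference, hypothesis):
        """Simplified frame-level Diarization Error Rate.

        reference, hypothesis: equal-length sequences of speaker labels,
        one per frame; None marks non-speech.
        DER = (missed speech + false alarm + speaker confusion)
              / total reference speech frames.
        """
        assert len(reference) == len(hypothesis)
        missed = false_alarm = confusion = 0
        speech_frames = 0
        for ref, hyp in zip(reference, hypothesis):
            if ref is not None:
                speech_frames += 1
                if hyp is None:
                    missed += 1        # speech frame labelled as silence
                elif hyp != ref:
                    confusion += 1     # wrong speaker assigned
            elif hyp is not None:
                false_alarm += 1       # silence labelled as speech
        return (missed + false_alarm + confusion) / speech_frames

    # Example: one confusion, one miss, one false alarm over 4 speech frames
    der = frame_der(['A', 'A', 'B', 'B', None],
                    ['A', 'B', 'B', None, 'C'])  # → 0.75
    ```

    Note that standard DER evaluation first finds an optimal mapping between reference and hypothesis speaker labels (and applies a forgiveness collar at segment boundaries); this sketch assumes the labels are already aligned.
    
    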