Faculty of Computing

Permanent URI for this community: https://rda.sliit.lk/handle/123456789/4202


Search Results

Now showing 1 - 3 of 3
  • Publication (Embargo)
    Speech Master: Natural Language Processing and Deep Learning Approach for Automated Speech Evaluation
    (IEEE, 2021-12-06) Kooragama, K.G.C.M; Jayashanka, L. R. W. D; Munasinghe, J. A; Jayawardana, K. W; Tissera, M; Jayasingha, T. B
    Every English speaker wishes to sharpen his or her public speaking skills. However, doing so is difficult and requires a significant amount of individual practice and experience. This paper introduces “Speech Master”, a novel online tool for practicing and improving public English speech delivery skills in a professional manner. Using natural language processing, machine learning, and deep learning approaches, the proposed system analyzes the user's speech in terms of content, grammatical accuracy, grammatical richness, facial expressions, and flow. Accuracy was checked by comparing actual results obtained from experts with the results predicted by the tool. “Speech Master” achieves an average accuracy of more than 80% and produces a better overall result. This novel tool benefits English speakers all over the world by meeting the demand for a simple, easy-to-use solution for improving and practicing English speech delivery; it enhances oratory skills, boosts confidence, and helps users deliver well-articulated speeches.
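    The abstract describes an overall result derived from five per-aspect scores (content, grammatical accuracy, grammatical richness, facial expressions, flow) but does not specify how they are combined. As a minimal sketch only, assuming a 0-100 scale per aspect and equal weighting (both are illustrative assumptions, not the authors' method), such scores could be aggregated as:

    ```python
    # Sketch: combining per-aspect speech scores into one overall result.
    # Aspect names come from the abstract; the equal weights and 0-100
    # scale are assumptions for illustration only.

    def overall_score(aspect_scores, weights=None):
        """Weighted average of per-aspect scores (each on a 0-100 scale)."""
        aspects = list(aspect_scores)
        if weights is None:
            weights = {a: 1.0 for a in aspects}  # assumed equal weighting
        total_w = sum(weights[a] for a in aspects)
        return sum(aspect_scores[a] * weights[a] for a in aspects) / total_w

    scores = {
        "content": 85, "grammatical accuracy": 78,
        "grammatical richness": 70, "facial expressions": 90, "flow": 82,
    }
    print(round(overall_score(scores), 1))  # → 81.0
    ```

    Passing an explicit `weights` dict would let a tool emphasize, say, grammatical accuracy over facial expressions; the paper itself does not state whether the aspects are weighted.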
  • Publication (Embargo)
    Deepfake Audio Detection: A Deep Learning Based Solution for Group Conversations
    (IEEE, 2020-12-10) Wijethunga, R. L. M. A. P. C; Matheesha, D. M. K; Noman, A. A; De Silva, K. H. V. T. A; Tissera, M; Rupasinghe, L
    The recent advancements in deep learning and related technologies have led to improvements in areas such as computer vision, bioinformatics, and speech recognition. This research focuses on the problems of synthetic speech and speaker diarization. Developments in audio processing have produced deep learning models capable of replicating natural-sounding voices, known as text-to-speech (TTS) systems. This technology can be manipulated for malicious purposes such as deepfakes, impersonation, or spoofing attacks. We propose a system capable of distinguishing between real and synthetic speech in group conversations. We built Deep Neural Network models and integrated them into a single solution using several datasets, including Urban-Sound8K (5.6 GB), Conversational (12.2 GB), AMI-Corpus (5 GB), and FakeOrReal (4 GB). The proposed approach consists of four main components. The speech-denoising component cleans and preprocesses the audio using Multilayer Perceptron and Convolutional Neural Network architectures, with 93% and 94% accuracy respectively. Speaker diarization was implemented using two approaches: Natural Language Processing for text conversion with 93% accuracy, and a Recurrent Neural Network model for speaker labeling with 80% accuracy and a 0.52 Diarization Error Rate. The final component distinguishes between real and fake audio using a CNN architecture with 94% accuracy. With these findings, this research contributes to the domain of speech analysis.
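    The abstract reports a 0.52 Diarization Error Rate (DER) for the speaker-labeling component. As a rough illustration of what that metric measures (a simplified generic definition, not the authors' evaluation code), DER is the fraction of reference speech time that is missed, falsely detected, or attributed to the wrong speaker:

    ```python
    # Simplified frame-level Diarization Error Rate (DER) sketch.
    # Production DER tooling (e.g. NIST md-eval) handles overlapping speech,
    # forgiveness collars, and optimal speaker mapping; this toy version
    # assumes aligned frame labels and a fixed mapping, purely to
    # illustrate the metric the abstract reports.

    def frame_der(reference, hypothesis, silence="-"):
        """DER = (missed + false alarm + confusion) / reference speech frames."""
        assert len(reference) == len(hypothesis)
        errors = 0
        ref_speech = 0
        for r, h in zip(reference, hypothesis):
            if r != silence:
                ref_speech += 1
                if h == silence:
                    errors += 1  # missed speech
                elif h != r:
                    errors += 1  # speaker confusion
            elif h != silence:
                errors += 1      # false alarm
        return errors / ref_speech

    ref = ["A", "A", "B", "B", "-", "A"]  # reference speaker per frame
    hyp = ["A", "B", "B", "-", "A", "A"]  # hypothesized speaker per frame
    print(round(frame_der(ref, hyp), 2))  # → 0.6
    ```

    Under this definition a DER above 0.5, as reported for the RNN speaker-labeling model, means more than half of the reference speech time is scored as erroneous, which is why diarization accuracy and DER are usually reported together.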