Browsing by Author "Senarathna, M"


Embargo
Adapting MaryTTS for Synthesizing Sinhalese Speech to Communicate with Children
(IEEE, 2021-12-01) Lakmal, M. A. J. A; Methmini, K. A. D. G; Rupasinghe, D. M. H. M; Hettiarachchi, D. I; Piyawardana, V; Senarathna, M; Reyal, S; Pulasinghe, K
Sinhala is the mother tongue of the majority of the Sri Lankan population. Compared with other languages, Sinhala is difficult for children aged 1–6 years to learn. Text-to-speech (TTS) systems are popular among children who have difficulty reading, especially those who struggle with decoding. When words are presented auditorily, the child can focus on their meaning instead of spending all their brainpower trying to sound them out. In Sri Lanka, however, Sinhala-language computer systems, especially those designed for children, are extremely rare. In this context, a Sinhala text-to-speech system for communicating with children is a helpful option. Because the system is intended specifically for children, intelligibility must be given particular attention. Recordings of a native Sinhalese speaker were used to synthesize a natural-sounding voice rather than a robotic one. This paper proposes an approach to implementing a Sinhalese text-to-speech system for communicating with children using unit selection and HMM-based mechanisms in the MaryTTS framework. Although the work is still in progress, intermediate findings are presented.
Embargo
An Image Based Approach of Energy Signal Disaggregation Using Artificial Intelligence
(IEEE, 2021-12-09) Senarathna, M; Herath, M; Thilakanayake, H. D; Liyanage, M. H; Angammana, C. J
Non-Intrusive Load Monitoring (NILM) is the real-time monitoring of the energy consumption of individual appliances by decomposing the composite energy signal captured at the household smart energy meter. Most existing NILM techniques use one-dimensional (1D) time-series signal analysis to predict individual appliance energy signals. Image-based disaggregation of energy signals is a relatively new approach in the NILM domain. This paper presents a novel computer vision-based Artificial Intelligence (AI) approach and compares it with traditional time series-based NILM methods. Gramian Angular Fields (GAF) and Recurrence Plots (RP) have been widely used in recent literature to encode time-series signals as images. Novel image classification techniques based on Convolutional Neural Networks (CNN) simplify the extraction of nuclear load features from the encoded two-dimensional (2D) images. Validation accuracy and validation loss were used as indices to compare the performance of the different vision-based AI approaches. The results reveal that the Gramian Angular Difference Field (GADF) outperforms both the Gramian Angular Summation Field (GASF) and RP, with a training accuracy of 97.9% and a validation accuracy of 94.2%. A comprehensive analysis and comparison, including an in-depth evaluation using multi-state appliances, is presented, and it is concluded that GADF is the most suitable 1D-to-2D conversion method for representing time-series energy data for disaggregation.
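The three 1D-to-2D encodings compared above (GASF, GADF and RP) can be illustrated with a minimal NumPy sketch. It assumes the commonly used definitions: the window is min-max rescaled to [-1, 1], mapped to angles φ = arccos(x), then GASF = cos(φi + φj) and GADF = sin(φi − φj); the recurrence plot is a simple thresholded distance matrix without time-delay embedding. The function names, the threshold `eps` and the sample window are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def rescale(x):
    """Min-max rescale a 1D signal to [-1, 1] (assumed preprocessing step)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant window: map every sample to 0 to avoid division by zero.
        return np.zeros_like(x)
    return (2.0 * x - hi - lo) / (hi - lo)


def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1D window, using the common GAF definitions."""
    # Clip guards against tiny floating-point overshoot outside arccos's domain.
    phi = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    # Pairwise angular sums and differences via broadcasting.
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf


def recurrence_plot(x, eps):
    """Binary recurrence plot: 1 where |x_i - x_j| <= eps (hypothetical threshold)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical window of aggregate power readings in watts, not data from the paper.
    window = np.array([120.0, 118.0, 950.0, 940.0, 125.0, 122.0])
    gasf, gadf = gramian_angular_fields(window)
    rp = recurrence_plot(window, eps=50.0)
    print(gasf.shape, gadf.shape, rp.shape)
```

Each N-sample window becomes an N×N image that a CNN can classify. GASF is symmetric and GADF is antisymmetric with a zero diagonal, so GADF encodes the direction of change between samples. This is one plausible reason, not a claim from the paper, why it may separate appliance state transitions better.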
Embargo
Step-by-Step Process of Building Voices for Under Resourced Languages using MARY TTS Platform
(IEEE, 2022-12-09) Senarathna, M; Pulasinghe, K; Reyal, S
This paper presents a comprehensive guide to creating synthetic voices for under-resourced languages on the MaryTTS platform. Although researchers have contributed extensively to the domain of speech synthesis, the lack of thorough documentation hinders voice building for languages not yet supported by MaryTTS and complicates implementation for users with limited knowledge of Text-to-Speech (TTS). The step-by-step process discussed in this study is demonstrated by creating a synthetic voice for the Sinhala language, using unit selection as the voice-building approach. When evaluated with the Diagnostic Rhyme Test (DRT), the generated Sinhalese voice achieved an intelligibility score of 91.7%. When the same participants were tested on ground-truth data, the intelligibility score was 97.9%, indicating that the synthetic voice closely approximates human speech. The Mean Opinion Score (MOS) revealed a naturalness level of 2.993, indicating moderately high speech quality for the proposed system compared with the ideal score of 4.972.