Please use this identifier to cite or link to this item: https://rda.sliit.lk/handle/123456789/4063
Title: Analyzing the Performance of Different Text Classification Algorithms for “Dhivehi” Documents
Authors: Mohamed, F.R
Keywords: Analyzing
Performance
Algorithms
Different Text Classification
Low-resource languages
Asian languages
Dhivehi Text Classification
Issue Date: Dec-2024
Publisher: SLIIT
Abstract: This research investigates the effectiveness of various machine learning classification algorithms applied to Dhivehi text documents. Dhivehi, the official language of the Maldives, presents unique linguistic challenges for text classification because of its limited digital resources and distinct grammatical structure. The study aims to identify the most suitable algorithm for classifying Dhivehi documents and to provide insights into optimizing text classification approaches for less-resourced languages. The research systematically evaluates the performance of several machine learning algorithms, including Support Vector Machines (SVM), Naive Bayes, Decision Trees, XGBoost, Random Forest, and Neural Networks. These algorithms are applied to a diverse dataset of Dhivehi text encompassing various genres and topics. The methodology covers data preprocessing, feature extraction, and model training and testing. Accuracy, precision, recall, and F1-score are used to compare the algorithms. The research also explores the impact of different text representation techniques, including bag-of-words, TF-IDF, and word embeddings, on classification performance. The findings offer insights into optimizing text classification methods for low-resource languages and aim to advance natural language processing tools tailored for “Dhivehi.” In the evaluation, K-Nearest Neighbors (K-Neighbors) achieved the highest performance, with an accuracy of 64.7% and F1 scores of 0.640 (macro) and 0.642 (weighted), indicating a balance between precision and recall. Support Vector Machines (accuracy: 63.9%) and XGBoost (accuracy: 62.8%) showed competitive results, with SVM slightly outperforming XGBoost on the F1 metrics. Decision Tree exhibited the lowest performance across all metrics.
By identifying the most effective classification algorithms and representation techniques, this research aims to improve the accuracy and efficiency of Dhivehi text classification. The results have practical applications in areas such as sentiment analysis, document categorization, and information retrieval systems tailored for the Dhivehi language. The dataset is publicly available on Mendeley Data under the name “Dhivehi Categories data set” to foster future research and innovation in this domain.
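The abstract reports accuracy together with macro and weighted F1 scores. As a reading aid, the following is a minimal sketch of how these three metrics are conventionally computed for single-label classification. It is not the author's evaluation code; the function name `f1_report` and the category labels (`news`, `sport`, `religion`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return accuracy, macro F1 and weighted F1 for single-label predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true documents per class
    pairs = list(zip(y_true, y_pred))

    per_class_f1 = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs
        per_class_f1[label] = 2 * tp / denom if denom else 0.0

    accuracy = sum(1 for t, p in pairs if t == p) / len(y_true)
    # Macro F1: unweighted mean over classes, so rare classes count equally.
    macro_f1 = sum(per_class_f1.values()) / len(labels)
    # Weighted F1: mean over classes weighted by each class's true support.
    weighted_f1 = sum(per_class_f1[c] * support[c] for c in labels) / len(y_true)

    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Hypothetical toy labels, not taken from the Dhivehi dataset.
y_true = ["news", "news", "news", "sport", "sport", "religion"]
y_pred = ["news", "news", "sport", "sport", "sport", "news"]
report = f1_report(y_true, y_pred)
print(report)
```

Macro and weighted F1 diverge when classes are imbalanced and the rare classes are classified poorly. The near-identical values reported in the abstract (0.640 and 0.642) are consistent with performance that is fairly even across categories, although the abstract itself does not state the class distribution.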
URI: https://rda.sliit.lk/handle/123456789/4063
Appears in Collections: 2024


