Performance Analysis of Text Classification Algorithms for Dhivehi Language Documents

Date

2025

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

This study examines the effectiveness of several machine learning algorithms in classifying text written in Dhivehi, the official language of the Maldives. Dhivehi is a low-resource language with little prior research in text analytics, and its distinctive linguistic properties pose particular challenges for classification. To address these challenges, the study evaluates algorithms including Support Vector Machines (SVM), Naive Bayes, Decision Trees, Neural Networks, XGBoost, Random Forest, and K-Neighbors on a newly curated Dhivehi language dataset. K-Neighbors achieved the highest performance, with an accuracy of 64.7% and F1 scores of 0.640 (macro) and 0.642 (weighted), indicating a good balance between precision and recall. SVM (accuracy: 63.9%) and XGBoost (accuracy: 62.8%) were also competitive, with SVM slightly outperforming XGBoost on the F1 metrics. Decision Tree had the lowest performance across all metrics. The findings offer insights into improving text classification for low-resource languages and contribute to the development of natural language processing tools tailored specifically to Dhivehi. The dataset is publicly available on Mendeley Data under the name 'Dhivehi Categories data set' to support future research in this domain.
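The abstract reports accuracy together with macro and weighted F1. As a minimal sketch of how these three figures relate, the function below computes them from true and predicted labels using only the Python standard library. The function name `f1_scores` and the category labels are hypothetical and chosen for illustration; they are not taken from the study or its dataset, and this is not the authors' evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class

    per_class_f1 = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        per_class_f1[label] = 2 * tp / denom if denom else 0.0

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Macro F1: every class counts equally, regardless of size.
    macro_f1 = sum(per_class_f1.values()) / len(labels)
    # Weighted F1: each class counts in proportion to its true support.
    weighted_f1 = sum(per_class_f1[c] * support[c] for c in labels) / len(pairs)
    return accuracy, macro_f1, weighted_f1


# Hypothetical labels, not drawn from the Dhivehi dataset.
y_true = ["news", "news", "news", "sport", "sport", "business"]
y_pred = ["news", "news", "sport", "sport", "sport", "news"]
accuracy, macro_f1, weighted_f1 = f1_scores(y_true, y_pred)
```

In this toy example the small "business" class is always misclassified, so macro F1 falls well below weighted F1. Macro and weighted F1 values that lie close together, as reported in the abstract, suggest that no class is dragging the average down disproportionately.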

Keywords

Asian Linguistics, Dhivehi Language, Low-Resource Languages, Machine Learning, Text Classification
