Publication:
Sentiment Classification of Sinhala Content in Social Media: A Comparison between Stemmers and N-gram Features

Abstract

Sentiment classification for non-English languages has gained significant attention from researchers in the past few years with the increasing use of non-English and Romanized scripts for expressing sentiment on social media. In this study, we begin by classifying Sinhala sentiments on social media into positive and negative polarity classes using N-gram feature extraction. An N-gram is a contiguous sequence of words or characters in a text. We then focus on improving classification accuracy by employing different stemming methods. Stemming is generally used to reduce the dimensionality of the feature set, which must be done with great care because over-reducing the feature dimensionality decreases classification accuracy. Finally, we compare the accuracy and efficiency of the N-gram-based and stemming-based sentiment analysis models.
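
The two ideas in the abstract, N-gram extraction and dimensionality reduction through stemming, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the suffix list, and the English sample tokens are all hypothetical, chosen only to show the mechanics. A real Sinhala stemmer would need language-specific rules that the abstract does not describe.

```python
from typing import List, Sequence, Set, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Return every contiguous run of n characters, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical suffix list for illustration only (English, not Sinhala).
HYPOTHETICAL_SUFFIXES = ("ing", "ed", "s")


def strip_suffix(token: str,
                 suffixes: Sequence[str] = HYPOTHETICAL_SUFFIXES,
                 min_stem: int = 3) -> str:
    """Toy stemmer: remove the longest matching suffix, but only if the
    remaining stem keeps at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[:-len(suffix)]
    return token


def vocabulary(docs: Sequence[str], stem: bool = False) -> Set[str]:
    """Collect the set of distinct unigram features over all documents,
    optionally stemming each whitespace-separated token first."""
    vocab: Set[str] = set()
    for doc in docs:
        for token in doc.lower().split():
            vocab.add(strip_suffix(token) if stem else token)
    return vocab


if __name__ == "__main__":
    docs = ["walking walked walks", "walk"]
    print(word_ngrams(docs[0].split(), 2))
    print(char_ngrams("walk", 3))
    print(len(vocabulary(docs)), len(vocabulary(docs, stem=True)))
```

In this toy corpus, stemming merges several surface forms into one feature, which is the dimensionality reduction the abstract refers to. The `min_stem` guard is one simple way to limit over-stemming, where unrelated words collapse to the same stem and the classifier loses information.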

Keywords

Sentiment Classification, Sinhala Content, Social Media, Comparison between Stemmers, N-gram Features

Citation

P. Jayasuriya, R. Munasinghe and S. Thelijjagoda, "Sentiment Classification of Sinhala Content in Social Media: A Comparison between Stemmers and N-gram Features," 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), 2021, pp. 134-139, doi: 10.1109/ICIIS53135.2021.9660711.
