Authors: Jayasuriya, P; Munasinghe, R; Thelijjagoda, S
Dates: 2022-03-03; 2022-03-03; 2021-12-09
Citation: P. Jayasuriya, R. Munasinghe and S. Thelijjagoda, "Sentiment Classification of Sinhala Content in Social Media: A Comparison between Stemmers and N-gram Features," 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), 2021, pp. 134-139, doi: 10.1109/ICIIS53135.2021.9660711.
ISSN: 2164-7011
URI: https://rda.sliit.lk/handle/123456789/1453
Abstract: Sentiment classification for non-English languages has gained significant attention from researchers in the past few years, driven by the increasing use of non-English and Romanized scripts to express sentiment on social media. In this study, we begin by classifying Sinhala sentiments on social media into positive and negative polarity classes using N-gram feature extraction. An N-gram is a contiguous sequence of words or characters in a text. We then focus on improving classification accuracy by employing different stemming methods. Stemming is generally used to reduce the dimensionality of the feature set, which must be done with great care because over-reducing the feature dimensionality lowers classification accuracy. Finally, we compare the accuracy and efficiency of sentiment analysis models based on N-gram feature extraction with those based on stemming.
Language: en
Keywords: Sentiment Classification; Sinhala Content; Social Media; Comparison between Stemmers; N-gram Features
Title: Sentiment Classification of Sinhala Content in Social Media: A Comparison between Stemmers and N-gram Features
Type: Article
DOI: 10.1109/ICIIS53135.2021.9660711
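
The abstract defines an N-gram as a contiguous sequence of words or characters. A minimal sketch of that definition follows. It is not the authors' implementation: the function names `ngrams`, `word_ngrams`, and `char_ngrams` are hypothetical, whitespace tokenization is an assumption, and the sample sentence is an illustrative English string rather than data from the paper.

```python
def ngrams(items, n):
    """Return every contiguous window of length n over a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def word_ngrams(text, n):
    """Word-level N-grams, assuming simple whitespace tokenization."""
    return ngrams(text.split(), n)


def char_ngrams(text, n):
    """Character-level N-grams, taken over Unicode code points."""
    return ["".join(gram) for gram in ngrams(text, n)]


if __name__ == "__main__":
    sample = "this film is good"  # illustrative text, not from the paper
    print(word_ngrams(sample, 2))
    print(char_ngrams("good", 3))
```

Because the character windows slide over Unicode code points, applying `char_ngrams` to Sinhala script may separate a consonant from its dependent vowel sign; a real pipeline might segment the text into grapheme clusters first.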