Publication: Unsupervised Sinhala Cyberbullying Categorization
Type: Thesis
Date: 2021
Abstract
The objective of this research is to use unsupervised machine learning to categorize social media comments into a given number of predefined categories. Earlier studies in this domain have used many datasets for supervised learning and introduced a large number of techniques and methodologies. A major challenge in those studies was obtaining training labels: although training comments are easy to find, labelling them manually is not an easy task.
Through this research, we hope to address this problem using unsupervised machine learning techniques. The proposed technique splits each comment into words, removes special characters, emojis, and links, and then categorizes the comment using a keyword list for each category together with similarity measures. The comments categorized in this way were then used as training data. When compared with other supervised machine learning techniques for cyberbullying detection, the implemented method shows the same level of performance.
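A minimal Python sketch of the preprocessing and keyword-based categorization described above. The category names and keyword sets are hypothetical placeholders (the thesis's real lists would contain Sinhala words), and using Jaccard similarity between a comment's words and each keyword list is an assumption standing in for the unspecified similarity measure.

```python
import re

# Hypothetical categories and keyword sets, for illustration only.
# Real keyword lists for this task would consist of Sinhala words.
CATEGORY_KEYWORDS = {
    "insult": {"stupid", "idiot", "fool"},
    "threat": {"kill", "hurt", "attack"},
    "neutral": {"thanks", "good", "nice"},
}

# Links are removed first so their punctuation does not leave word fragments.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep Sinhala characters (U+0D80-U+0DFF), the zero-width joiner used in some
# Sinhala conjuncts, ASCII letters, digits and whitespace; replace everything
# else (punctuation, emojis, other symbols) with a space.
NON_WORD_RE = re.compile(r"[^\u0D80-\u0DFF\u200DA-Za-z0-9\s]")


def preprocess(comment: str) -> list[str]:
    """Strip links, emojis and special characters, then split into lowercase words."""
    text = URL_RE.sub(" ", comment)
    text = NON_WORD_RE.sub(" ", text)
    return text.lower().split()


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize(comment: str, keywords: dict[str, set[str]] = CATEGORY_KEYWORDS,
               default: str = "uncategorized") -> str:
    """Assign the category whose keyword set is most similar to the comment's words."""
    words = set(preprocess(comment))
    scores = {category: jaccard(words, kw) for category, kw in keywords.items()}
    best = max(scores, key=scores.get)
    # Comments that share no word with any keyword list stay uncategorized.
    return best if scores[best] > 0 else default


print(categorize("You are so stupid 😡 https://example.com"))  # insult
print(categorize("hello there"))  # uncategorized
```

Comments that match no keyword are left uncategorized rather than forced into a category, so that only keyword-supported labels would be carried forward as training data.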
Therefore, this mechanism can be used wherever low-cost cyberbullying identification is needed. It can also be used to create labelled training comments.
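To illustrate how automatically labelled comments can serve as training data, the sketch below trains a multinomial Naive Bayes classifier (with add-one smoothing) on comments that already carry keyword-derived labels. Naive Bayes is a swapped-in example, not necessarily the classifier used in the thesis, and the labelled comments are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Drop links, then replace non-word symbols (including emojis) with spaces.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[^\u0D80-\u0DFF\u200DA-Za-z0-9\s]", " ", text)
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(tokenize(comment))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, comment: str) -> str:
        words = tokenize(comment)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / self.total_docs)
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical comments labelled automatically (e.g. by keyword matching).
data = [
    ("you are stupid", "insult"),
    ("such an idiot", "insult"),
    ("I will hurt you", "threat"),
    ("they will attack you", "threat"),
]
model = NaiveBayes().fit([c for c, _ in data], [label for _, label in data])
print(model.predict("stupid idiot"))  # insult
print(model.predict("will hurt"))     # threat
```

Because the labels come from the unsupervised step, any supervised model could be substituted here; the point is only that no manual labelling is required before training.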
