Context aware stopwords for Sinhala Text classification

The term "stopword" comes up often in Text Classification (TC). Stopwords are words that occur frequently in a document but carry little meaning for Information Retrieval (IR). Stopword lists are available for many languages. To the best of our knowledge, no generic stopword list has been built for the Sinhala language. This paper demonstrates how to generate a domain-specific stopword list from a given data set of Sinhala newspapers. To that end, seven stopword identification methods previously applied to other languages are presented. Then, a new algorithm for building a domain-specific stopword list is proposed. The resulting stopword lists are compared using the average F-measure and average accuracy obtained from two classifiers. Based on this comparative study, the most effective method for identifying stopwords in a Sinhala corpus can be determined.
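As a rough illustration of corpus-driven stopword identification, the sketch below flags tokens that appear in a large share of documents. It is not the algorithm proposed in the paper, and it is not claimed to be one of the seven methods, since the abstract does not name them. The function name `candidate_stopwords`, the `min_doc_ratio` threshold, the whitespace tokenization, and the English placeholder documents are all assumptions made for this example.

```python
from collections import Counter


def candidate_stopwords(documents, min_doc_ratio=0.8):
    """Return tokens found in at least min_doc_ratio of the documents.

    Hypothetical helper for illustration only: a plain document-frequency
    cut-off, not the method proposed in the paper.
    """
    doc_freq = Counter()
    for text in documents:
        # Count each token once per document (document frequency).
        # Whitespace splitting is an assumption; a real Sinhala pipeline
        # would likely need a proper tokenizer.
        doc_freq.update(set(text.split()))
    threshold = min_doc_ratio * len(documents)
    return sorted(tok for tok, df in doc_freq.items() if df >= threshold)


# Placeholder English documents standing in for newspaper articles.
docs = [
    "the match was won by the home team",
    "the budget was passed by parliament",
    "the film was praised by critics",
]
print(candidate_stopwords(docs))  # → ['by', 'the', 'was']
```

A list produced this way depends on the corpus it is built from, which is why stopwords derived from one domain (here, newspapers) may not transfer to another.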

Keywords

Text classification, Sinhala Text, Context aware, stopwords

Citation

S. V. S. Gunasekara and P. S. Haddela, "Context aware stopwords for Sinhala Text classification," 2018 National Information Technology Conference (NITC), 2018, pp. 1-6, doi: 10.1109/NITC.2018.8550073.

URI

https://rda.sliit.lk/handle/123456789/2003

Collections

Research Papers - Dept of Information Technology
