Publication:
Dynamic stopword removal for Sinhala Language

dc.contributor.author: Jayaweera, A. A. V. A
dc.contributor.author: Senanayake, Y. N
dc.contributor.author: Haddela, P. S
dc.date.accessioned: 2022-04-22T05:33:46Z
dc.date.available: 2022-04-22T05:33:46Z
dc.date.issued: 2019-10-08
dc.description.abstract: In modern information retrieval, text summarization, and text analytics, redundant (noise) words that carry little information and have low or no semantic meaning must be filtered out. Such words are known as stopwords. Language-specific stopword lists have been identified for more than 40 languages. Researchers use various techniques to build these lists, but most of them define an arbitrary ("magical") cut-off point for the list without any justification. This research aims to show that the cut-off point depends on the source data and on the machine learning algorithm, using Newton's iterative root-finding method. To this end, a stopword list for the Sinhala language is created with a term-frequency-based method by processing more than 90,000 Sinhala documents. This paper presents the results obtained and the new datasets prepared for text preprocessing.
dc.identifier.citation: A. A. V. A. Jayaweera, Y. N. Senanayake and P. S. Haddela, "Dynamic Stopword Removal for Sinhala Language," 2019 National Information Technology Conference (NITC), 2019, pp. 1-6, doi: 10.1109/NITC48475.2019.9114476.
dc.identifier.doi: 10.1109/NITC48475.2019.9114476
dc.identifier.issn: 2279-3895
dc.identifier.uri: https://rda.sliit.lk/handle/123456789/2011
dc.language.iso: en
dc.publisher: IEEE
dc.relation.ispartofseries: 2019 National Information Technology Conference (NITC); Pages 1-6
dc.subject: Sinhala Language
dc.subject: Dynamic
dc.subject: Stopword Removal
dc.title: Dynamic stopword removal for Sinhala Language
dc.type: Article
dspace.entity.type: Publication
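The abstract describes building a Sinhala stopword list with a term-frequency-based method and choosing the list's cut-off point with Newton's root-finding iteration. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `rank_terms`, `stopwords_above_cutoff`, and `newton` are hypothetical helper names. Tokens are assumed to come from simple whitespace splitting, and frequencies are raw counts over the whole corpus. The function whose root determines the cut-off is not given in this record, so `newton` is shown only as the generic iteration x_{n+1} = x_n - f(x_n)/f'(x_n).

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def rank_terms(documents: Iterable[List[str]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: count each token over the whole corpus and
    return (term, frequency) pairs, most frequent first.

    Assumes documents are already tokenized (e.g. by whitespace splitting)
    and uses raw term counts, one simple variant of a frequency-based method.
    """
    counts: Counter = Counter()
    for tokens in documents:
        counts.update(tokens)
    # Descending frequency; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def stopwords_above_cutoff(ranked: List[Tuple[str, int]], cutoff: int) -> List[str]:
    """Hypothetical helper: treat the `cutoff` highest-ranked terms as stopwords."""
    return [term for term, _ in ranked[:cutoff]]


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-10, max_iter: int = 50) -> float:
    """Generic Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n).

    The function f whose root fixes the cut-off is an assumption left to
    the caller; this record does not specify it.
    """
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; Newton step undefined")
        step = f(x) / slope
        x -= step
        # Stop once the update is smaller than the tolerance.
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge within max_iter steps")
```

For example, ranking the three toy sentences "මෙම පොත හොඳ ය", "මෙම ගස උස ය" and "මෙම ගෙදර ලොකු ය" and keeping the top two terms yields මෙම ("this") and the particle ය as stopwords. In a dynamic setting, the fixed `cutoff` argument would instead be derived from the data and the downstream model, which is the dependence the paper sets out to show.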

Files

Original bundle

Name: Dynamic_Stopword_Removal_for_Sinhala_Language.pdf
Size: 341.13 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission