Research Publications Authored by SLIIT Staff

Permanent URI for this community: https://rda.sliit.lk/handle/123456789/4195

This collection includes all SLIIT staff publications presented at external conferences and published in external journals. The materials are organized by faculty to facilitate easy retrieval.

  • Publication (Embargo)
    Enhanced Tokenizer for Sinhala Language
    (IEEE, 2019-10-08) Senanayake, S. Y.; Kariyawasam, K. T. P. M.; Haddela, P. S.
    Tokenization plays a prominent role in natural language processing (NLP) applications: it splits content into its smallest meaningful units. However, only a limited number of tokenization approaches exist for the Sinhala language. The standard analyzer in the Apache software library and the Natural Language Toolkit (NLTK) are the main existing approaches for tokenizing Sinhala content. Because these tools are language independent, they have limitations when applied to Sinhala. Our proposed Sinhala tokenizer focuses mainly on punctuation-based tokenization. It tokenizes content precisely by identifying the use case of each punctuation mark. In our research, we show that our punctuation-based tokenization approach outperforms the word tokenization of the existing approaches.
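
The idea of treating a punctuation mark according to how it is used can be illustrated with a minimal Python sketch. This is not the authors' tokenizer: the punctuation set, the function name `tokenize`, and the single rule applied here (punctuation at the edge of a whitespace-delimited chunk becomes its own token, while punctuation inside a chunk, such as the separators in a number, is kept) are assumptions made purely for illustration.

```python
# Illustrative assumption: a small, fixed set of punctuation marks.
PUNCT = set(".,;:!?()\"'")


def tokenize(text):
    """Split text on whitespace, then peel punctuation off each chunk's edges.

    Punctuation inside a chunk (e.g. the separators in "1,250.50") is left
    in place; punctuation at the start or end becomes a separate token.
    """
    tokens = []
    for chunk in text.split():
        start, end = 0, len(chunk)

        # Leading punctuation marks are emitted one by one.
        while start < end and chunk[start] in PUNCT:
            tokens.append(chunk[start])
            start += 1

        # Trailing punctuation marks are collected from right to left.
        trailing = []
        while end > start and chunk[end - 1] in PUNCT:
            trailing.append(chunk[end - 1])
            end -= 1

        # Whatever remains in the middle is kept as a single token.
        if start < end:
            tokens.append(chunk[start:end])

        # Restore the original left-to-right order of trailing marks.
        tokens.extend(reversed(trailing))
    return tokens


if __name__ == "__main__":
    print(tokenize("මිල 1,250.50."))
    print(tokenize("(ලංකාව)"))
```

A real Sinhala tokenizer would need many more rules (abbreviations, dates, quotation styles, and so on); the sketch only shows why the same mark, here the full stop, may be part of a token in one position and a token of its own in another.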