Conditional Random Fields based Named Entity Recognition for Sinhala

Senevirathne, K. U; Attanayake, N. S; Dhananjanie, A. W. M. H; Weragoda, W. A. S. U; Nugaliyadde, A; Thelijjagoda, S

dc.contributor.author	Senevirathne, K. U
dc.contributor.author	Attanayake, N. S
dc.contributor.author	Dhananjanie, A. W. M. H
dc.contributor.author	Weragoda, W. A. S. U
dc.contributor.author	Nugaliyadde, A
dc.contributor.author	Thelijjagoda, S
dc.date.accessioned	2022-02-24T10:21:16Z
dc.date.available	2022-02-24T10:21:16Z
dc.date.issued	2015-12-18
dc.description.abstract	Named Entity Recognition (NER) plays an important role in Natural Language Processing (NLP). Named Entities (NEs) are atomic elements of natural language that belong to predefined categories such as persons, organizations, locations, expressions of time, quantities, monetary values and percentages. They refer to specific things and are not listed in grammars or lexicons. NER is the task of identifying such NEs, and it comes with a number of challenges. Entities may be difficult to find at first and, once found, difficult to classify. For instance, a location and a person can share the same name and follow similar formatting. The task becomes harder for South and South East Asian languages, mainly because of the nature of these languages. Although accurate NER solutions exist for Latin languages, they cannot be applied directly to Indic languages, because the features found in Indic languages differ from those of English. The research therefore focused on producing a mathematical model that forms the integral part of the Sinhala NER system. The researchers used a Sinhala News corpus as the data set to train the Conditional Random Fields (CRFs) algorithm: 90% of the corpus was used to train the model and the remaining 10% to test the resulting model. The research makes use of orthographic word-level features along with contextual information, which help in predicting three NE classes, namely Persons, Locations and Organizations. The findings were applied in developing the NE Annotator, which identifies NE classes in unstructured Sinhala text. This contribution to NER could benefit Sinhala NLP application developers and NLP researchers in the near future.	en_US
dc.identifier.citation	K. U. Senevirathne, N. S. Attanayake, A. W. M. H. Dhananjanie, W. A. S. U. Weragoda, A. Nugaliyadde and S. Thelijjagoda, "Conditional Random Fields based Named Entity Recognition for Sinhala," 2015 IEEE 10th International Conference on Industrial and Information Systems (ICIIS), 2015, pp. 302-307, doi: 10.1109/ICIINFS.2015.7399028.	en_US
dc.identifier.doi	10.1109/ICIINFS.2015.7399028	en_US
dc.identifier.isbn	978-1-4799-1876-8
dc.identifier.uri	https://rda.sliit.lk/handle/123456789/1381
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	2015 IEEE 10th International Conference on Industrial and Information Systems (ICIIS);Pages 302-307
dc.subject	Conditional	en_US
dc.subject	Random Fields	en_US
dc.subject	Entity Recognition	en_US
dc.subject	Sinhala	en_US
dc.subject	Fields based	en_US
dc.subject	Named Entity	en_US
dc.title	Conditional Random Fields based Named Entity Recognition for Sinhala	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
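
The abstract mentions orthographic word-level features, contextual information, and a 90%/10% train/test split, but this record does not list the actual feature set. The following is a minimal Python sketch of what such a setup could look like. The function names, the specific features (prefixes, suffixes, digit flag, neighbouring words), and the sample sentence are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Features = Dict[str, object]


def token_features(tokens: List[str], i: int) -> Features:
    """Build a feature dict for tokens[i].

    All features here are hypothetical examples of "orthographic
    word-level" and "contextual" features; the paper's real feature
    set is not given in this record.
    """
    word = tokens[i]
    feats: Features = {
        "bias": 1.0,
        "word": word,
        "length": len(word),
        # Prefixes/suffixes are taken over Unicode code points, so they
        # may cut through a Sinhala grapheme cluster (assumption: this
        # is acceptable for a sketch).
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "has_digit": any(ch.isdigit() for ch in word),
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }
    # Contextual information: the neighbouring words, when they exist.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    return feats


def sentence_features(tokens: List[str]) -> List[Features]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def split_corpus(sentences: List[list], train_fraction: float = 0.9) -> Tuple[list, list]:
    """Split a list of sentences into train and test parts, in order."""
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


# Illustrative three-token sentence (not taken from the paper's corpus).
sample = ["සුනිල්", "කොළඹ", "ගියා"]
sample_feats = sentence_features(sample)
train, test = split_corpus([sample] * 10)
```

Per-token feature dictionaries grouped by sentence are the input shape that common CRF toolkits accept, so a sketch like this could be passed to a CRF trainer together with per-token Person, Location, and Organization labels.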

Files

Original bundle

Name: Conditional_Random_Fields_based_Named_Entity_Recognition_for_Sinhala.pdf
Size: 885.03 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Research Papers - Dept of Information of Management
