Auto Generation of Gold Standard, Class Labeled Data Set and Ontology Extension Tool [QuadW]

Tissera, M; Weerasinghe, R

dc.contributor.author	Tissera, M
dc.contributor.author	Weerasinghe, R
dc.date.accessioned	2022-06-09T08:19:28Z
dc.date.available	2022-06-09T08:19:28Z
dc.date.issued	2019-02-25
dc.description.abstract	Automatic Knowledge Extraction (AKE) from domain-independent, unstructured text sources is a challenging task in Natural Language Processing and text analytics. Although supervised learning mechanisms give very promising results, applying them is painful because they require a class-labeled training data set, which involves expensive and time-consuming manual effort. As a solution to this problem, this paper introduces a novel mechanism to build a self-learned classifier model that can automatically generate a class-labeled training data set for Knowledge/Information Extraction from domain-independent unstructured text. Sri Lankan English newspapers (which comprise unstructured text in unconstrained domains) are the main data source for this study, and a prototype was built for Professional Information Extraction with the semantic pattern Who holds/held What position, Where and When (four words starting with `W', hence the name `QuadW'). The methodology uses advanced machine learning techniques, such as a Random Forest with the AdaBoost ensemble algorithm, to build a composite classification model. The classifier is called self-learned because it generates its own training data set automatically. The composite model improved accuracy and avoided overfitting to the data. The rule-based feature extraction algorithm and the hand-crafted ontology developed can also be considered novel components of this study. The self-learned classifier has been extensively improved and tested, and shows high accuracy, with precision and recall close to one. Therefore, the classified output from the self-learned classifier can be used as a gold-standard data set for future research in Professional Information Extraction. The constructed ontology, with approximately 400 facts, can also be used effectively in future research. Further, the introduced classifier can be used as a tool to extend the existing ontology. This novel use of machine learning algorithms for text classification demonstrates that the study is in line with state-of-the-art technologies.	en_US
dc.identifier.citation	M. Tissera and R. Weerasinghe, "Auto Generation of Gold Standard, Class Labeled Data Set and Ontology Extension Tool [QuadW]," 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019, pp. 1-6, doi: 10.1109/ICACCP.2019.8882996.	en_US
dc.identifier.doi	10.1109/ICACCP.2019.8882996	en_US
dc.identifier.isbn	978-1-5386-7989-0
dc.identifier.uri	https://rda.sliit.lk/handle/123456789/2599
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP);
dc.subject	Auto Generation	en_US
dc.subject	Gold Standard	en_US
dc.subject	Ontology Extension Tool	en_US
dc.subject	Data Set	en_US
dc.subject	Class Labeled	en_US
dc.title	Auto Generation of Gold Standard, Class Labeled Data Set and Ontology Extension Tool [QuadW]	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
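
The abstract describes a classifier that generates its own class-labeled training data using rule-based feature extraction and a hand-crafted ontology. The sketch below illustrates only that labeling idea, under loud assumptions: the four-entry ontology, the regular expressions, the labeling rule and all function names are hypothetical and are not taken from the paper. The Random Forest with AdaBoost model that the paper trains on such data is not shown.

```python
import re

# Hypothetical mini-ontology of position titles (assumption for illustration;
# the paper's ontology of roughly 400 facts is not reproduced here).
POSITION_ONTOLOGY = {
    "minister": "POSITION",
    "chairman": "POSITION",
    "director": "POSITION",
    "secretary": "POSITION",
}

# Assumed heuristics: a four-digit year starting with 19 or 20 stands in for
# "When", and two adjacent capitalised words stand in for "Who".
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def extract_features(sentence):
    """Return simple rule-based features for one sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    positions = [t for t in tokens if t in POSITION_ONTOLOGY]
    return {
        "has_position": bool(positions),
        "has_name": NAME_RE.search(sentence) is not None,
        "has_year": YEAR_RE.search(sentence) is not None,
        "positions": positions,
    }


def auto_label(sentence):
    """Label a sentence 1 if it looks like a 'Who holds What position' fact."""
    features = extract_features(sentence)
    return 1 if features["has_position"] and features["has_name"] else 0


def build_training_set(sentences):
    """Pair every sentence with its automatically generated class label."""
    return [(s, auto_label(s)) for s in sentences]
```

Calling `build_training_set` on a list of newspaper sentences yields `(sentence, label)` pairs that a supervised model could then be trained on. The capitalised-word heuristic is deliberately crude (it would also match a phrase such as "The Minister"), so a real system would need stronger name detection.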

Files

Original bundle

Name: Auto_Generation_of_Gold_Standard_Class_Labeled_Data_Set_and_Ontology_Extension_Tool_QuadW.pdf
Size: 298.47 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Research Papers - Dept of Information Technology
