Please use this identifier to cite or link to this item: https://rda.sliit.lk/handle/123456789/2599
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tissera, M | -
dc.contributor.author | Weerasinghe, R | -
dc.date.accessioned | 2022-06-09T08:19:28Z | -
dc.date.available | 2022-06-09T08:19:28Z | -
dc.date.issued | 2019-02-25 | -
dc.identifier.citation | M. Tissera and R. Weerasinghe, "Auto Generation of Gold Standard, Class Labeled Data Set and Ontology Extension Tool [QuadW]," 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019, pp. 1-6, doi: 10.1109/ICACCP.2019.8882996. | en_US
dc.identifier.isbn | 978-1-5386-7989-0 | -
dc.identifier.uri | http://rda.sliit.lk/handle/123456789/2599 | -
dc.description.abstract | Automatic Knowledge Extraction (AKE) from domain-independent, unstructured text sources is a challenging task in Natural Language Processing and text analytics. Although supervised learning mechanisms give promising results, applying them is painful because they require a class-labeled training data set, which involves expensive and time-consuming manual effort. As a solution to this problem, this paper introduces a novel mechanism for building a self-learned classifier model that automatically generates a class-labeled training data set for Knowledge/Information Extraction from domain-independent unstructured text. Sri Lankan English newspapers (which comprise unstructured text in unconstrained domains) are the main data source for this study, and a prototype was built for Professional Information Extraction with the semantic pattern Who holds/held What position, Where and When (four words starting with 'W', hence the name 'QuadW'). The methodology uses advanced machine learning techniques, such as a Random Forest with the AdaBoost ensemble algorithm, to build a composite classification model. The classifier is called self-learned because it generates its own training data set automatically. The composite model improved accuracy and avoided overfitting to the data. The rule-based feature extraction algorithm and the hand-crafted ontology can also be considered novel components of this study. The self-learned classifier has been extensively improved and tested, showing higher accuracy with precision and recall close to one. Therefore, the classified output of the self-learned classifier can be used as a gold-standard data set for future research in Professional Information Extraction. The constructed ontology, with approximately 400 facts, can also be used effectively in future research. Further, the introduced classifier can serve as a tool for extending the existing ontology. This novel use of machine learning algorithms for text classification demonstrates that the study is in line with state-of-the-art technologies. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.relation.ispartofseries | 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP); | -
dc.subject | Auto Generation | en_US
dc.subject | Gold Standard | en_US
dc.subject | Ontology Extension Tool | en_US
dc.subject | Data Set | en_US
dc.subject | Class Labeled | en_US
dc.title | Auto Generation of Gold Standard, Class Labeled Data Set and Ontology Extension Tool [QuadW] | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/ICACCP.2019.8882996 | en_US
Appears in Collections:
Department of Information Technology-Scopes
Research Papers - IEEE
Research Publications -Dept of Information Technology

Files in This Item:
File: Auto_Generation_of_Gold_Standard_Class_Labeled_Data_Set_and_Ontology_Extension_Tool_QuadW.pdf (Until 2050-12-31)
Size: 298.47 kB
Format: Adobe PDF

