Department of Computer Systems Engineering
Permanent URI for this collection: http://rda.sliit.lk:8081/handle/123456789/2697
Publication Embargo
Algorithmically Navigating Complex Tabular Structures in Images for Information Extraction (IEEE, 2022-12-26)
Nugawela, M; Abeywardena, K. Y; Mahaadikara, H
Computer vision has been at the forefront of automating workflows, replacing manual, repetitive tasks with convenience and accuracy. Recognizing text in images of commercial documents through optical character recognition (OCR) forms the initial step of most such workflows, in which most of the information takes the form of complex data structures such as tables and nested tables. Although OCR technology has evolved to capture text from images effectively, there is still room for improvement in recognizing complex data structures and extracting tabular data from images. This paper proposes an algorithmic approach, based on keyword detection and the position of words relative to each other, that recognizes nested structures and extracts tabular data into a format readable by both programs and humans. It takes a different approach from machine learning models or pre-defined templates for layout recognition. Furthermore, the approach is shown to yield successful results in correctly comprehending the layout and data of nested table structures that appear in multiple rows of a table.
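The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of combining keyword detection with relative word positions. It is not the authors' method. The `Word` type, the `extract_table` function, the `row_tolerance` parameter, and the sample coordinates are all hypothetical, and the sketch handles flat tables only, not the nested structures the paper addresses.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with the top-left corner of its bounding box (hypothetical type)."""
    text: str
    x: float
    y: float


def extract_table(words, header_keywords, row_tolerance=5.0):
    """Turn positioned OCR words into a list of row dictionaries.

    Assumption: header cells are single words matching `header_keywords`,
    and every data word belongs to the column whose header is closest
    horizontally.
    """
    # Keyword detection: header words anchor the columns.
    headers = sorted(
        (w for w in words if w.text.lower() in header_keywords),
        key=lambda w: w.x,
    )
    if not headers:
        return []

    # Only words clearly below the header line are treated as table data.
    header_y = max(w.y for w in headers)
    body = sorted(
        (w for w in words if w.y > header_y + row_tolerance),
        key=lambda w: (w.y, w.x),
    )

    # Relative position, vertical: words with similar y share a row.
    rows, current, current_y = [], [], None
    for w in body:
        if current_y is None or abs(w.y - current_y) <= row_tolerance:
            current.append(w)
            if current_y is None:
                current_y = w.y
        else:
            rows.append(current)
            current, current_y = [w], w.y
    if current:
        rows.append(current)

    # Relative position, horizontal: assign each word to the nearest header.
    table = []
    for row in rows:
        record = {h.text: "" for h in headers}
        for w in sorted(row, key=lambda w: w.x):
            nearest = min(headers, key=lambda h: abs(h.x - w.x))
            record[nearest.text] = (record[nearest.text] + " " + w.text).strip()
        table.append(record)
    return table


# Made-up sample input, standing in for OCR output of a small table.
sample = [
    Word("Item", 10, 0), Word("Qty", 100, 0), Word("Price", 200, 0),
    Word("Widget", 10, 20), Word("2", 100, 21), Word("9.50", 200, 20),
    Word("Gadget", 12, 40), Word("1", 101, 40), Word("4.00", 199, 41),
]
rows = extract_table(sample, {"item", "qty", "price"})
```

Here `rows` holds one dictionary per table row, keyed by the detected header text. Nested tables would need an extra step, for example re-running the same keyword search inside a single cell's region, which this sketch leaves out.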
