Publication: A Hybrid Machine Learning and Rule-Based Approach for a Sinhala Natural Language Interface to Database
| dc.contributor.author | Mahdi, M. R. M. | |
| dc.date.accessioned | 2026-02-07T08:33:01Z | |
| dc.date.issued | 2025-12 | |
| dc.description.abstract | The objective of Natural Language Interfaces to Databases (NLIDBs) is to provide users with an intuitive way to access their data: users can query their relational databases in natural language instead of a formal query language such as SQL. While there have been significant advances in developing NLIDBs for high-resource languages such as English, support for low-resource, morphologically rich languages such as Sinhala remains limited. Most existing Sinhala NLIDBs employ rule-based approaches, which are difficult to adapt to new conditions and to scale. In this paper, we propose a hybrid approach to developing Sinhala NLIDBs that combines rule-based logic with statistical methods to address these limitations. We focus on a single student table and support the basic SQL operations. We employ core linguistic preprocessing techniques (tokenization, stemming, POS tagging) together with a grammar-driven query parser designed specifically for the structure of Sinhala. A manually annotated dataset of 800 Sinhala-SQL query pairs is used to improve the model's ability to identify semantic elements through Named Entity Recognition (NER). Furthermore, an intent classifier guides the SQL generation process, enabling the system to correctly interpret a variety of natural language queries. The hybrid architecture seeks to balance the precision of rule-based systems with the flexibility of statistical systems, providing both interpretability and generalizability. In addition to improving database accessibility for Sinhala speakers, this research provides a foundation for future multilingual and multi-domain NLIDBs for low-resource languages. | |
| dc.identifier.uri | https://rda.sliit.lk/handle/123456789/4553 | |
| dc.language.iso | en | |
| dc.publisher | Sri Lanka Institute of Information Technology | |
| dc.subject | Hybrid Machine Learning | |
| dc.subject | Rule-Based Approach | |
| dc.subject | Sinhala Natural Language | |
| dc.subject | Natural Language Interface | |
| dc.title | A Hybrid Machine Learning and Rule-Based Approach for a Sinhala Natural Language Interface to Database | |
| dc.type | Thesis | |
| dspace.entity.type | Publication |
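The abstract describes a pipeline in which an intent classifier guides SQL generation while rule-based patterns extract the semantic elements of a query. The following is a minimal Python sketch of that general idea, not the author's implementation. The `student` columns (`name`, `age`, `marks`), the Sinhala keyword lexicon, the condition pattern, and the five training sentences are all hypothetical, and a from-scratch multinomial Naive Bayes classifier stands in for whatever statistical model the thesis actually uses. Tokenization is a plain whitespace split; stemming, POS tagging, and a trained NER model are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical schema and lexicon for illustration only: a single `student`
# table with columns name, age, marks. Not taken from the thesis.
TABLE = "student"
COLUMN_LEXICON = {"නම": "name", "වයස": "age", "ලකුණු": "marks"}
OPERATORS = {"වැඩි": ">", "අඩු": "<"}  # "more than" / "less than"

# Hypothetical rule: <column word> <number> ට <වැඩි|අඩු>, e.g. "වයස 20 ට වැඩි" (age > 20).
CONDITION = re.compile(r"(\S+)\s+(\d+)\s*ට\s+(වැඩි|අඩු)")


def tokenize(text):
    # Whitespace/punctuation split only; stemming and POS tagging are omitted.
    return re.findall(r"[^\s?.,!]+", text)


class NaiveBayesIntent:
    """Multinomial Naive Bayes with add-one smoothing (the statistical component)."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for text, intent in examples:
            tokens = tokenize(text)
            self.class_counts[intent] += 1
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for intent, n in self.class_counts.items():
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[intent][t] + 1) / denom)
            if score > best_score:
                best, best_score = intent, score
        return best


def to_sql(text, classifier):
    """Rule-based SQL template filling, guided by the predicted intent."""
    intent = classifier.predict(text)
    where, cond_col = "", None
    match = CONDITION.search(text)
    if match and match.group(1) in COLUMN_LEXICON:
        cond_col = COLUMN_LEXICON[match.group(1)]
        where = f" WHERE {cond_col} {OPERATORS[match.group(3)]} {int(match.group(2))}"
    if intent == "count":
        return f"SELECT COUNT(*) FROM {TABLE}{where}"
    # Projected columns: lexicon hits other than the column used in the condition.
    cols = list(dict.fromkeys(
        COLUMN_LEXICON[t] for t in tokenize(text)
        if t in COLUMN_LEXICON and COLUMN_LEXICON[t] != cond_col))
    return f"SELECT {', '.join(cols) or '*'} FROM {TABLE}{where}"


# Tiny hypothetical training set (not the thesis's 800-pair dataset).
TRAIN = [
    ("නම පෙන්වන්න", "select"),
    ("ලකුණු පෙන්වන්න", "select"),
    ("වයස 18 ට අඩු සිසුන්ගේ නම පෙන්වන්න", "select"),
    ("සිසුන් ගණන කීයද", "count"),
    ("වයස 20 ට වැඩි සිසුන් ගණන කීයද", "count"),
]

if __name__ == "__main__":
    clf = NaiveBayesIntent().fit(TRAIN)
    print(to_sql("වයස 20 ට වැඩි සිසුන්ගේ නම පෙන්වන්න", clf))
    print(to_sql("වයස 18 ට අඩු සිසුන් ගණන කීයද", clf))
```

Run as a script, this prints `SELECT name FROM student WHERE age > 20` and then `SELECT COUNT(*) FROM student WHERE age < 18`. The split mirrors the hybrid idea in the abstract: the learned classifier decides *which* query template applies, while deterministic rules fill in columns and conditions, which keeps the generated SQL inspectable.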
