Publication: Enhancing Fault-Tolerant ETL Pipelines Through AI-Driven Predictive Maintenance: A Corporate Framework for Improved Data Quality and Integration
DOI
Type
Thesis
Date
2024-12
Authors
Publisher
SLIIT
Abstract
Maintaining high data quality and consistency across various sources is essential for making
informed and effective decisions in today’s data-centric environment. This research presents an
AI-driven approach to enhance fault tolerance within ETL (Extract, Transform, Load) pipelines,
aiming to improve data quality through predictive maintenance mechanisms. The proposed ETL
framework automates data cleaning, standardization, and error handling, utilizing machine
learning and natural language processing (NLP) techniques to identify and resolve data
inconsistencies in real time.
By integrating AI models into each phase of the ETL process, the pipeline demonstrates resilience
against common data irregularities across varied formats, such as dates, numbers, and text. A
unique feature of this approach is its predictive maintenance capability, where machine learning
algorithms proactively address potential faults before they escalate, reducing downtime and
increasing overall system reliability. Key components include LSTM-based models for date and
text standardization, anomaly detection mechanisms for fault tolerance, and an automated error
logging system to streamline data auditing processes. Results from experimental evaluations show
that the AI-driven pipeline achieves significant improvements in data consistency and error
detection, with up to a 98% reduction in inconsistencies for critical data fields. Despite some
limitations, including resource intensity and sensitivity to rare data patterns, this research
highlights the potential of AI-augmented ETL systems to meet the growing demand for robust data
integration solutions in corporate environments. The findings suggest that AI-driven fault-tolerant
ETL pipelines can play a pivotal role in advancing data quality management, enabling
organizations to make data-driven decisions with greater confidence.
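
As a rough illustration of the kind of fault-tolerant transform step the abstract describes, the sketch below standardizes dates, flags anomalous numeric values, and logs problem rows instead of aborting the batch. It is not the thesis's implementation: a fixed list of date formats stands in for the LSTM-based standardizer, a modified z-score stands in for the unspecified anomaly detection mechanism, and every name in it (`standardize_date`, `flag_anomalies`, `transform`, the `date` and `amount` fields, the `etl_sketch` logger) is hypothetical.

```python
import logging
import statistics
from datetime import datetime

logger = logging.getLogger("etl_sketch")  # hypothetical logger name

# Assumed input formats; a learned model would replace this fixed lookup.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def standardize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def flag_anomalies(values, threshold=3.5):
    """Return indexes of values whose modified z-score exceeds the threshold."""
    if not values:
        return []
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


def transform(rows):
    """Clean rows; log and set aside bad ones instead of failing the batch."""
    clean, rejected = [], []
    for index, row in enumerate(rows):
        iso_date = standardize_date(row["date"])
        if iso_date is None:
            logger.warning("row %d: unparseable date %r", index, row["date"])
            rejected.append(row)
            continue
        clean.append({**row, "date": iso_date})
    # Anomalous amounts are logged for auditing but kept in the output.
    for i in flag_anomalies([r["amount"] for r in clean]):
        logger.warning("anomalous amount %r dated %s",
                       clean[i]["amount"], clean[i]["date"])
    return clean, rejected
```

Calling `transform` on a list of dictionaries returns the cleaned rows and the rejected rows separately, so a single malformed record is recorded in the log for auditing and does not stop the load.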
Keywords
Enhancing Fault-Tolerant, ETL Pipelines, AI-Driven Predictive Maintenance, Corporate Framework, Improved Data Quality, Integration
