Publication:
Enhancing Healthcare Predictive Models Through Privacy-Preserving Synthetic Data Generation

dc.contributor.author: Edirisinghe M.M
dc.contributor.author: Gunarathne J.H.M.S.M
dc.contributor.author: Wanniarachchi W.A.A.M
dc.date.accessioned: 2026-05-11T09:30:37Z
dc.date.issued: 2025-09-09
dc.description.abstract: The advancement of healthcare predictive modeling is closely tied to the availability and quality of patient data. However, privacy regulations and ethical concerns often hinder data sharing, which remains a persistent challenge. As a solution, privacy-preserving synthetic data generation has emerged, enabling the creation of artificial datasets that retain the statistical properties of real data while protecting individual privacy. This paper explores the use of such synthetic data throughout the clinical risk prediction pipeline by leveraging state-of-the-art generative models. We evaluate their utility in exploratory data analysis, feature selection, model training, and deployment. Our study focuses on synthetic data generated using advanced models such as Differentially Private GANs (DPGAN), Private Aggregation of Teacher Ensembles GANs (PATEGAN), and Anonymization through Data Synthesis GANs (ADSGAN). Using these techniques, we created synthetic versions of the UK Biobank ever-smoker cohort. These synthetic datasets were shown to reproduce key statistical patterns, support effective feature selection, and enable accurate lung cancer risk prediction modeling, all without using real patient data. We compare synthetic data with other privacy-enhancing technologies such as federated learning and highlight a key advantage: synthetic data allows the direct use of existing analytical and machine learning tools without modification. Additionally, we examine deployment models such as "no-release" and "delayed-release," emphasizing how synthetic data can speed up research and enable broader data sharing while maintaining GDPR compliance. Overall, this study demonstrates the potential of synthetic data to transform healthcare research, software testing, education, and collaboration while carefully navigating the trade-off between privacy and utility.
dc.identifier.doi: https://doi.org/10.54389/HJHY2753
dc.identifier.issn: 2961-5011
dc.identifier.uri: https://rda.sliit.lk/handle/123456789/4968
dc.language.iso: en
dc.publisher: Faculty of Engineering
dc.relation.ispartofseries: SICET 2025; 71p.-77p.
dc.subject: Synthetic Data
dc.subject: Privacy-Preserving Data Generation
dc.subject: Healthcare Predictive Modeling
dc.subject: Machine Learning
dc.subject: Risk Prediction
dc.subject: Differential Privacy
dc.subject: Generative Adversarial Networks
dc.title: Enhancing Healthcare Predictive Models Through Privacy-Preserving Synthetic Data Generation
dc.type: Conference Paper
dspace.entity.type: Publication

Files

Original bundle

Name:
10.Enhancing Healthcare Predictive Models Through Privacy- Preserving Synthetic Data Generation.pdf
Size:
493.51 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.69 KB
Format:
Item-specific license agreed upon to submission