10-Year Cardiovascular Disease (CVD) Risk Prediction of Sri Lankans: A Longitudinal Cohort Study

Solangaarachchige, M.B

Please use this identifier to cite or link to this item: https://rda.sliit.lk/handle/123456789/2858

Title:	10-Year Cardiovascular Disease (CVD) Risk Prediction of Sri Lankans: A Longitudinal Cohort Study
Authors:	Solangaarachchige, M.B
Keywords:	cardiovascular disease; risk assessment models; machine learning; classification
Issue Date:	2021
Abstract:	Cardiovascular diseases (CVDs) are among the leading causes of mortality worldwide. A cornerstone of preventive cardiology is identifying individuals at risk of CVD as early as possible. Clinical guidelines primarily recommend risk prediction models that are based on a limited number of predictors and that perform poorly across all patient groups. Predicting cardiovascular risk is crucial for making treatment decisions, especially in the primary prevention of CVDs using a total risk approach. Although several cardiovascular risk prediction models exist, only a handful are specifically designed for Asians, and none have been derived from South Asian populations, including Sri Lankans. Machine learning (ML) and neural networks appear increasingly promising for supporting decision-making and forecasting from the large volumes of data generated by the healthcare industry. This led us to develop an ML model to predict the 10-year risk of developing CVD in Sri Lankans. We investigated whether ML could be used to develop such a model, whether including non-traditional variables improves the accuracy of CVD risk estimates, and how the ML model could be validated against the existing WHO risk charts. Using baseline data on 2596 participants without CVD from the Ragama Medical Officer of Health (MOH) area in Sri Lanka, we developed an ML-based model for predicting CVD risk from 75 available variables. However, the ratio of participants who developed CVD to those who did not over 10 years was 7:93, which is extremely imbalanced. Therefore, we first derived a balanced dataset from the main dataset and built an ML model on it, which recorded an accuracy of 80.56%. Second, to alleviate the dataset's imbalance, we adopted two techniques, 10-fold cross-validation and stratified 10-fold cross-validation (SKF), and trained six ML classification algorithms: Random Forest (RF), Decision Tree, AdaBoost, Gradient Boosting, K-Nearest Neighbor, and a 2D Neural Network.
Of these six algorithms, the RF model with SKF showed the highest accuracy in predicting a CVD event, at 93.11%. Our ML model included predictors that are not usually considered in existing risk prediction models. Systolic blood pressure was the most important variable in this model. Six of the ten most important variables were non-traditional, and three of those were non-laboratory variables. To validate the model against the existing WHO risk charts, we explored an experimental approach: we developed a simple logistic regression function using the same techniques as the best selected model and the seven traditional risk factors used in the WHO risk charts. Our Random Forest model achieved higher accuracy than this WHO-based model, with a difference of 26.20%. Our ML model improves the accuracy of CVD risk prediction in the Sri Lankan population. This approach supports the view that CVD prediction models can also be derived using ML for each subregion individually. Additionally, our research identified novel CVD-related factors that may now be investigated in prospective studies.
URI:	http://rda.sliit.lk/handle/123456789/2858
Appears in Collections:	2021
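
The stratified 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function name `stratified_k_fold`, the synthetic label vector, and the round-robin assignment are assumptions made purely for illustration. Only the cohort size (2596) and the approximate 7:93 class ratio come from the abstract. The sketch shows how each test fold keeps roughly the same proportion of CVD cases as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold receives its share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Synthetic labels (assumption): 2596 participants, about 7% with a CVD event.
n_total = 2596
n_pos = round(0.07 * n_total)
labels = [1] * n_pos + [0] * (n_total - n_pos)

splits = list(stratified_k_fold(labels, k=10))
fold_rates = [
    sum(labels[i] for i in test_idx) / len(test_idx)
    for _, test_idx in splits
]

for fold_no, rate in enumerate(fold_rates, start=1):
    print(f"fold {fold_no}: CVD rate in test set = {rate:.3f}")
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead of a hand-written splitter; the record does not state which implementation the thesis relied on.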

Files in This Item:

File	Description	Size	Format
MS19805306-FINAL THESIS.pdf Until 2050-12-31		4.92 MB	Adobe PDF
MS19805306-FINAL THESIS_Abs.pdf		271.15 kB	Adobe PDF
