Publication: 10-Year Cardiovascular Disease (CVD) Risk Prediction of Sri Lankans: A Longitudinal Cohort Study
Type: Thesis
Date: 2021
Abstract
Cardiovascular diseases (CVDs) are among the leading causes of mortality worldwide. A
cornerstone of preventive cardiology is identifying individuals at risk of CVD as early as
possible. Clinical guidelines primarily recommend risk prediction models that are based on
a limited number of predictors and that do not perform well across all patient groups.
Predicting cardiovascular risk is crucial for making treatment decisions, especially in the
primary prevention of CVD using a total risk approach. Although several cardiovascular
risk prediction models exist, only a handful are specifically designed for Asians, and none
have been derived from South Asian populations, including Sri Lankans.
Machine learning (ML) and neural networks appear increasingly promising for supporting
decision-making and forecasting from the large amounts of data generated by the healthcare
industry. This led us to develop an ML model to predict the 10-year risk of developing CVD
in Sri Lankans. We investigated whether ML could be used to develop such a model, whether
including non-traditional variables improves the accuracy of CVD risk estimates, and how
to validate the ML model against the existing WHO risk charts.
Using baseline data on 2596 participants without CVD from the Ragama Medical Officer of
Health (MOH) area in Sri Lanka, we developed an ML-based model for predicting CVD risk
from 75 available variables. However, the ratio of participants who developed CVD to those
who did not within 10 years was 7:93, which is extremely imbalanced. Therefore, we first
derived a balanced dataset from the main dataset and built an ML model on it, which
achieved 80.56% accuracy. Second, to address the dataset's imbalance, we adopted two
techniques, 10-fold cross-validation and stratified 10-fold cross-validation (SKF), and
trained six ML classification algorithms: Random Forest (RF), Decision Tree, AdaBoost,
Gradient Boosting, K-Nearest Neighbor and 2D Neural Network. Of these six algorithms, the
RF model with SKF showed the highest accuracy in predicting a CVD event, at 93.11%. Our ML
model included predictors that are not usually considered in existing risk prediction
models. Systolic blood pressure was the most important variable in this model. Six
non-traditional variables appeared among the ten most important variables, and three of
them were non-laboratory variables. To validate the model against the existing WHO risk
charts, we explored an experimental approach: we developed a simple logistic regression
function, using the same techniques as the best selected model, with the seven traditional
risk factors used in the WHO risk charts. Our Random Forest model showed higher accuracy
than this WHO-based model, with a difference of 26.20%.
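The two data-handling steps described above (deriving a balanced dataset and stratified 10-fold cross-validation) can be illustrated with a minimal plain-Python sketch. It is not the thesis's implementation: the abstract does not say how the balanced dataset was derived, so random undersampling of the majority class is an assumption here, the helper names `undersample_balanced`, `stratified_k_folds` and `train_test_splits` are hypothetical, and the labels are synthetic (70 CVD vs 930 no-CVD, matching only the reported 7:93 ratio, not the Ragama data).

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group row indices by class label (first-seen order of labels is kept)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def undersample_balanced(labels, seed=0):
    """Hypothetical helper: return indices of a 1:1 balanced subset.

    Assumes random undersampling, i.e. every class is randomly cut down to
    the size of the smallest class. The source does not name its method.
    """
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def stratified_k_folds(labels, k=10, seed=0):
    """Hypothetical helper: split indices into k folds keeping the class ratio.

    Each class is shuffled and dealt round-robin into the folds, so every
    fold receives (almost) the same share of the rare CVD class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        # Continue dealing where the previous class stopped to even out fold sizes.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return [sorted(f) for f in folds]


def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_k_folds(labels, k, seed)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


if __name__ == "__main__":
    labels = [1] * 70 + [0] * 930  # synthetic labels in a 7:93 ratio, not study data
    balanced = undersample_balanced(labels)
    print(len(balanced), sum(labels[i] for i in balanced))  # 140 70
    for fold in stratified_k_folds(labels):
        print(sum(labels[i] for i in fold), len(fold))  # 7 100 for every fold
```

With plain (unstratified) 10-fold splitting, the number of CVD cases per test fold varies by chance, whereas the stratified split above puts 7 cases in each 100-row fold. In practice, scikit-learn's `StratifiedKFold` class provides this splitting, and each `(train, test)` pair would be used to fit and score one of the six classifiers.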
Our ML model improves the accuracy of CVD risk prediction in the Sri Lankan population.
This approach supports deriving CVD prediction models using ML for each subregion
individually. Additionally, our research identified novel CVD risk factors that may now
be investigated in prospective studies.
Keywords: cardiovascular disease, risk assessment, models, machine learning, classification
