Premature mortality from cardio-cerebrovascular diseases in Bogotá: an analytical machine learning approach
Publisher
Nature Portfolio
Abstract
Premature mortality from cardio-cerebrovascular diseases represents an increasing burden on health systems, particularly in urban contexts across Latin America. This study analyzes mortality records in Bogotá from 2010 to 2022 using descriptive analysis, time series analysis, and machine learning models. It includes deaths among individuals aged over 30, classified as premature or nonpremature based on a 75-year threshold¹. Supervised models were trained using sociodemographic, insurance-related, and underlying cause-of-death variables, and their performance was evaluated using standard metrics. The random forest model showed the best overall performance, with educational level, insurance scheme, and place of death emerging as the main predictors. Additionally, separate models were developed for four diagnostic groups (ischemic, cerebrovascular, hypertensive, and heart failure) and revealed differences in classification patterns. The model for ischemic heart disease achieved the highest AUC (0.69), followed by the models for cerebrovascular disease (0.65), hypertensive disease (0.63), and heart failure (0.61). SHAP analysis highlighted the differential contribution of sociodemographic variables such as place of death, sex, educational level, and insurance scheme, with distinct patterns observed across causes of death. Trend analysis revealed a sustained increase in premature mortality, with a further rise during the pandemic period. These findings underscore the role of social determinants in premature cardiovascular deaths and highlight the potential of machine learning as a decision-support tool for public health.
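As an illustration of two quantities named in the abstract, the following minimal sketch labels deaths as premature or nonpremature and computes an AUC from predicted scores. It is not the study's pipeline. The function names, the toy ages, and the toy scores are hypothetical, and the exact inclusion and cut-off rules (age above 30, premature meaning age below 75) are assumptions about how the stated thresholds were applied.

```python
from typing import List, Sequence

PREMATURE_AGE_THRESHOLD = 75  # from the abstract; using < rather than <= is an assumption
MIN_AGE = 30                  # abstract: deaths among individuals aged over 30


def label_premature(ages: Sequence[int]) -> List[int]:
    """Return 1 (premature) or 0 (nonpremature) for each included death.

    Assumption: 'aged over 30' means age > 30, and 'premature' means age < 75.
    Deaths at age 30 or younger are dropped from the output.
    """
    return [
        1 if age < PREMATURE_AGE_THRESHOLD else 0
        for age in ages
        if age > MIN_AGE
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up records (not study data).
ages = [28, 45, 62, 74, 75, 81, 90]
labels = label_premature(ages)            # the 28-year-old is excluded
scores = [0.9, 0.4, 0.7, 0.6, 0.3, 0.1]   # hypothetical model scores for "premature"
auc = roc_auc(labels, scores)
print(labels, round(auc, 3))
```

On this toy input, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. Read the same way, the reported AUCs of 0.61 to 0.69 mean the cause-specific models rank a premature death above a nonpremature one in roughly 61% to 69% of such pairs.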