Evaluation of A Hybrid Approach Combining Random Forest and XGBoost for Type 2 Diabetes Prediction Using the PIMA Dataset
Abstract
This study evaluates three machine learning models for predicting type 2 diabetes, comparing their accuracy, sensitivity, and generalization capacity. The methodology comprised three phases: data preprocessing, model development, and performance evaluation. The study used the widely validated PIMA dataset, which includes clinical variables such as body mass index, blood glucose level, and blood pressure. During preprocessing, the SMOTE algorithm was applied to address class imbalance, which significantly improved the detection of positive cases. Three predictive approaches were compared: Random Forest, XGBoost, and a hybrid model that combines the two through soft voting. Random Forest achieved the best performance, with an accuracy of 97.0%, an F1-score of 97.1%, and an AUC-ROC of 99.86%. XGBoost followed with an accuracy of 96.4%, a recall of 98.0%, an F1-score of 96.4%, and an AUC-ROC of 99.3%. The hybrid model matched XGBoost in nearly all metrics, reached a higher AUC-ROC of 99.7%, and demonstrated greater stability and robustness in its results. These results surpass those reported in previous research, reinforcing the clinical utility of these algorithms for early diagnostic tasks. As part of the project, an interactive interface was developed for healthcare professionals; it accepts medical data as input and returns an immediate prediction of diabetes risk. This tool, delivered as functional software to the Tuscapujio Hospital Center in Sacaba, Cochabamba, Bolivia, offers a practical and accessible way to support clinical decision-making in real-world settings. The study highlights the importance of rigorous preprocessing and proper handling of class imbalance for improving the performance of predictive models in medical applications.
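The soft-voting step behind the hybrid model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the voting weights or the decision threshold, so the equal weights and the 0.5 cutoff below are assumptions, and the function name `soft_vote` and the probabilities are hypothetical. The idea is that each model outputs a probability for the positive (diabetic) class, the probabilities are averaged, and the average is thresholded.

```python
from typing import List, Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              weight_a: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Combine two models' positive-class probabilities by weighted averaging.

    weight_a and threshold are illustrative defaults (assumptions),
    not values reported in the study.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same samples")
    weight_b = 1.0 - weight_a
    # Weighted mean of the two probability estimates for each sample.
    averaged = [weight_a * pa + weight_b * pb for pa, pb in zip(prob_a, prob_b)]
    # Predict the positive class when the averaged probability reaches the threshold.
    return [1 if p >= threshold else 0 for p in averaged]


# Made-up probabilities for four patients (not data from the study).
rf_probs = [0.90, 0.40, 0.20, 0.55]   # stand-in for Random Forest output
xgb_probs = [0.80, 0.70, 0.10, 0.35]  # stand-in for XGBoost output
labels = soft_vote(rf_probs, xgb_probs)
print(labels)
```

For the second and fourth patients the two models disagree at the 0.5 cutoff, and the averaged probability decides the label. In scikit-learn, the same scheme is available as `VotingClassifier` with `voting="soft"`; whether the study used that class is not stated in the abstract.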