Document Type

Article

Publication Date

1-9-2024

Abstract

With the increasing prevalence of diabetes in Saudi Arabia, there is a critical need for early detection and prediction of the disease to prevent long-term health complications. This study addresses this need by using machine learning (ML) techniques applied to the Pima Indians dataset and private diabetes datasets through the implementation of a computerized system for predicting diabetes. In contrast to prior research, this study employs a semisupervised model combined with strong gradient boosting, effectively predicting diabetes-related features of the dataset. Additionally, the researchers employ the SMOTE technique to deal with the problem of imbalanced classes. Ten ML classification techniques, including logistic regression, random forest, KNN, decision tree, bagging, AdaBoost, XGBoost, voting, SVM, and Naive Bayes, are evaluated to determine the algorithm that produces the most accurate diabetes prediction. The proposed approach has achieved impressive performance. For the private dataset, the XGBoost algorithm with SMOTE achieved an accuracy of 97.4%, an F1 coefficient of 0.95, and an AUC of 0.87. For the combined datasets, it achieved an accuracy of 83.1%, an F1 coefficient of 0.76, and an AUC of 0.85. To understand how the model predicts the final results, an explainable AI technique using SHAP methods is implemented. Furthermore, the study demonstrates the adaptability of the proposed system by applying a domain adaptation method. To further enhance accessibility, a mobile app has been developed for instant diabetes prediction based on user-entered features. This study contributes novel insights and techniques to the field of ML-based diabetic prediction, potentially aiding in the early detection and management of diabetes in Saudi Arabia.

Share

COinS