Building a Diabetes Prediction System Based on Machine Learning Algorithms
Abstract
This paper explores the use of machine learning algorithms to predict type 2 diabetes. We selected three commonly used classification models (random forest, support vector machine, and logistic regression), trained them on patients’ clinical and lifestyle data, and compared their prediction performance. The random forest model achieved the highest accuracy on the test set, and its confusion matrix and other evaluation metrics show that it distinguished diabetic from non-diabetic patients better than the other two models. The support vector machine and logistic regression performed slightly worse but still reached a high level of accuracy. The experimental results support the effectiveness of the three machine learning algorithms, especially random forest, in the diabetes prediction task and provide practical experience for the intelligent prevention and control of chronic diseases. Prediction models of this kind could support new approaches to diabetes prediction and management, which may help alleviate the pressure on medical resources, reduce the burden on social health care, and improve patients’ prognosis and quality of life. Future work could expand the data scale, explore other machine learning algorithms, and integrate multimodal data to further realize the potential of artificial intelligence (AI) in the field of diabetes.
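The abstract compares models by accuracy, the confusion matrix, and other evaluation metrics. As a minimal sketch of how such metrics are derived for a binary diabetic/non-diabetic task, the following standard-library Python computes them from true and predicted labels. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the study's code, data, or results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion-matrix cells for binary labels.

    Assumed convention: 1 = diabetic (positive), 0 = non-diabetic (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            counts["tp"] += 1  # diabetic, predicted diabetic
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # non-diabetic, predicted diabetic
        elif truth == 0 and pred == 0:
            counts["tn"] += 1  # non-diabetic, predicted non-diabetic
        else:
            counts["fn"] += 1  # diabetic, predicted non-diabetic
    return counts


def classification_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels for ten patients (not data from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

toy_counts = confusion_counts(toy_true, toy_pred)
toy_metrics = classification_metrics(toy_counts)
print(toy_counts)
print(toy_metrics)
```

In a screening setting, recall on the diabetic class is often examined alongside accuracy, since a false negative corresponds to a missed case; comparing several classifiers with the same set of metrics on the same test split keeps the comparison like-for-like.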