Mining hypertension predictors using decision tree: Baseline data of Kharameh cohort study
Abstract
Background: Hypertension is a serious chronic disease and an important risk factor for many health problems. This study aimed to investigate the factors associated with hypertension using a decision-tree algorithm.
Methods: This cross-sectional study was conducted in Kharameh city between 2014 and 2017 through a census. The study included 2510 hypertensive and 7840 non-hypertensive individuals. Seventy percent of the cases were randomly allocated to the training dataset used to build the decision tree, and the remaining 30% formed the testing dataset used to evaluate its performance. Two models were assessed. The first (model I) included 15 variables: age, gender, body mass index (BMI), years of education, occupation status, marital status, family history of hypertension, physical activity, total energy, number of meals, salt, oil type, drug use, alcohol use, and smoking. The second (model II) included 16 variables: age, gender, BMI, and the blood factors HCT, MCHC, PLT, FBS, BUN, CERAT, TG, CHOL, ALP, HDL, GGT, LDL, and SG. A receiver operating characteristic (ROC) curve was used to validate the models.
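The 70/30 random allocation described in the Methods can be illustrated with a minimal sketch. It assumes a simple shuffle-and-cut split. The function name `split_train_test`, the seed, and the placeholder labels are hypothetical and are not taken from the study, which does not state its software or splitting procedure. Only the group sizes (2510 and 7840) come from the abstract.

```python
import random

def split_train_test(records, train_percent=70, seed=0):
    """Shuffle a copy of records and cut it into training and testing parts.

    Hypothetical helper for illustration; the seed is an arbitrary assumption.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

# Placeholder labels only: 1 = hypertensive, 0 = non-hypertensive.
# Real records would also carry the 15 or 16 predictor variables.
labels = [1] * 2510 + [0] * 7840
train, test = split_train_test(labels)
print(len(train), len(test))  # 7245 3105
```

Under this assumption the tree would be fitted on the 7245 training records only, and the 3105 held-out records would be used solely for the performance figures.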
Results: For model I, the accuracy, sensitivity, specificity, and area under the ROC curve (AUC) were 79.24%, 82.41%, 78.24%, and 0.80, respectively. For model II, the corresponding values were 79.50%, 81.03%, 79.02%, and 0.80, respectively.
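The four reported metrics can be sketched as follows. This is an illustrative implementation using the standard definitions, not the authors' code. The function names are hypothetical, and the consistency check at the end assumes that the testing dataset has the same proportion of hypertensive individuals as the full sample (2510 of 10350), which a random split only approximates.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of hypertensive cases detected
    specificity = tn / (tn + fp)   # share of non-hypertensive cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

# Assumption: test-set prevalence equals the full-sample prevalence.
prevalence = 2510 / (2510 + 7840)
model_1_accuracy = accuracy_from_rates(0.8241, 0.7824, prevalence)
model_2_accuracy = accuracy_from_rates(0.8103, 0.7902, prevalence)
print(round(model_1_accuracy, 3), round(model_2_accuracy, 3))
```

Under that assumption, the accuracies implied by the reported sensitivity and specificity agree with the reported 79.24% and 79.50% to within about 0.01 percentage points, so the three figures for each model are mutually consistent.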
Conclusion: We propose a decision tree model to identify the risk factors associated with hypertension. This model may be useful for early screening and for improving preventive and curative health services in health promotion.
Vol 10 No 1 (2024) Mining Hypertension Predictors using Decision Tree ... Rezaianzadeh A et al.
Issue | Vol 10 No 1 (2024)
Section | Articles
DOI | https://doi.org/10.18502/jbe.v10i1.17155
Keywords | Decision tree; Hypertension; Kharameh cohort
Rights and permissions | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.