Original Article

Comparing of Data Mining Techniques for Predicting in-Hospital Mortality Among Patients with COVID-19


Introduction: The COVID-19 epidemic is currently confronting health care systems worldwide with many uncertainties and unexpected challenges in medical decision-making and the effective allocation of medical resources. Machine Learning (ML)-based prediction models can potentially help to overcome these uncertainties.

Objective: This study aims to train several ML algorithms to predict COVID-19 in-hospital mortality and to compare their performance in order to select the best-performing algorithm. Finally, the contributing factors were scored using several feature selection methods.

Material and Methods: Using a single-center registry, we studied the records of 1353 hospitalized patients with confirmed COVID-19 from Ayatollah Taleghani hospital, Abadan city, Iran. We applied six feature scoring techniques and nine well-known ML algorithms. To evaluate the models' performance, metrics derived from the confusion matrix were calculated.
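As an illustration of what a feature scoring technique does, the following is a minimal sketch of information gain, one commonly used scoring method. The abstract does not name the six techniques applied, so the choice of information gain is an assumption, and the feature names and values below are hypothetical toy data, not study data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
died = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "feature_a": [1, 1, 1, 0, 0, 0, 0, 0],  # separates the outcome perfectly
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0],  # only weakly related to the outcome
}

scores = {name: information_gain(values, died) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Features with a higher score carry more information about the outcome, which is the sense in which predictors are ranked from most to least important.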

Results: The study included 1353 patients; men outnumbered women (742 vs. 611), and the median age was 57.25 (interquartile range 18-100). After feature scoring, out of 54 variables, absolute neutrophil/lymphocyte count and loss of taste and smell were found to be the top three predictors. On the other hand, platelet count, magnesium, and headache had the lowest importance for predicting COVID-19 mortality. Experimental results indicated that the Bayesian network algorithm, with an accuracy of 89.31% and a sensitivity of 64.2%, was the most successful in predicting mortality.
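For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy and sensitivity are derived from a confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts.

    tp: deaths correctly predicted, fn: deaths missed,
    tn: survivors correctly predicted, fp: survivors predicted to die.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # share of actual deaths that are detected
        "specificity": tn / (tn + fp),   # share of actual survivors that are detected
        "precision": tp / (tp + fp),     # share of predicted deaths that are real
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=40, fp=10, tn=130, fn=20)
print(metrics)
```

When deaths are the minority class, accuracy can be high while sensitivity remains modest, which is why both figures are worth reporting together.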

Conclusion: ML provides a reasonable level of accuracy in predicting COVID-19 in-hospital mortality. Therefore, ML-based prediction models can facilitate more responsive health systems and would be beneficial for the timely identification of vulnerable patients to inform appropriate judgment by health care providers.

Abbreviations: Coronavirus Disease 2019 (COVID-19), World Health Organization (WHO), Machine Learning (ML), Artificial Intelligence (AI), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Locally Weighted Learning (LWL), Clinical Decision Support System (CDSS)

Issue: Vol 7 No 2 (2021)
Section: Original Article(s)
DOI: https://doi.org/10.18502/jbe.v7i2.6725
Keywords: COVID-19; Coronavirus; Artificial intelligence; Machine learning; Mortality

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Shanbehzadeh M, Orooji A, Kazemi-Arpanahi H. Comparing of Data Mining Techniques for Predicting in-Hospital Mortality Among Patients with COVID-19. JBE. 2021;7(2):154-173.