Comparison of Two Methods, Gradient Boosting and Extreme Gradient Boosting, to Predict Survival in Covid-19 Data
Abstract
Introduction: The present study discusses the importance of having a predictive method to determine the prognosis of patients with diseases like Covid-19. This method can assist physicians in making treatment decisions that improve survival rates and avoid unnecessary treatments. This research also highlights the importance of calibration, which is often overlooked in model evaluation. Without proper calibration, incorrect decisions can be made in disease treatment and preventive care. Therefore, the current study compares two highly accurate machine learning algorithms, Gradient boosting and Extreme gradient boosting, not only in terms of prediction accuracy but also in terms of model calibration and speed.
Methods: This study analyzed data from Covid-19 patients admitted to two hospitals in Mashhad city, Razavi Khorasan province, over a span of 18 months. K-fold cross-validation (k = 5) was applied to the training dataset. The accuracy and calibration of the two methods (gradient boosting and extreme gradient boosting) in predicting survival were compared using the concordance index and calibration measures.
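The concordance index reported here measures discrimination: over all comparable patient pairs, the proportion in which the patient predicted to be at higher risk actually experiences the event first. As a minimal illustrative sketch (not the authors' code, which used R packages), Harrell's concordance index for right-censored data can be computed as follows; the `times`, `events`, and `risks` values are hypothetical:

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: the fraction of comparable pairs in which the
    higher-risk subject fails first (ties in risk count as 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:  # a censored subject cannot anchor a pair
            continue
        for j in range(n):
            if times[i] < times[j]:  # pair (i, j) is comparable
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: follow-up times, event indicators (1 = death),
# and model-predicted risk scores for four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.3, 0.1]
print(concordance_index(times, events, risks))  # 1.0 (perfect ranking)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the balanced-data values near 0.89 reported below indicate good discrimination for both models.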
Results: In the imbalanced data, the concordance index values were 0.734 for the gradient boosting model and 0.736 for the extreme gradient boosting model; in the balanced data, they were 0.893 and 0.894, respectively. For the surv.calib_beta index, the gradient boosting model had an estimated value of 0.59 in the imbalanced data and 0.66 in the balanced data, while the extreme gradient boosting model had an estimated value of 0.853 in the imbalanced data and 0.86 in the balanced data. The extreme gradient boosting model was also faster in the learning process than the gradient boosting model.
Conclusion: The gradient boosting and extreme gradient boosting models exhibited similar prediction accuracy and discrimination power, but the extreme gradient boosting model demonstrated relatively good calibration compared to the gradient boosting model.
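The surv.calib_beta index used above is a calibration slope: the predicted linear predictor is entered as the sole covariate in a refitted model, and a slope near 1 indicates good calibration (slopes below 1 suggest predictions that are too extreme). As a simplified analogue, assuming a binary outcome rather than the survival-model refit the index actually performs, a calibration slope can be estimated by Newton-Raphson logistic regression; the data below are hypothetical:

```python
import math

def calibration_slope(lp, y, iters=25):
    """Fit logistic regression y ~ a + b*lp by Newton-Raphson and return
    the slope b; a well-calibrated model yields b close to 1."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient and (negated) Hessian of the log-likelihood.
        ga = gb = haa = hab = hbb = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += yi - p
            gb += (yi - p) * x
            w = p * (1.0 - p)
            haa += w
            hab += w * x
            hbb += w * x * x
        # Newton step: solve the 2x2 system for the parameter update.
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return b

# Hypothetical linear predictors and observed binary outcomes.
lp = [-1, -1, -1, 0, 0, 1, 1, 1]
y = [0, 0, 1, 0, 1, 0, 1, 1]
print(round(calibration_slope(lp, y), 3))
```

On this toy data the fitted slope is well below 1, illustrating the kind of miscalibration that a concordance index alone would not reveal.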
Issue: Vol 9 No 3 (2023)
Section: Original Article(s)
DOI: https://doi.org/10.18502/jbe.v9i3.15450
Keywords: Gradient boosting algorithm; Extreme gradient boosting algorithm; survival analysis; Covid-19

Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.