Review of Random Survival Forest method
Background: Over the past years, there has been considerable interest in applying statistical machine learning methods to survival analysis. Ensemble-based methods, especially random survival forests, have been developed and applied in many fields, particularly the medical sciences, because of their high accuracy, their non-parametric nature, and their applicability to high-dimensional data sets. This paper provides a methodological review of random survival forests and explains how to use them to analyze right-censored survival data.
Method: This review is based on the latest research on random survival forest methodology indexed in the PubMed database.
Results: The article begins with an introduction to tree-based methods, ensemble algorithms, and the random forest (RF) method. It then covers the random survival forest framework, bootstrapped data and out-of-bag (OOB) ensemble estimators, performance evaluation measures, the selection of important variables, and other advanced topics in random survival forests for time-to-event data.
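A widely used performance measure for random survival forests is Harrell's concordance index (C-index); in the original random survival forest paper by Ishwaran et al., prediction error is measured as 1 - C and estimated on out-of-bag data. The following is a minimal Python sketch of Harrell's C for right-censored data, assuming that a higher risk score means earlier expected failure and, for brevity, skipping pairs with tied follow-up times (full implementations treat ties more carefully). The function name `harrell_c_index` and the toy data are illustrative assumptions, not taken from any package or from this review.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the case was right-censored
    risk   -- predicted risk scores (higher means earlier expected failure)
    """
    concordant = 0.0
    permissible = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that case i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is usable ("permissible") only if the times differ and the
        # shorter time is an observed event. Tied times are skipped here.
        if times[i] == times[j] or events[i] == 0:
            continue
        permissible += 1
        if risk[i] > risk[j]:
            concordant += 1.0  # earlier failure received the higher risk
        elif risk[i] == risk[j]:
            concordant += 0.5  # tied predictions count as half-concordant
    if permissible == 0:
        raise ValueError("no permissible pairs to compare")
    return concordant / permissible


# Toy data: four subjects; the third one is right-censored at t = 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.5, 0.7, 0.1]

c = harrell_c_index(times, events, risk)
print(f"C-index = {c:.2f}, prediction error = {1 - c:.2f}")
```

Here five pairs are usable: the pair (t = 6, t = 8) is discarded because the earlier time is censored, and the pair (t = 4, t = 6) is discordant because the subject who failed at t = 4 received a lower score than the subject censored at t = 6. The sketch therefore prints a C-index of 0.80 and a prediction error of 0.20.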
Conclusion: For right-censored, high-dimensional survival data in which the relationships between variables are complex and their interactions must be taken into account, the nonparametric random survival forest (RSF) method identifies the important variables affecting survival times with high accuracy and speed, and it does not require testing restrictive model assumptions such as proportional hazards.
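Variable importance (VIMP) in a random survival forest measures how much the prediction error increases when the information carried by a variable is destroyed; in Ishwaran et al.'s RSF this is computed using out-of-bag data. The Python sketch below is a simplified, model-agnostic permutation variant, not the RSF procedure itself: it shuffles one column of a single evaluation set and reports the mean increase in error (1 - C). The risk function `predict_risk`, the helper names, and the toy data are hypothetical stand-ins for a fitted forest and a real data set.

```python
import random
from itertools import combinations


def c_index(times, events, risk):
    """Harrell's C for right-censored data (tied times skipped for brevity)."""
    concordant, permissible = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # case i has the shorter follow-up time
        if times[i] == times[j] or events[i] == 0:
            continue  # not a permissible pair
        permissible += 1
        if risk[i] > risk[j]:
            concordant += 1.0
        elif risk[i] == risk[j]:
            concordant += 0.5
    if permissible == 0:
        raise ValueError("no permissible pairs to compare")
    return concordant / permissible


def permutation_vimp(predict_risk, X, times, events, var, n_repeats=20, seed=0):
    """Mean increase in prediction error (1 - C) after permuting column `var`."""
    rng = random.Random(seed)
    base_error = 1.0 - c_index(times, events, predict_risk(X))
    total_increase = 0.0
    for _ in range(n_repeats):
        column = [row[var] for row in X]
        rng.shuffle(column)  # break the link between `var` and survival
        X_perm = [row[:var] + [value] + row[var + 1:] for row, value in zip(X, column)]
        perm_error = 1.0 - c_index(times, events, predict_risk(X_perm))
        total_increase += perm_error - base_error
    return total_increase / n_repeats


# Hypothetical "fitted model": the risk depends on the first covariate only.
def predict_risk(X):
    return [row[0] for row in X]


X = [[6.0, 1.0], [5.0, 3.0], [4.0, 2.0], [3.0, 6.0], [2.0, 4.0], [1.0, 5.0]]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]

for var in (0, 1):
    print(f"VIMP of x{var}: {permutation_vimp(predict_risk, X, times, events, var):.3f}")
```

Because the hypothetical model ignores x1, permuting it leaves the predictions, and hence the error, unchanged, so its VIMP is exactly 0; permuting x0 scrambles the risk ordering and gives a positive VIMP. Large positive values flag variables that matter for prediction, while values near zero suggest noise variables.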
Issue: Vol 6 No 1 (2020)
Keywords: Machine learning; Ensemble methods; Random survival forest; Surviva
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.