Random-Splitting Random Forest with Multiple Mixed-Data Covariates
Abstract
Background: Bagging (BG) and random forests (RF) are well-known supervised statistical learning methods built on classification and regression trees. Both can handle various types of responses, such as categorical and continuous ones. Many statistical applications involve curves, time series, or other functional data, that is, observations that are related to one another through their domain. The literature has extended RF methods to settings in which functional data appear as covariates or as responses. Among these extensions, random splitting summarizes functional data into several related summary statistics, such as averages.
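A minimal Python sketch of the random-splitting summary may help. It assumes that every curve is observed on the same equally spaced grid and that the summary statistic is the average of the curve over randomly drawn sub-intervals of that grid. The helper names `random_intervals` and `interval_means` and the parameter `n_intervals` are hypothetical and are not taken from the RSRF package.

```python
import random
from statistics import fmean

# Hypothetical helpers illustrating random splitting; not the RSRF package API.
# Assumes every curve is observed on the same grid of grid_size points.

def random_intervals(grid_size, n_intervals, rng):
    """Draw n_intervals random index ranges [start, end) on a grid of grid_size points."""
    intervals = []
    for _ in range(n_intervals):
        # Two distinct cut points in 0..grid_size always give a non-empty interval.
        start, end = sorted(rng.sample(range(grid_size + 1), 2))
        intervals.append((start, end))
    return intervals

def interval_means(curve, intervals):
    """Summarize one curve by its average value over each interval."""
    return [fmean(curve[start:end]) for start, end in intervals]

# The same intervals are reused for every observation so the summaries are comparable.
rng = random.Random(0)
curves = [[float(t) for t in range(10)], [float(10 - t) for t in range(10)]]
intervals = random_intervals(len(curves[0]), 3, rng)
features = [interval_means(curve, intervals) for curve in curves]
```

Each column of `features` can then be handed to a tree like an ordinary scalar covariate.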
Methods: This article extends this method and introduces mixed-data BG (MD-BG) and RF (MD-RF) algorithms for multiple functional and non-functional covariates (also called mixed or hybrid data). The algorithms also produce a variable importance plot (VIP) that covers each covariate.
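One way to obtain a single importance value per covariate, when a functional covariate contributes several summary columns, is to permute all of its columns jointly and measure the rise in prediction error. The Python sketch below does this under stated assumptions: the fitted model is any callable mapping a row to a prediction, the error is the mean squared error, and `groups` maps each covariate name to the column indices derived from it. This grouped permutation scheme and all names in it (`grouped_permutation_importance`, `mean_squared_error`, `groups`) are illustrative assumptions, not necessarily the importance measure implemented in RSRF.

```python
import random
from statistics import fmean

# Hypothetical sketch, not the RSRF API: each covariate owns a group of columns
# (all interval means of a functional covariate, or one column for a scalar one).

def mean_squared_error(model, rows, y):
    """Average squared prediction error of model over the rows."""
    return fmean((model(row) - target) ** 2 for row, target in zip(rows, y))

def grouped_permutation_importance(model, rows, y, groups, rng):
    """Rise in MSE after jointly permuting all columns that belong to each covariate."""
    baseline = mean_squared_error(model, rows, y)
    importance = {}
    for name, columns in groups.items():
        order = list(range(len(rows)))
        rng.shuffle(order)
        permuted = []
        for i, row in enumerate(rows):
            new_row = list(row)
            donor = rows[order[i]]
            # One donor row per observation keeps the covariate's columns together.
            for c in columns:
                new_row[c] = donor[c]
            permuted.append(new_row)
        importance[name] = mean_squared_error(model, permuted, y) - baseline
    return importance

# Toy data: columns 0-1 are two interval means of one curve, column 2 is a scalar.
rows = [[1.0, 0.5, 30.0], [2.0, 0.1, 41.0], [3.0, 0.9, 25.0],
        [4.0, 0.3, 52.0], [5.0, 0.7, 38.0]]
y = [row[0] for row in rows]

def model(row):
    """Stand-in for a fitted forest; it only uses column 0."""
    return row[0]

groups = {"curve": [0, 1], "age": [2]}
scores = grouped_permutation_importance(model, rows, y, groups, random.Random(0))
```

Because the stand-in model ignores column 2, the scalar covariate `age` receives zero importance. For a categorical response, the misclassification rate could replace the mean squared error, and in a forest the errors would typically be computed on out-of-bag observations.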
Results: MD-BG and MD-RF differ mainly in how covariates are chosen: MD-BG keeps all covariates in the model, whereas MD-RF uses a random sample of them. MD-RF helps to reveal the most important parts of the functional covariates and the most important non-functional covariates.
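The difference between the two algorithms can be sketched in a few lines of Python. Both draw a bootstrap sample of observations; MD-BG then offers every covariate to the tree, while MD-RF offers a random sample of `mtry` covariates. Two points are assumptions of this sketch rather than statements of the method: covariates are sampled as whole units (a functional covariate enters or leaves together with all its summaries), and the sample is meant to be redrawn at each split as in Breiman's random forest. The function names, the `mtry` parameter, and the covariate labels are hypothetical.

```python
import random

# Hypothetical sketch, not the RSRF API. Covariates are sampled as whole units,
# so a functional covariate enters or leaves together with all its summaries.

def bootstrap_indices(n, rng):
    """Indices of a bootstrap sample: n draws with replacement, used by both methods."""
    return [rng.randrange(n) for _ in range(n)]

def candidate_covariates(covariates, method, mtry, rng):
    """Covariates available when searching for a split.

    "MD-BG": every covariate stays in the model.
    "MD-RF": a random sample of mtry covariates (assumed redrawn at each split).
    """
    if method == "MD-BG":
        return list(covariates)
    if method == "MD-RF":
        return rng.sample(list(covariates), min(mtry, len(covariates)))
    raise ValueError(f"unknown method: {method}")

# Hypothetical covariate labels: two functional and two scalar covariates.
covariates = ["curve_a", "curve_b", "age", "sex"]
rng = random.Random(1)
sample = bootstrap_indices(6, rng)
bg = candidate_covariates(covariates, "MD-BG", 2, rng)
rf = candidate_covariates(covariates, "MD-RF", 2, rng)
```

With MD-BG the split search always sees every covariate, so a few dominant covariates can be chosen repeatedly; restricting the candidates in MD-RF gives weaker covariates more chances to be selected, which is the usual motivation for random forests over bagging.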
Conclusions: We apply our methods to two datasets, DTI and Tecator, and compare their performance for continuous and categorical responses using the R package we developed, "RSRF", which is available on GitHub.
Issue: Vol 9 No 1 (2023)
Section: Original Article(s)
DOI: https://doi.org/10.18502/jbe.v9i1.13974
Keywords: Bagging; Functional data; Random forest; Random splitting; Statistical learning
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.