A bias-variance trade-off in the prediction error estimation behavior in bootstrap methods for microarray leukemia classification
Background & Aim: The bootstrap is a method that resamples from the original data set, and it is widely applied to estimating the prediction error rate. We compare several bootstrap methods for estimating the prediction error in classification and select the best method for microarray leukemia classification.
Methods & Materials: The sample, taken from an existing database, consists of n=38 patients with acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) and p=4120 genes, so that n<<p. We carried out the following steps. (1) Resample from the original sample. (2) Divide the sample into two sets, a learning set and a test set, by 3-fold cross-validation. (3) Train the 1NN, CART and DLDA classifiers on the learning set and compute each classifier's misclassification error by comparing the predicted class of the remaining samples with their true class. (4) Average the errors over the B bootstrap samples.
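Steps (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy Gaussian data standing in for the leukemia data set, and the defaults (B=50, seed values) are all hypothetical, and only the 1NN classifier is shown.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1NN, squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

def bootstrap_cv_error(xs, ys, n_boot=50, k=3, seed=0):
    """Average the k-fold CV misclassification rate over n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_boot):
        # Step 1: resample n cases with replacement from the original sample.
        idx = [rng.randrange(n) for _ in range(n)]
        wrong = 0
        for fold in range(k):
            # Step 2: every k-th position forms the test set, the rest the learning set.
            test = idx[fold::k]
            train = [idx[j] for j in range(n) if j % k != fold]
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            # Step 3: predict the held-out cases and count misclassifications.
            wrong += sum(nearest_neighbor_predict(train_x, train_y, xs[i]) != ys[i]
                         for i in test)
        errors.append(wrong / n)
    # Step 4: average the error rates over the bootstrap samples.
    return sum(errors) / len(errors)

def make_toy_data(n=38, p=20, shift=3.0, seed=1):
    """Hypothetical stand-in data: two Gaussian classes whose means differ by `shift`."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        xs.append([rng.gauss(shift * label, 1.0) for _ in range(p)])
        ys.append(label)
    return xs, ys

xs, ys = make_toy_data()
bcv = bootstrap_cv_error(xs, ys)
print(f"bootstrap cross-validation error estimate: {bcv:.3f}")
```

Because a bootstrap sample contains repeated cases, the same patient can fall in both the learning set and the test set of a fold, which tends to make this kind of estimate optimistic for 1NN.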
Results: The standard deviation, bias and MSE of four bootstrap methods were computed for each of the three classifiers. To choose the best method, we assessed the bias-variance trade-off in the behavior of the prediction error estimates. The 0.632+ bootstrap (0.632+ BT) is approximately unbiased and has small variability, whereas the leave-one-out bootstrap (LOOBT) is biased and has large variability. A table and several figures are provided in Section 4.
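The three summary measures are linked by the identity MSE = bias² + variance, which is what makes the comparison a trade-off. The sketch below shows one way to compute them for a set of repeated error estimates; the function name and the numbers are hypothetical and are not results from this study.

```python
from statistics import mean, pstdev

def summarize_estimator(estimates, true_error):
    """Return (bias, standard deviation, MSE) of repeated prediction error estimates."""
    bias = mean(estimates) - true_error
    sd = pstdev(estimates)  # population SD, so that MSE = bias**2 + sd**2 holds exactly
    mse = mean((e - true_error) ** 2 for e in estimates)
    return bias, sd, mse

# Hypothetical estimates from five repetitions and a hypothetical true error rate.
estimates = [0.10, 0.14, 0.08, 0.12, 0.16]
bias, sd, mse = summarize_estimator(estimates, true_error=0.10)
print(f"bias={bias:.4f}  sd={sd:.4f}  mse={mse:.4f}")
```

An estimator with a small bias can still have a large MSE if its standard deviation is large, so both terms have to be examined together.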
Conclusion: The bias and variance of the prediction error estimates differ considerably across bootstrap methods. Although the 0.632+ BT is approximately unbiased and has small variability, other resampling methods may be useful for microarray classification in other situations.
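For orientation, the 0.632+ estimator of Efron and Tibshirani (1997) combines the apparent (resubstitution) error with the leave-one-out bootstrap error, using a weight that grows with the relative overfitting rate. The sketch below follows that published definition; the function and argument names are illustrative choices, and the input values are hypothetical.

```python
from collections import Counter

def no_information_rate(y_true, y_pred):
    """Expected error if predictions were independent of the true classes."""
    n = len(y_true)
    p = Counter(y_true)  # observed class counts
    q = Counter(y_pred)  # predicted class counts
    return sum((p[c] / n) * (1.0 - q[c] / n) for c in p)

def err632_plus(apparent, loo_boot, gamma):
    """0.632+ estimate from apparent error, leave-one-out bootstrap error and gamma."""
    # Cap the leave-one-out bootstrap error at the no-information rate.
    loo_capped = min(loo_boot, gamma)
    # Relative overfitting rate, set to 0 when it is not well defined.
    if loo_capped > apparent and gamma > apparent:
        r = (loo_capped - apparent) / (gamma - apparent)
    else:
        r = 0.0
    # The weight runs from 0.632 (no overfitting) to 1 (maximal overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent + w * loo_capped

# Hypothetical inputs: a classifier that fits the learning set perfectly.
estimate = err632_plus(apparent=0.0, loo_boot=0.2, gamma=0.5)
print(f"0.632+ estimate: {estimate:.4f}")
```

When the relative overfitting rate is 0 the result reduces to the ordinary 0.632 estimator, and when it is 1 the result equals the capped leave-one-out bootstrap error.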