Impact of the power of adaptive weight on penalized logistic regression: Application to cancer classification
Abstract
Background: The combination of high-dimensional sparse data and multicollinearity can cause instability in classification models when they are applied to new datasets. The Lasso (Least Absolute Shrinkage and Selection Operator) is widely used in machine-learning algorithms. Despite its computational feasibility for high-dimensional data, the method has certain drawbacks, and the adaptive Lasso was developed to address them. The power of the adaptive weight is one of the important parameters of this estimator; we therefore concentrate on the power of the adaptive weight in the penalty function. This study aimed to compare the impact of the power of the adaptive weight on penalized logistic regression under high-dimensional sparse data with multicollinearity.
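For reference, the adaptive Lasso referred to here takes the standard form introduced by Zou (2006); a minimal sketch of the penalized logistic log-likelihood, writing $\gamma$ for the power of the adaptive weight, is

$$\hat{\beta} = \arg\min_{\beta} \Big\{ -\sum_{i=1}^{n} \big[ y_i x_i^{\top}\beta - \log\big(1 + \exp(x_i^{\top}\beta)\big) \big] + \lambda \sum_{j=1}^{p} \frac{|\beta_j|}{|\hat{\beta}_j^{\text{init}}|^{\gamma}} \Big\},$$

where $\hat{\beta}^{\text{init}}$ is an initial estimator (e.g., ridge or Lasso) and $\gamma > 0$; setting all weights to one recovers the ordinary Lasso.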
Methods: Penalized approaches were used for variable selection and parameter estimation. A Monte Carlo simulation was performed with 50 and 1,000 independent variables and sample sizes of 30 and 40. The degree of correlation was set to 0.1, 0.3, 0.5, 0.75, 0.85, and 0.95. The performance of the power of the adaptive weight in the penalized approaches was evaluated in terms of the mean predicted mean squared error for the simulation study and the classification accuracy of the machine-learning models for the real-data applications.
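The abstract does not include the estimation code; as a minimal sketch, the two-step adaptive Lasso for logistic regression can be emulated with scikit-learn by rescaling the columns of X with the initial weights and then fitting a plain L1-penalized model. The function name adaptive_lasso_logistic and the toy simulation parameters below are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def adaptive_lasso_logistic(X, y, gamma=1.0, C=1.0, init_penalty="l2"):
    """Two-step adaptive Lasso (sketch). An initial ridge (init_penalty='l2')
    or Lasso (init_penalty='l1') fit supplies the weights; rescaling each
    column by w_j = |beta_init_j|**gamma turns an ordinary L1 fit into the
    adaptive Lasso, because the L1 penalty on the rescaled problem equals
    sum_j |beta_j| / |beta_init_j|**gamma on the original scale."""
    init = LogisticRegression(penalty=init_penalty, C=C, solver="saga",
                              max_iter=10_000).fit(X, y)
    w = np.abs(init.coef_.ravel()) ** gamma
    w = np.maximum(w, 1e-8)              # guard against exact zero weights
    fit = LogisticRegression(penalty="l1", C=C, solver="saga",
                             max_iter=10_000).fit(X * w, y)
    return fit.coef_.ravel() * w         # coefficients on the original scale

# Toy version of the simulation design described above: n = 40, p = 50,
# equicorrelated predictors (rho = 0.5), and a sparse true coefficient vector.
rng = np.random.default_rng(0)
n, p, rho = 40, 50, 0.5
cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]              # only 3 active features (sparse truth)
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
coef = adaptive_lasso_logistic(X, y, gamma=2.0)   # gamma = 2: a higher-order weight
print("selected features:", np.flatnonzero(coef != 0))
```

In this setup, a larger gamma penalizes features with small initial coefficients more aggressively, which is the interaction between the power of the adaptive weight and the initial weight that the study investigates.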
Results: The results showed that the higher-order adaptive Lasso performed best under very high-dimensional sparse data with multicollinearity when the initial weight was obtained from a ridge estimator. However, for high-dimensional sparse data with multicollinearity, the square-root adaptive Lasso with an initial weight obtained from the Lasso was the best option.
Conclusion: Our findings showed that the power of the adaptive weight in the penalty function and the choice of initial weight can affect the classification accuracy of a machine-learning model. In practice, choosing these parameters appropriately produces models with good performance.
Issue: Vol 10 No 3 (2024)
Section: Articles
Keywords: High-dimensional sparse data; Machine-learning; Multicollinearity; Penalized logistic regression; Penalty function; Power of adaptive weight
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.