Bounded multivariate contaminated normal mixture model with applications to skin cancer detection
Abstract
Background & Aim: Outliers are common in real-world datasets and can substantially reduce the accuracy and reliability of statistical analyses. Detecting them and developing models that are robust to their presence is a central challenge in data analysis. Natural images, for instance, can have complex distributions of values because of environmental factors such as noise and illumination, producing objects with overlapping regions and non-trivial contours that Gaussian mixture models cannot describe accurately. Moreover, in many real-life applications the observed data fall within a bounded support region, which motivates bounded-support mixture models. Motivated by these observations, we introduce a bounded multivariate contaminated normal distribution for fitting data that are non-Gaussian, asymmetric, and supported on a bounded region. Because rare observations are given less weight in the calculations, finite mixture models built on this distribution are more robust to outliers.
Methods & Materials: A family of finite mixtures of bounded multivariate contaminated normal distributions is introduced. The model is well suited to computer vision and pattern recognition problems because its heavy-tailed and bounded nature provides flexibility in modeling data in the presence of outliers. A feasible expectation-maximization algorithm, based on a selection mechanism, is developed to compute the maximum likelihood estimates of the model parameters.
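The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the data are univariate (for example, grayscale intensities on [0, 255]), the contaminated normal component uses the common parameterization alpha * N(mu, sigma^2) + (1 - alpha) * N(mu, eta * sigma^2) with eta > 1, and the bounded version is obtained by truncating the density to the support and renormalizing. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cn_pdf(x, mu, sigma, alpha, eta):
    """Contaminated normal density (assumed parameterization).

    alpha is the proportion of 'good' points; eta > 1 inflates the
    variance of the contaminating component.
    """
    wide = math.sqrt(eta) * sigma
    return alpha * normal_pdf(x, mu, sigma) + (1.0 - alpha) * normal_pdf(x, mu, wide)


def cn_cdf(x, mu, sigma, alpha, eta):
    """Contaminated normal distribution function."""
    wide = math.sqrt(eta) * sigma
    return alpha * normal_cdf(x, mu, sigma) + (1.0 - alpha) * normal_cdf(x, mu, wide)


def bounded_cn_pdf(x, mu, sigma, alpha, eta, lower, upper):
    """Contaminated normal truncated to [lower, upper] and renormalized."""
    if x < lower or x > upper:
        return 0.0
    mass = cn_cdf(upper, mu, sigma, alpha, eta) - cn_cdf(lower, mu, sigma, alpha, eta)
    return cn_pdf(x, mu, sigma, alpha, eta) / mass


def bounded_cn_mixture_pdf(x, weights, components, lower, upper):
    """Finite mixture of bounded contaminated normal components.

    components is a list of (mu, sigma, alpha, eta) tuples and weights
    are mixing proportions that sum to one.
    """
    return sum(
        w * bounded_cn_pdf(x, mu, sigma, alpha, eta, lower, upper)
        for w, (mu, sigma, alpha, eta) in zip(weights, components)
    )


# Illustrative two-component mixture on a grayscale range (made-up values).
weights = [0.4, 0.6]
components = [(60.0, 15.0, 0.9, 9.0), (180.0, 25.0, 0.8, 4.0)]
density_at_100 = bounded_cn_mixture_pdf(100.0, weights, components, 0.0, 255.0)
```

In the multivariate setting the normalizing mass becomes an integral of the contaminated normal density over the support region, which generally has to be evaluated numerically; the univariate case is used here only because the normal distribution function makes that mass available in closed form.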
Results: The proposed methodology is validated through experiments on two real skin cancer images, with the parameters estimated by the proposed expectation-maximization algorithm. The results show that the proposed model improves accuracy in segmenting skin lesions.
Conclusion: A reliable model-based clustering approach using finite mixtures of bounded multivariate contaminated normal distributions is introduced. An expectation-maximization algorithm was developed to estimate the parameters, with closed-form expressions used at the E-step. Practical tests on skin cancer images showed improved accuracy in delineating skin lesions.
Issue | Vol 10 No 1 (2024) | |
Section | Articles | |
DOI | https://doi.org/10.18502/jbe.v10i1.17157 | |
Keywords | ECME algorithm; Mixture model; Contaminated normal distribution; Bounded distribution |
Rights and permissions | |
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. |