Original Article

Multiclass Response Feature Selection and Cancer Tumour Classification With Support Vector Machine

Abstract

Background & Aim: This study proposes an efficient Support Vector Machine (SVM) algorithm for feature selection and classification of multi-category tumour classes of biological samples using gene expression profiles.
Methods: The feature selection interface of the algorithm employed the F-statistic of an ANOVA-like testing scheme at a chosen family-wise error rate, which ensured efficient detection of false-positive genes. The gene subsets selected by this method were further screened for optimality using the Misclassification Error Rates yielded by each subset and by their combinations in a sequential selection manner. In a 10-fold cross-validation, the optimal values of the SVM parameters with an appropriate kernel were determined for tissue sample classification using the one-versus-all approach. The entire data matrix was randomly partitioned into a 95% training set to train the SVM classifier and a 5% test set to evaluate its predictive performance over 1,000 Monte-Carlo cross-validation runs. A published microarray breast cancer dataset with five clinical endpoints was employed to validate the results from the simulation studies.
Results: The Monte-Carlo study showed excellent performance of the SVM classifier, with high prediction accuracy on the tissue samples based on the few gene biomarkers selected by the proposed feature selection method.
Conclusion: SVM could be considered as a classifier for multi-category tumour classes of biological samples using gene expression profiles.
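
The procedure summarised in the Methods can be illustrated with a minimal Python sketch. Several choices here are assumptions and not details confirmed by the abstract: the classic one-way ANOVA F-test from scikit-learn (`f_classif`) stands in for the ANOVA-like scheme, a Sidak adjustment stands in for the family-wise error rate control, an RBF kernel with a small hypothetical parameter grid stands in for the tuned SVM, and the data are synthetic. Gene selection is repeated inside each training split to avoid information leakage, and the sequential screening by Misclassification Error Rate is not shown. The helper names `sidak_select`, `monte_carlo_cv`, and `make_toy_data` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def sidak_select(X, y, fwer=0.05):
    """Indices of genes whose ANOVA F-test p-value passes a Sidak-adjusted
    per-gene level (an assumed way of controlling the family-wise error rate)."""
    _, p_values = f_classif(X, y)
    per_gene_alpha = 1.0 - (1.0 - fwer) ** (1.0 / X.shape[1])
    selected = np.flatnonzero(p_values < per_gene_alpha)
    # Fall back to the single most significant gene if nothing passes.
    return selected if selected.size else np.array([np.argmin(p_values)])


def monte_carlo_cv(X, y, n_runs=10, test_size=0.05, seed=0):
    """Misclassification error rate of each random 95%/5% train/test split."""
    splits = ShuffleSplit(n_splits=n_runs, test_size=test_size, random_state=seed)
    # Hypothetical grid; the parameter values searched in the study are not given.
    grid = {"estimator__C": [0.1, 1.0, 10.0], "estimator__gamma": ["scale", 0.01]}
    errors = []
    for train, test in splits.split(X):
        genes = sidak_select(X[train], y[train])
        # One-versus-all SVM, tuned by 10-fold cross-validation on the training set.
        search = GridSearchCV(OneVsRestClassifier(SVC(kernel="rbf")), grid, cv=10)
        search.fit(X[train][:, genes], y[train])
        predicted = search.predict(X[test][:, genes])
        errors.append(np.mean(predicted != y[test]))
    return np.array(errors)


def make_toy_data(n_per_class=30, n_genes=300, n_informative=10, seed=0):
    """Synthetic five-class expression matrix; only the first genes carry signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(5), n_per_class)
    X = rng.normal(size=(y.size, n_genes))
    shifts = rng.normal(scale=2.0, size=(5, n_informative))
    X[:, :n_informative] += shifts[y]  # class-specific mean shift per informative gene
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    errors = monte_carlo_cv(X, y)
    print(f"mean misclassification error rate: {errors.mean():.3f}")
```

The sketch uses 10 Monte-Carlo runs to keep the run time short; setting `n_runs=1000` would match the number of runs reported in the Methods.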

Issue: Vol 5 No 2 (2019)
Section: Original Article(s)
DOI: https://doi.org/10.18502/jbe.v5i2.2339
Keywords
Support Vector Machines; Monte-Carlo Cross-Validation; F-Statistic; Family-wise error rate; Misclassification Error Rate

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
1. Banjoko A, Yahya W, Garba M. Multiclass Response Feature Selection and Cancer Tumour Classification With Support Vector Machine. JBE. 2020;5(2):91-104.