Original Article

Predicting the categories of colon cancer using microarray data and nearest shrunken centroid

Abstract

Background & Aim: Classifying and predicting the clinical category of a sample from its gene expression profile is very helpful. This study was conducted to predict colorectal adenoma, adenocarcinoma, and paired normal colon tissues from microarray data using the nearest shrunken centroid method.
Methods & Materials: This study used a colon cancer dataset comprising 18 adenocarcinoma, 4 colorectal adenoma, and 22 paired normal colon samples with 2360 common gene expression measurements. The nearest shrunken centroid method was used to predict the categories of colon cancer. R software was used for data analysis.
Results: The nearest shrunken centroid method reduced the 2360 genes to a set of eleven genes: rig, BIGH3, GLI3, Homo sapiens guanylin, p78, 54KDa, XBP-1, CO-029, desmin, MLC-2, and HMG-1. The method predicted all three classes: colorectal adenoma and adenocarcinoma with zero error, and the normal class with an error of 4.5%.
Conclusion: The nearest shrunken centroid method reduced several thousand genes to 11 genes that assigned colon tissue samples to one of three classes (adenocarcinoma, colorectal adenoma, and normal) with 97.7% accuracy.
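
To illustrate how a nearest shrunken centroid classifier selects genes and assigns classes, the following is a minimal Python sketch of one common formulation of the method (soft-thresholded standardized centroid differences). It is not the authors' R analysis. The function names `fit_nsc` and `predict_nsc`, the threshold `delta = 2.0`, and the three-gene toy dataset are all assumptions invented for illustration and are not taken from the study.

```python
import math
from statistics import median


def soft_threshold(d, delta):
    """Move d toward zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_nsc(X, y, delta):
    """Fit shrunken centroids. X: list of samples (lists of gene values); y: labels.
    Assumes at least two classes and at least one gene with nonzero variance."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    groups = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    means = {k: [sum(x[i] for x in g) / len(g) for i in range(p)]
             for k, g in groups.items()}
    # Pooled within-class standard deviation of each gene.
    s = [math.sqrt(sum((x[i] - means[lab][i]) ** 2 for x, lab in zip(X, y))
                   / (n - len(classes))) for i in range(p)]
    s0 = median(s)  # offset that guards against genes with very small variance
    scale = [si + s0 for si in s]
    shrunk_d, centroids = {}, {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(groups[k]) - 1.0 / n)
        # Standardized difference between class and overall centroid, then shrunk.
        shrunk_d[k] = [soft_threshold((means[k][i] - overall[i]) / (m_k * scale[i]), delta)
                       for i in range(p)]
        centroids[k] = [overall[i] + m_k * scale[i] * shrunk_d[k][i] for i in range(p)]
    # A gene survives if its shrunken difference is nonzero in at least one class.
    selected = [i for i in range(p) if any(shrunk_d[k][i] != 0.0 for k in classes)]
    priors = {k: len(groups[k]) / n for k in classes}
    return {"classes": classes, "centroids": centroids, "scale": scale,
            "priors": priors, "selected": selected}


def predict_nsc(model, x):
    """Assign x to the class with the smallest prior-adjusted standardized distance."""
    def score(k):
        dist = sum(((x[i] - model["centroids"][k][i]) / model["scale"][i]) ** 2
                   for i in range(len(x)))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["classes"], key=score)


# Hypothetical toy data: gene 0 marks adenoma, gene 1 marks carcinoma,
# gene 2 differs only slightly between classes.
TOY = {
    "normal":    [[1.0, 1.0, 1.0], [1.5, 0.5, 1.5], [0.5, 1.5, 0.5], [1.0, 1.0, 1.0]],
    "adenoma":   [[5.0, 1.0, 1.25], [5.5, 1.5, 1.75], [4.5, 0.5, 0.75], [5.0, 1.0, 1.25]],
    "carcinoma": [[1.0, 5.0, 0.75], [1.5, 5.5, 1.25], [0.5, 4.5, 0.25], [1.0, 5.0, 0.75]],
}
X = [sample for label in TOY for sample in TOY[label]]
y = [label for label in TOY for _ in TOY[label]]

model = fit_nsc(X, y, delta=2.0)
train_acc = sum(predict_nsc(model, x) == lab for x, lab in zip(X, y)) / len(X)
print(model["selected"], train_acc)
```

In this toy setting the threshold removes gene 2, whose class means barely differ, while genes 0 and 1 remain as the selected set. This mirrors, on a very small scale, the reduction from 2360 genes to 11 described above. In practice the threshold would be chosen by cross-validation rather than fixed in advance. As an arithmetic check on the reported figures, and assuming the 4.5% error corresponds to one misclassified sample among the 22 normal samples (1/22 ≈ 4.5%), the overall accuracy would be 43/44 ≈ 97.7%, which matches the Conclusion.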

Issue: Vol 1 No 1/2 (2015)
Section: Original Article(s)
Keywords: colon cancer; gene expression; microarray; prediction; nearest shrunken centroid

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Khoshhali M, Moslemi A, Saidijam M, Poorolajal J, Mahjub H. Predicting the categories of colon cancer using microarray data and nearest shrunken centroid. JBE. 2015;1(1/2):16-21.