Original Article

Naïve Bayes evidence accumulation K-modes clustering: A new method for classifying binary data and its application on real data of injecting drug users

Abstract

Background & Aim: Clustering methods such as K-modes group discrete data, and the Naïve Bayes classifier predicts unknown true classes. In this research, we improve K-modes results by applying the Evidence Accumulation (EA) method to obtain and keep an initial mode vector, which is then used in Naïve Bayes EA K-modes.
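The EA K-modes idea (accumulate evidence over many K-modes runs, keep an initial mode vector, then rerun K-modes from it) can be sketched in plain Python for binary records. This is an illustrative reconstruction, not the authors' R code, which used klaR. All function names are hypothetical. The evidence-accumulation step assumes a generic co-association scheme: many random-start K-modes runs, then a single-link cut at 0.5 on the co-occurrence rate. Taking the modes of the k largest consensus groups as the kept initial mode vector is also an assumption.

```python
import random

def hamming(a, b):
    """Simple-matching dissimilarity: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(rows):
    """Attribute-wise majority value of binary records (ties go to 0)."""
    return [1 if 2 * sum(col) > len(rows) else 0 for col in zip(*rows)]

def kmodes(data, init_modes, max_iter=20):
    """Huang-style K-modes: assign each record to its nearest mode, then recompute modes."""
    modes = [list(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
                      for row in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster empties
                modes[j] = mode_of(members)
    return labels, modes

def ea_initial_modes(data, k, runs=30, seed=0):
    """Hypothetical EA step: co-association matrix over random-start K-modes runs."""
    rng = random.Random(seed)
    n = len(data)
    co = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels, _ = kmodes(data, rng.sample(data, k))
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1
    # Single-link cut at 0.5 (connected components): join records that were
    # co-clustered in more than half of the runs.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > runs / 2:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(data[i])
    largest = sorted(groups.values(), key=len, reverse=True)[:k]
    modes = [mode_of(g) for g in largest]
    while len(modes) < k:  # too few consensus groups: pad with random records
        modes.append(list(rng.choice(data)))
    return modes

# Toy binary data: two groups of four records each.
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 1, 0, 1, 1, 1]]
init = ea_initial_modes(data, k=2)   # the kept initial mode vector
labels, modes = kmodes(data, init)   # rerun K-modes from those modes
```

The seeded random generator makes the ensemble reproducible. That matters here because the point of keeping the initial mode vector is to remove the run-to-run variability of random-start K-modes.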
Methods & Materials: The methods are applied to four real datasets in which the true classes are known, in order to check the external validity and purity of our methods. We used the free programming software R, with the klaR package for K-modes and EA and the e1071 package for Naïve Bayes. In addition, the methods are applied to the national Injecting Drug Users (IDU) dataset (sample size 2546).
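Purity, the external validity measure referred to above, maps each cluster to its most frequent true class. It then reports the fraction of records that belong to their cluster's majority class: purity = (1/N) Σ_k max_j |c_k ∩ t_j|, where c_k is a cluster and t_j a true class. The following is a minimal sketch. The function name is hypothetical and is not taken from klaR or e1071.

```python
from collections import Counter

def purity(cluster_labels, true_labels):
    """Share of records whose true class is the majority class of their cluster."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label vectors must have the same length")
    members = {}
    for c, t in zip(cluster_labels, true_labels):
        members.setdefault(c, []).append(t)
    # For each cluster, count the records in its most frequent true class.
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return correct / len(true_labels)

score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])  # 5/6, about 0.833
```

A purity of 1.0 means every cluster contains records of a single true class.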
Results: The EA K-modes algorithm was applied to five real datasets, and K-modes was then rerun with the kept initial mode vector. The results indicate that the purity of EA K-modes (0.544, 0.862, 0.914, 0.944, 0.625) differs significantly from that of classic K-modes (0.497, 0.610, 0.404, 0.650, 0.625). Finally, we applied the Naïve Bayes classifier with the prior probabilities obtained from EA K-modes. For K = 2, Naïve Bayes EA K-modes produced better clustering (purity 0.71 and 0.873, against 0.625 and 0.862 for EA K-modes and 0.497 and 0.61 for K-modes).
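The Naïve Bayes step can be sketched as a Bernoulli Naïve Bayes model. Its class priors are the cluster proportions from EA K-modes, and its per-attribute probabilities are estimated from the records in each cluster. This sketch is an assumption about the setup and does not reproduce e1071's naiveBayes. The Laplace smoothing (alpha = 1), the function names, and the final reassignment of every record to its maximum-posterior class are illustrative choices.

```python
import math

def fit_bernoulli_nb(data, labels, alpha=1.0):
    """Estimate class priors and P(x_j = 1 | class) with Laplace smoothing."""
    n_features = len(data[0])
    priors, probs = {}, {}
    for c in sorted(set(labels)):
        rows = [r for r, lab in zip(data, labels) if lab == c]
        priors[c] = len(rows) / len(data)  # prior = cluster share from EA K-modes
        probs[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(n_features)]
    return priors, probs

def predict(record, priors, probs):
    """Return the class with the largest log-posterior for one binary record."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for x, q in zip(record, probs[c]):
            score += math.log(q if x == 1 else 1.0 - q)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical input: binary records plus cluster labels from an EA K-modes run.
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 1, 0, 1, 1, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
priors, probs = fit_bernoulli_nb(data, labels)
preds = [predict(r, priors, probs) for r in data]  # reassigned cluster labels
```

Log-probabilities are summed instead of multiplying raw probabilities so that many binary attributes do not underflow. On well-separated toy data like this, the reassigned labels match the input clusters.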
Conclusion: In this paper, we proposed Naïve Bayes EA K-modes as a new method for clustering binary data. Our new method leads to more stable clustering compared with previous studies. The Naïve Bayes EA K-modes method improves purity and yields better cluster separation.

Issue: Vol 4 No 2 (2018)
Section: Original Article(s)
Keywords: clustering, K-modes, Evidence Accumulation, Naïve Bayes classifier, discrete

Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Zamaninasab Z, Sharifi H, Bahrampour A. Naïve Bayes evidence accumulation K-modes clustering: A new method for classifying binary data and its application on real data of injecting drug users. JBE. 2018;4(2):72-78.