Naïve Bayes Evidence Accumulation K-modes Clustering: A New Method for Classifying Binary Data and its application on real data of injecting drug users

  • Zahra Zamaninasab HIV/STI Surveillance Research Center, and WHO Collaborating Center for HIV Surveillance, Institute for Futures Studies in Health, Department of Biostatistics and Epidemiology, School of Public Health, Kerman University of Medical Sciences, Kerman, Iran
  • Hamid Sharifi HIV/STI Surveillance Research Center, and WHO Collaborating Center for HIV Surveillance, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran AND Department of Biostatistics and Epidemiology, Faculty of Public Health, Kerman University of Medical Sciences, Kerman, Iran
  • Abbas Bahrampour Modelling in Health Research Center, Institute for Futures Studies in Health, Department of Biostatistics and Epidemiology, Health Faculty, Kerman University of Medical Sciences, Kerman, Iran
Keywords: clustering, K-modes, Evidence Accumulation, Naïve Bayes classifier, discrete

Abstract

Background: Clustering methods such as K-modes group discrete data, whereas the Naïve Bayes classifier is a classification method that predicts unknown true classes. In this research, we improve K-modes results by applying the Evidence Accumulation (EA) method and keeping the resulting initial mode vector for use in Naïve Bayes EA K-modes.

Method: The methods were applied to four real datasets whose true classes are known, to check the external validity and purity of our methods. The free programming software R was used, with package klaR for K-modes and EA, and package e1071 for Naïve Bayes. In addition, the methods were applied to the national dataset of Injecting Drug Users (IDU), with a sample size of 2546.

Results: The EA K-modes algorithm was applied to the five real datasets, and K-modes was then rerun with the kept initial mode vector. The results indicate that the purity of EA K-modes (0.544, 0.862, 0.914, 0.944, 0.625) differs significantly from that of classic K-modes (0.497, 0.610, 0.404, 0.650, 0.625). Finally, we applied the Naïve Bayes classifier with the prior probabilities found by EA K-modes. For K=2, Naïve Bayes EA K-modes produced better clustering (0.71, 0.873, against 0.625, 0.862 for EA K-modes and 0.497, 0.61 for K-modes).

Discussion and Conclusion: In this paper, we proposed Naïve Bayes EA K-modes as a new method for clustering binary data. Compared with previous studies, our new method leads to stable clustering. The Naïve Bayes EA K-modes method improves the purity and establishes a better separation.
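
The purity values quoted in the abstract can be read against the usual definition of cluster purity: each cluster is assigned its most frequent true class, and purity is the fraction of observations that fall in their cluster's majority class. The abstract does not spell out the authors' exact computation, and the study itself used R, so the following Python sketch is only an illustration of that standard definition. The function name `purity` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Standard cluster purity (illustrative helper, not the authors' code).

    For every cluster, count how many members belong to its most common
    true class, sum those counts, and divide by the number of observations.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")

    # Map each cluster id to a tally of the true classes it contains.
    per_cluster = {}
    for true_class, cluster in zip(true_labels, cluster_labels):
        per_cluster.setdefault(cluster, Counter())[true_class] += 1

    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(tally.values()) for tally in per_cluster.values())
    return majority_total / len(true_labels)


# Toy data (hypothetical): four observations, two clusters.
example_true = ["a", "a", "b", "b"]
example_clusters = [0, 0, 0, 1]
example_purity = purity(example_true, example_clusters)
```

A purity of 1 means every cluster contains a single true class; values near the share of the largest class indicate little separation.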

Published
2018-10-31
How to Cite
Zamaninasab Z, Sharifi H, Bahrampour A. Naïve Bayes Evidence Accumulation K-modes Clustering: A New Method for Classifying Binary Data and its application on real data of injecting drug users. jbe. 4(2):26-2.
Section
Original Article(s)