Received: 05-12-2011
Accepted: 18-05-2012
Some Algorithms for Clustering Categorical Data in Data Mining
Keywords
Cluster analysis, Categorical data, Data mining, Rough set theory
Abstract
This paper describes several typical clustering algorithms for categorical attributes. Specifically, we focus on MMR, a clustering algorithm based on rough set theory that can handle uncertainty during the clustering process. The algorithm was successfully implemented and tested on four standard data sets. The results indicate that MMR generates better-quality clusters than some traditional algorithms.