Some Algorithms for Clustering Categorical Data in Data Mining

Received: 05-12-2011

Accepted: 18-05-2012

Section:

ENGINEERING AND TECHNOLOGY

How to Cite:

Ha, H. (2024). Some Algorithms for Clustering Categorical Data in Data Mining. Vietnam Journal of Agricultural Sciences, 10(3), 460–473. http://testtapchi.vnua.edu.vn/index.php/vjasvn/article/view/8

Hoang Thi Ha (*) 1

  • 1 Faculty of Information Technology, Hanoi University of Agriculture
  • Keywords

    Cluster analysis, Categorical data, Data mining, Rough set theory

    Abstract


    This paper describes several typical clustering algorithms for categorical attributes. Specifically, we study MMR, a clustering algorithm based on rough set theory that can handle uncertainty in the clustering process. We implemented the algorithm and tested it on four standard data sets. The results indicate that MMR produces better-quality clusters than some traditional algorithms.
