SOME CLUSTERING ALGORITHMS FOR CATEGORICAL DATA IN DATA MINING

Hoàng Thị Hà (*) ¹

¹ Faculty of Information Technology, Hanoi University of Agriculture

Keywords

Categorical data, Rough set theory, Data mining, Data clustering

Abstract

This study surveys, analyzes, and evaluates several representative clustering algorithms for categorical data, namely K-modes, ROCK, and MMR. It aims to give readers a clear overview of these algorithms so that they can more easily choose one suited to a practical data mining problem. The study focuses on MMR, a clustering algorithm for categorical data based on rough set theory. MMR was implemented as a computer program and tested, and it showed fairly good clustering quality compared with other clustering algorithms.

Received: 5 December 2011

Accepted: 18 May 2012

Issue: Vol. 10 No. 3 (2012)
