SO SÁNH MỘT SỐ THUẬT TOÁN PHÂN CỤM PHỔ CHO DỮ LIỆU BIỂU DIỄN GENE

Hoàng Thị Thanh Giang; Nguyễn Thị Thúy Hạnh; Nguyễn Hoàng Huy

Comparison of Spectral Clustering Algorithms for Gene Expression Data

Hoang Thi Thanh Giang (*) ¹ , Nguyen Thi Thuy Hanh ¹ , Nguyen Hoang Huy ¹

¹ Khoa Công nghệ Thông tin, Học viện Nông nghiệp Việt Nam

Abstract

Spectral clustering algorithms have been the most effective algorithms to divide genes into groups accordingto the degree of their expression similarity. Such a grouping may suggestthat the respective genes are correlated and/or co-regulated, and subsequently indicatesthat the genes could possibly share a common biological role. In this paper,three spectral clustering algorithms were investigated: unnormalized spectral clustering, normalized spectral clustering according to Shi and Malik (2000), and normalized spectral clustering according to Ng, Jordan and Weiss (2002). The algorithms were benchmarked against each other.The performance of the three clustering algorithms wasstudied on time series expressiondata using Dynamic TimeWarping (DTW) distance in order to measure similarity betweengene expression profiles. Four different cluster validation measures were usedto evaluate the clustering algorithms: Connectivity and Silhouette Index for estimatingthe quality of clusters, Jaccard Index for evaluating the stability of a clustermethod and Rand Index for assessing the accuracy. The results were analyzedby Friedman’s test. The performance of normalized spectral clustering according to Ng, Jordan and Weiss (2002)wasdemonstratedto be the best under the Silhouetteand Rand validation indices.

References

Al - Naymat G., S. Chawla, and J. Teheri (2009). SparseDTW: a novel approach to speed up dynamic time warping. Proc. of the eighth Australasian data mining conference, 101: 117 - 127.

Bayá E. A., and P. M. Granitto (2011). Clustering gene expression data with a penalized graph - based metric. BMC Bioinformatics, 12: 2.

Boeva V., and E. Tsiporkova (2010). A multi - purpose Time series data standardization method. Intelligent systems: from theory to practice, Springer - Varlag Berlin Heidelberg, 299: 445 - 460.

Borg A., N. Lavesson, and V. Boeva (2013). Comparison of clustering approaches for gene expression data. Twelfth scandinavian conference on artificial intelligence. Jaeger M. (ed), 2013.

Chen Y., M. Dong, and M. Rege (2007). Gene expression clustering: a novel graph partitioning approach. International joint conference on neural networks, p. 1542 - 1547.

Chung F. (1997). Spectral graph theory. The CBMS regional conference series in mathematics, Washington.

Datta S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19(4): 459 - 466.

Frey B. J., and D. Dueck (2007). Clustering by passing messages between data points. Science, 5814: 972 - 976.

Huang G. T., K. I. Cunningham, P. V. Benos, and C. S. Chennubhotla (2013). Spectral clustering strategies for heterogeneous disease expression data. Pac Symp Biocomput, 2013: 212 - 223.

Handl J., J. Knowles, and D. B. Kell (2005). Computational cluster validation in post - genomic data analysis. Bioinformatics, 21: 3201 - 3212.

Jaccard P. (1912). The distribution of flora in the alpine zone. New Phytologist, 11: 37 - 50.

Jiang D., C. Tang, and A. Zhang (2004). Cluster analysis for gene expression data: a survey. IEEE Transactions on Knowledge and Data Engineering, 16(11): 1370 - 1386.

Luxburg V. U. (2007). A tutorial on spectral clustering. Stat Comput, 17(4): 395 - 416.

Mohar B. (1991). The Laplacian spectrum of graphs. In: Graph theory, combinatorics, and applications, 2, Kalamazoo, MI (1988), p. 871 - 898, New York, Wiley.

Mohar B. (1997). Some applications of Laplace eigenvalues of graphs. In: Graph Symmetry: Algebraic Methods and Applications, Hahn G. and G. Sabidussi (Eds.), NATO ASI Ser. C 497: 225 - 275, Kluwer.

Ng, A., M. Jordan, and Y. Weiss (2002). On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems, Dietterich T., S. Becker, and Z. Ghahramani (Eds.), MIT Press, 14: 849 - 856.

Nguyen V. A., and P. Li (2009). Measuring similarity between gene expression profiles: a Bayesian approach. BMC Genomics, 10(3): 1 - 10.

Tsiporkova E., and V. Boeva (2007). Two - pass imputation algorithm for missing value estimation in gene expression time series. Journal of Bioinformatics and Computational Biology, 5(5): 1005 - 1022.

Rousseeuw P. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational applied mathematics, 20: 53 - 65.

Rustici G., J. Mata, K. Kivinen, P. Lió, C. J. Penkett, G. Burns, J. Hayles, A. Brazma, P. Nurse and J. Bähler (2004). Periodic gene expression program of the fission yeast cell cycle. Nature genetics, 36(8): 809 - 817.

Quackenbush J. (2001). Computational analysis of microarray data. Nature Reviews Genetics, 2(6): 418 - 427.

Shi, J., and J. Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888 - 905.

Xu Y., V. Olman, and D. Xu (2002). Clustering gene expression data using a graph - theoretic approach: an application of minimum spanning trees. Bioinformatics, 18(4): 536 - 545.

Rand W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66: 846 - 850.

Comparison of Spectral Clustering Algorithms for Gene Expression Data

Received: 22-07-2015

Accepted: 03-09-2015

DOI:

Views

Downloads

Issue: Vol. 13 No. 6 (2015)

Section:

How to Cite:

Comparison of Spectral Clustering Algorithms for Gene Expression Data

Abstract

References