Cluster analysis by K-means (analysis)

Revision as of 15:24, 4 April 2013

Experiment data - experimental data for analysis.
- Table - a table with experimental data stored in the repository.
- Columns - the columns of the table to be used in the clustering analysis.
Cluster algorithm - the version of the K-means algorithm to be applied [1-4].
Cluster number - the number of clusters into which the input data will be divided.
Output table - name and path in the repository under which the result table will be saved. If a table with the specified name and path already exists, it will be overwritten.

The clustering is performed with the K-means algorithm as implemented in the R statistical environment (http://www.r-project.org/).
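
As an illustration of what a K-means run does with the selected columns and the chosen cluster number, here is a minimal sketch of the Lloyd variant in plain Python. It is not the R implementation used by this analysis. The function name kmeans_lloyd, the sample rows, and the parameters max_iter and seed are hypothetical and chosen only for this example.

```python
import random

def kmeans_lloyd(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means on a list of equal-length numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Initialisation (assumption): use k randomly chosen data points as starting centres.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the run has converged
        labels = new_labels
        # Update step: move each centre to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical rows of a table, restricted to two selected columns.
rows = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
labels, centers = kmeans_lloyd(rows, k=2)
```

Here rows plays the role of the selected columns and k the role of the cluster number; labels holds one cluster index per row, which is the kind of information a result table would carry.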

1. Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
2. Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100–108.
3. Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
4. MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.