Cluster analysis by K-means (analysis)

From BioUML platform

Latest revision as of 18:15, 9 December 2020

Analysis title
Cluster analysis by K-means
Provider
Institute of Systems Biology
Class
ClusterAnalysis
Plugin
ru.biosoft.analysis (Common methods of data analysis plug-in)

Goal:

Group genes into clusters so that genes within the same cluster are as similar as possible, while genes in different clusters are as dissimilar as possible.
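For K-means, "similarity" has a precise meaning: each gene i is treated as a vector x_i of its values in the selected columns, and the algorithm looks for a partition into k clusters C_1, ..., C_k (k is the Cluster number parameter) that minimises the within-cluster sum of squared Euclidean distances to the cluster means. This is the standard K-means objective; the notation below is ours, not BioUML's.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The variants listed under Cluster algorithm differ in how they search for a partition with a small value of this sum; none is guaranteed to find the global minimum.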

Input:

A table of genes or probes with their expression values or calculated fold changes. Depending on the algorithm, certain parameters must also be specified.

Output:

A table with the same genes grouped into clusters.

Parameters:

  • Experiment data - the experimental data to analyze.
    • Table - a table with experimental data stored in the repository.
    • Columns - the table columns to use for clustering.
  • Cluster algorithm - the K-means variant to apply (the Forgy, Hartigan-Wong, Lloyd and MacQueen algorithms are described in references [1-4]).
  • Cluster number - the number of clusters into which the input data are divided.
  • Output table - path and name in the repository under which the result table is saved. An existing table with the same path and name is overwritten.
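To make the roles of Columns and Cluster number concrete, here is a minimal sketch of Lloyd's batch variant [3] in plain Python, starting from k rows chosen at random as initial centres. It is an illustration only, not the R code that this analysis calls: the functions `kmeans_lloyd` and `within_cluster_ss` and the input format (a dictionary mapping each gene to its values in the selected columns) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def kmeans_lloyd(
    rows: Dict[str, Sequence[float]],
    k: int,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Hypothetical helper: cluster rows (gene -> values of the selected
    columns) into k clusters with Lloyd's algorithm. Not BioUML's or R's code."""
    names = list(rows)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of rows")
    points = [list(map(float, rows[n])) for n in names]

    # Initialisation: k distinct rows, drawn at random, become the first centres.
    rng = random.Random(seed)
    centers = [points[i][:] for i in rng.sample(range(len(points)), k)]

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # no row changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(names, labels)), centers


def within_cluster_ss(
    rows: Dict[str, Sequence[float]],
    labels: Dict[str, int],
    centers: List[List[float]],
) -> float:
    """Total squared distance of every row to the centre of its cluster."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(rows[n], centers[labels[n]]))
        for n in rows
    )


if __name__ == "__main__":
    # Toy expression values for four genes in two selected columns.
    expr = {
        "geneA": [0.1, 0.2],
        "geneB": [0.0, 0.1],
        "geneC": [5.0, 5.1],
        "geneD": [5.2, 4.9],
    }
    labels, centers = kmeans_lloyd(expr, k=2)
    print(labels)
    print(within_cluster_ss(expr, labels, centers))
```

Cluster indices depend on the random start, so only the grouping (geneA with geneB, geneC with geneD) is meaningful. R's `kmeans()` uses the Hartigan-Wong algorithm [2] by default, which also moves individual points between clusters whenever that lowers the objective, so its results on real data can differ from this sketch.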

Further details:

Clustering is performed with the K-means algorithm as implemented in R (http://www.r-project.org/).

References:

  1. Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
  2. Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100–108.
  3. Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
  4. MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.