- Analysis title: CRC Analysis
- Provider: Institute of Systems Biology
- Plugin: ru.biosoft.analysis (Common methods of data analysis plug-in)
Chinese Restaurant Cluster analysis
Genes are grouped into clusters so that genes within one cluster are maximally similar, whereas genes in different clusters are maximally dissimilar. Unlike in the K-means algorithm, the number of clusters is determined by the program rather than specified in advance.
A table of genes or probes with their expression values or calculated fold changes. Depending on the algorithm, certain parameters may need to be specified.
A table with the same genes grouped into clusters.
- Experiment data - experimental data for analysis.
- Table - a table with experimental data stored in repository.
- Columns - the columns from the table which should be taken for the clustering analysis.
- Cluster process number - number of independent clustering processes to be launched. More processes make the results more reliable, but also increase the computation required.
- Cycles per clustering process - the number of cycles to be executed for each process.
- Allow inversion - whether inverted profiles are considered similar.
- Maximum shift - maximum shift between two profiles at which they are still classified as similar; most useful in the analysis of time-course data. The shift is measured in profile positions.
- Outline boundaries - lower and upper boundaries for values from the input table. Values outside these boundaries are treated as outliers and ignored.
- Output table - name and path in the repository under which the result table will be saved. If a table with the specified name and path already exists, it will be overwritten.
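The effect of the Allow inversion and Maximum shift parameters can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the similarity measure, which is only a stand-in: the measure actually used by the CRC analysis is not stated here and may differ. The function names `pearson` and `best_similarity` are hypothetical and do not belong to the plug-in.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_similarity(p, q, max_shift=0, allow_inversion=False):
    """Best similarity of profile p to profile q over all allowed shifts.

    A shift of s compares p[i] with q[i + s], using only the overlapping
    positions. Returns (score, shift, inverted); the score stays at -2.0
    if every overlap is shorter than three positions.
    """
    best = (-2.0, 0, False)
    n = len(p)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = p[:n - s], q[s:]
        else:
            a, b = p[-s:], q[:n + s]
        if len(a) < 3:  # too little overlap for a meaningful correlation
            continue
        r = pearson(a, b)
        candidates = [(r, s, False)]
        if allow_inversion:
            # an inverted profile is as similar as its mirror image
            candidates.append((-r, s, True))
        for candidate in candidates:
            if candidate[0] > best[0]:
                best = candidate
    return best

# q is p delayed by one position, as may happen in time-course data
p = [0, 1, 3, 1, 0, 0, 0, 0]
q = [0, 0, 1, 3, 1, 0, 0, 0]
print(best_similarity(p, q, max_shift=2))
```

With `max_shift=0` and `allow_inversion=False` the function reduces to a plain correlation, which corresponds to considering only coexpressed profiles as similar.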
The CRC analysis divides an input set of elements (table) into clusters so that the elements (genes, probes) of one cluster have similar profiles, while elements of different clusters do not. Unlike K-means, this algorithm finds the optimal number of clusters by itself. It can also treat as "similar" not only coexpressed genes but also genes with inverted and shifted profiles. A user-specified number of independent clustering processes is launched, each consisting of a user-defined number of iterations (cycles). Each cycle comprises:
1. Choosing a gene
2. Removing it from its current cluster
3. Finding the best cluster assignment for the gene
4. Repeating steps 1-3 for all genes in the input table
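The cycle described above can be sketched in Python as follows. This is a simplified, greedy illustration under stated assumptions, not the plug-in's implementation: the score of an existing cluster (its size multiplied by a Gaussian-like term on the squared distance to the cluster mean), the fixed score for opening a new cluster, and the names `run_cycle`, `alpha` and `sigma2` are all hypothetical. The weighted Chinese restaurant process in the reference below uses its own weights, and an actual implementation may sample the assignment rather than always take the maximum.

```python
from math import exp

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(rows):
    """Position-wise mean of a list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def run_cycle(profiles, labels, alpha=1.0, sigma2=1.0):
    """One cycle: visit every gene once and reassign it greedily.

    profiles: list of equal-length numeric lists, one per gene.
    labels:   list of integer cluster ids, modified in place and returned.
    """
    dim = len(profiles[0])
    for g, profile in enumerate(profiles):       # choose a gene
        labels[g] = None                         # remove it from its cluster
        members = {}
        for i, lab in enumerate(labels):
            if lab is not None:
                members.setdefault(lab, []).append(profiles[i])
        # Find the best assignment. Opening a new cluster is scored as if
        # the gene lay one sigma from the centre in every position
        # (an assumption made for this sketch only).
        best_label = max(members, default=-1) + 1
        best_score = alpha * exp(-0.5 * dim)
        for lab, rows in members.items():
            d2 = sq_dist(profile, mean_profile(rows))
            score = len(rows) * exp(-d2 / (2 * sigma2))
            if score > best_score:
                best_label, best_score = lab, score
        labels[g] = best_label
    return labels                                # every gene has been visited

profiles = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
            [5, 5, 5], [5.1, 5, 5], [5, 5, 5.1]]
labels = list(range(len(profiles)))  # start with every gene in its own cluster
for _ in range(3):                   # "Cycles per clustering process"
    run_cycle(profiles, labels)
print(labels)
```

Because a gene may always open a new cluster, the number of clusters is not fixed beforehand; it emerges from the data and from the hypothetical `alpha` and `sigma2` settings.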
- Qin Z.S. (2006) Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics, 22:1988-1997.