Difference between revisions of "Functional classification (analysis)"

Revision as of 10:40, 23 April 2013

Analysis title: Functional classification
Provider: Institute of Systems Biology

Description

This analysis classifies a set of genes into groups. Several classifications may be available, for example "Full gene ontology classification" (includes all groups from the GO database) or "GO (biological process)" (includes only groups representing biological processes).

For this analysis, prepare an input table with Ensembl genes as rows. If your data use different row identifiers, consider running the "Convert table" analysis first. Only row names are significant for this analysis; column data are ignored.
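As a rough illustration of the requirement above, the sketch below pulls the row names out of a table and separates those that look like Ensembl gene IDs from those that would need conversion first. It assumes a tab-separated text export with a header line and the identifier in the first column; that layout, the function names `row_names` and `split_by_id_type`, and the sample values are illustrative assumptions, not part of the platform.

```python
import csv
import io
import re

# Ensembl gene stable IDs: "ENS", an optional species prefix, "G", 11 digits.
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}$")

def row_names(table_text):
    """Return the first column of a tab-separated table, skipping the header.

    Assumption: the table is plain tab-separated text with one header line.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader, None)  # skip the header line
    return [row[0] for row in reader if row]

def split_by_id_type(names):
    """Separate Ensembl gene IDs from all other row identifiers."""
    ensembl = [name for name in names if ENSEMBL_GENE.match(name)]
    other = [name for name in names if not ENSEMBL_GENE.match(name)]
    return ensembl, other

# Hypothetical input: only the first column matters, the rest is ignored.
example = (
    "ID\tlogFC\n"
    "ENSG00000139618\t1.8\n"
    "ENSMUSG00000017167\t-0.4\n"
    "BRCA2\t2.1\n"
)

ensembl_ids, needs_conversion = split_by_id_type(row_names(example))
print(ensembl_ids)
print(needs_conversion)
```

Any identifiers that end up in the second list are candidates for the "Convert table" analysis before classification.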

Parameters:

Source data set – Input table having Ensembl genes as rows.
Species – Species corresponding to the input table.
Classification – Classification to use. The list of classifications may differ depending on the software version and your subscription. Use 'Repository folder' for a custom classification.
Path to classification root – Path to the folder containing classification tables; used when 'Repository folder' is selected as the classification. Only tables of the 'Ensembl gene' type are used for the classification.
Minimal hits to group – Groups with fewer hits than this value are filtered out (n_min).
Only over-represented (expert) – If checked, under-represented groups are excluded from the result.
P-value threshold – Groups whose hypergeometric p-value exceeds this threshold are filtered out (P_max).
Result name – Name and path for the resulting table

Result

The result of this analysis is a table in which each row corresponds to a single group. The following columns are always present in the result:

ID: Accession number representing the given group.
Number of hits (n): Number of genes or other biological objects from the group matched to rows in the input set. Only groups for which n ≥ n_min are included in the result.
Group size (m): Total number of genes or other biological objects in the given group. n ≤ m.
Expected hits (n_exp): Number of hits expected in a random input set of the given size:

n_exp = m × N / M

where N is the number of genes from the input set matched to any group in the given classification and M is the total number of genes that appear in the given classification. If n > n_exp, the group is over-represented. If the "Only over-represented" option was set, all groups not satisfying this condition are excluded from the result.
P-value (P): Hypergeometric p-value (cumulative distribution function of the hypergeometric distribution) with m, M, n, N as parameters. Only groups for which P ≤ P_max are included in the result.
Hits: List of Ensembl IDs from the input set matched to the group. Note that the number of Ensembl IDs may differ from n, because the classification's internal objects may differ from Ensembl genes. For example, GO uses gene symbols internally, and several Ensembl gene IDs may match the same gene symbol.
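The statistics in the columns above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation. It takes the expected number of hits to be the mean of the hypergeometric distribution (m × N / M), and it assumes the p-value is the upper tail P(X ≥ n) for over-represented groups and the lower tail P(X ≤ n) otherwise, since the page does not say which tail is used. The function names and the default thresholds in `keep_group` are hypothetical.

```python
from math import comb

def expected_hits(m, N, M):
    """Mean of the hypergeometric distribution: m * N / M."""
    return m * N / M

def hypergeom_pmf(k, m, N, M):
    """P(X = k): k of the N matched input genes fall into a group of size m,
    out of M genes in the whole classification."""
    return comb(m, k) * comb(M - m, N - k) / comb(M, N)

def hypergeom_p_value(n, m, N, M):
    """Tail probability of observing n hits.

    Assumption: upper tail for over-represented groups (n > n_exp),
    lower tail otherwise.
    """
    lowest = max(0, N - (M - m))   # smallest possible number of hits
    highest = min(m, N)            # largest possible number of hits
    if n > expected_hits(m, N, M):
        ks = range(n, highest + 1)
    else:
        ks = range(lowest, n + 1)
    return sum(hypergeom_pmf(k, m, N, M) for k in ks)

def keep_group(n, m, N, M, n_min=2, p_max=0.05, only_over=True):
    """Apply the three filters: n >= n_min, optional n > n_exp, P <= P_max.

    The default values here are placeholders, not the tool's defaults.
    """
    if n < n_min:
        return False
    if only_over and n <= expected_hits(m, N, M):
        return False
    return hypergeom_p_value(n, m, N, M) <= p_max

# Toy numbers: 10 genes in the classification, a group of 4,
# 5 matched input genes, 3 of them in the group.
print(expected_hits(4, 5, 10))
print(hypergeom_p_value(3, 4, 5, 10))
print(keep_group(3, 4, 5, 10))
```

With these toy numbers the group is over-represented (3 hits against 2 expected), but the p-value is far above a 0.05 threshold, so the group would be filtered out.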

More columns may be present for specific classifications (e.g. group description). The 'Level' column, if present, gives the minimal number of steps needed to reach the root of the classification hierarchy (thus higher values mean more specific and smaller groups).
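To make the 'Level' definition concrete: when a group can have several parents, the minimal number of steps to the root is a shortest-path length, which a breadth-first search finds directly. The sketch below uses a toy hierarchy with made-up group names; the `parents` mapping and the `level` function are illustrative assumptions and do not reflect how the platform stores classifications.

```python
from collections import deque

def level(group, parents, root):
    """Minimal number of parent steps from `group` to `root`.

    Breadth-first search over child -> parents links; returns None
    if the root cannot be reached from the group.
    """
    queue = deque([(group, 0)])
    seen = {group}
    while queue:
        node, steps = queue.popleft()
        if node == root:
            return steps
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, steps + 1))
    return None

# Hypothetical hierarchy: each group maps to its direct parents.
parents = {
    "A": ["root"],
    "B": ["A"],
    "C": ["B", "root"],  # two routes: three steps via B and A, or one step directly
}

for name in ("root", "A", "B", "C"):
    print(name, level(name, parents, "root"))
```

Group "C" shows why the definition says minimal: the longer route through "B" and "A" is ignored in favour of the direct link to the root.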

@@ Line 7: / Line 7: @@
 This analysis allows you to classify set of genes into groups. Several types of classifications may be available for you, for example, "Full gene ontology classification" (includes all groups from GO database) or "GO (biological process)" (includes only groups representing biological process).
-==== Parameters ====
+For this analysis you have to prepare input table having Ensembl genes as rows. If your data have different row identifiers, consider using "Convert table" analysis first. Only row names are significant for this analysis; columns data is ignored.
-* '''Source data set''': input table having Ensembl genes as rows. If your data have different row identifiers, consider using "Convert table" analysis first. Only row names are significant for this analysis; columns data is ignored.
+==== Parameters: ====
-* '''Species''': species corresponding to the input table.
-* '''Classification''': classification you want to use. List of classifications may differ depending on software version and your subscription.
+* '''Source data set''' – Input table having Ensembl genes as rows.
-* '''Minimal hits to group''' (''n<sub>min</sub>''): minimal number of hits in the group to be included into result.
+* '''Species''' – Species corresponding to the input table.
-* '''Only over-represented''': if checked, under-represented groups will be excluded from the result.
+* '''Classification''' – Classification you want to use. List of classifications may differ depending on software version and your subscription. Use <nowiki>'</nowiki>Repository folder<nowiki>'</nowiki> for custom classification.
-* '''P-value threshold''' (''P<sub>max</sub>''): threshold for hypergeometric p-value.
+* '''Path to classification root''' – Specify path to the folder containing classification tables, when <nowiki>'</nowiki>repository folder<nowiki>'</nowiki> is selected as classification. Only tables with <nowiki>'</nowiki>Ensembl gene<nowiki>'</nowiki> type are used for the classification.
-* '''Result name''': name and path of the output table.
+* '''Minimal hits to group''' – Groups with lower number of hits will be filtered out (n<sub>min</sub>)
+* '''Only over-represented''' (expert) – If checked, under-represented groups will be excluded from the result
+* '''P-value threshold''' – P-value threshold (P<sub>max</sub>)
+* '''Result name''' – Name and path for the resulting table
 ==== Result ====
