Functional classification (analysis)
- Analysis title: Functional classification
- Institute of Systems Biology
- biouml.plugins.enrichment (Enrichment analysis plugin)
This analysis classifies a set of genes into groups. Several classifications may be available, for example "Full gene ontology classification" (includes all groups from the GO database) or "GO (biological process)" (includes only groups representing biological processes).
For this analysis, prepare an input table with Ensembl genes as rows. If your data use different row identifiers, consider running the "Convert table" analysis first. Only row names are significant for this analysis; column data are ignored.
- Source data set – Input table having Ensembl genes as rows.
- Species – Species corresponding to the input table.
- Classification – Classification to use. The list of available classifications may differ depending on the software version and your subscription. Use 'Repository folder' for a custom classification.
- Path to classification root – Path to the folder containing the classification tables; used when 'Repository folder' is selected as the classification. Only tables of the 'Ensembl gene' type are used for the classification.
- Reference collection – If specified, this collection is used as the list of all Ensembl genes for a custom classification. If not specified, the list of all Ensembl genes is created by combining all categories.
- Minimal hits to group – Minimum number of hits (nmin). Groups with fewer hits are filtered out.
- Only over-represented (expert) – If checked, under-represented groups are excluded from the result.
- P-value threshold – Maximum P-value (Pmax). Groups with a larger P-value are excluded from the result.
- Result name – Name and path for the resulting table
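The fallback described for 'Reference collection' can be illustrated with a minimal Python sketch. It is not the plugin's code: the function name `build_reference_set` and the dictionary layout (group ID mapped to a set of Ensembl gene IDs) are assumptions made only for illustration.

```python
def build_reference_set(categories, reference=None):
    """Return the set of all genes used as the background for a custom classification.

    categories: dict mapping a group ID to an iterable of gene IDs (hypothetical layout).
    reference:  optional iterable of gene IDs; if given, it is used as-is.
    """
    if reference is not None:
        return set(reference)
    # No reference collection: combine the members of all categories.
    combined = set()
    for members in categories.values():
        combined |= set(members)
    return combined


# Illustrative gene IDs, not taken from a real classification.
categories = {
    "group_1": {"ENSG_A", "ENSG_B"},
    "group_2": {"ENSG_B", "ENSG_C"},
}
background = build_reference_set(categories)
```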
The result of this analysis is a table in which each row corresponds to a single group. The following columns are always present in the result:
- ID: Accession number representing given group.
- Number of hits (n): Number of genes or other biological objects from the group matched to rows in the input set. Only groups for which n ≥ nmin are included in the result.
- Group size (m): Total number of genes or other biological objects in the given group; n ≤ m.
- Expected hits (nexp): Number of hits expected in a random input set of the same size: nexp = m × N / M, where N is the number of genes from the input set matched to any group in the given classification and M is the total number of genes that appear in the given classification. If n > nexp, the group is over-represented. If the "Only over-represented" option is set, all groups not satisfying this condition are excluded from the result.
- P-value (P): Hypergeometric p-value (cumulative distribution function of the hypergeometric distribution) with m, M, n, N as parameters. Only groups for which P ≤ Pmax are included in the result.
- Hits: List of Ensembl IDs from the input set matched to the group. Note that the number of Ensembl IDs may differ from n, because the internal objects of a classification may differ from Ensembl genes. For example, GO uses gene symbols internally, and several Ensembl gene IDs may match the same gene symbol.
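The columns above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the plugin's implementation: the function names and default thresholds are hypothetical, and the choice of tail is an assumption (upper tail P(X ≥ n) when n > nexp, lower tail P(X ≤ n) otherwise), since the description only says that the cumulative distribution function is used.

```python
from math import comb


def expected_hits(m, N, M):
    """Expected number of hits for a group of size m: nexp = m * N / M."""
    return m * N / M


def hypergeom_pmf(k, m, N, M):
    """Probability of exactly k hits when N genes are drawn from M, m of them in the group."""
    return comb(m, k) * comb(M - m, N - k) / comb(M, N)


def hypergeom_p_value(n, m, N, M):
    """Tail probability; which tail the tool uses is an assumption of this sketch."""
    lowest = max(0, N - (M - m))   # smallest possible number of hits
    highest = min(m, N)            # largest possible number of hits
    if n > expected_hits(m, N, M):
        ks = range(n, highest + 1)     # over-represented: P(X >= n)
        return min(1.0, sum(hypergeom_pmf(k, m, N, M) for k in ks))
    ks = range(lowest, n + 1)          # otherwise: P(X <= n)
    return min(1.0, sum(hypergeom_pmf(k, m, N, M) for k in ks))


def classify(groups, input_genes, universe=None, n_min=2, p_max=0.05, only_over=True):
    """Return one row per group that passes the nmin, over-representation and Pmax filters.

    groups: dict mapping group ID to a set of gene IDs (hypothetical layout).
    The default thresholds are placeholders, not the tool's defaults.
    """
    if universe is None:
        universe = set().union(*groups.values())
    matched = set(input_genes) & universe   # input genes found in the classification
    M, N = len(universe), len(matched)
    rows = []
    for group_id, members in groups.items():
        hits = matched & set(members)
        n, m = len(hits), len(members)
        n_exp = expected_hits(m, N, M)
        if n < n_min:
            continue
        if only_over and n <= n_exp:
            continue
        p = hypergeom_p_value(n, m, N, M)
        if p > p_max:
            continue
        rows.append({"ID": group_id, "n": n, "m": m, "n_exp": n_exp, "P": p,
                     "Hits": sorted(hits)})
    return sorted(rows, key=lambda row: row["P"])
```

For a toy classification with M = 20 genes, a group of m = 5 genes and an input set with N = 4 matched genes of which n = 3 fall in the group, nexp = 5 × 4 / 20 = 1, so the group counts as over-represented in this sketch.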
Additional columns may be present for specific classifications (e.g. group description). The 'Level' column, if present, gives the minimal number of steps needed to reach the root of the classification hierarchy (thus higher values mean more specific and smaller groups).
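As an illustration of the 'Level' column, the minimal number of steps to the root can be found with a breadth-first search from the root. This is only a sketch: the function name `hierarchy_levels`, the child-to-parents mapping, and counting the root as level 0 are assumptions, not details confirmed by the tool.

```python
from collections import deque


def hierarchy_levels(parents, root):
    """Map each group to the minimal number of steps between it and the root.

    parents: dict mapping a group ID to a list of its parent group IDs
             (a group may have several parents, as in GO).
    """
    # Invert the mapping so the hierarchy can be walked from the root downwards.
    children = {}
    for child, parent_ids in parents.items():
        for parent in parent_ids:
            children.setdefault(parent, []).append(child)

    level = {root: 0}          # assumption: the root itself has level 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in level:   # first visit in BFS is the shortest path
                level[child] = level[node] + 1
                queue.append(child)
    return level


# Hypothetical hierarchy: E is reachable through D (3 steps) and through C (2 steps).
parents = {"B": ["root"], "C": ["root"], "D": ["B"], "E": ["D", "C"]}
levels = hierarchy_levels(parents, "root")
```

In this toy hierarchy E takes the shorter route through C, which matches the "minimal number of steps" wording.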