Difference between revisions of "Upstream analysis (TRANSFAC(R) and GeneWays) (workflow)"
Latest revision as of 13:34, 30 May 2013
- Workflow title: Upstream analysis (TRANSFAC(R) and GeneWays)
- Provider: geneXplain GmbH
Workflow overview
Description
This workflow performs a complete upstream analysis: a search for putative transcription factor binding sites (TFBSs) in the promoters of the input gene set, followed by an analysis of the pathways upstream of the suggested transcription factors (TFs). The resulting master regulatory molecules can be considered as new targets and are candidates for further experimental validation.
Any gene or protein table can be submitted as input. The input consists of two tables: one with the genes under study (the Yes set, or experiment set) and one with a background set (the No set).
In the first step, both input tables are converted into corresponding tables with Ensembl Gene IDs using the “Convert table” analysis.
In the next step, the promoters of the specified gene sets are searched for TFBSs using the “Site search on gene set” analysis. By default, promoters in this workflow are defined as sequences from -1000 to +100 relative to the transcription start sites (TSSs) annotated in Ensembl; a different promoter region can be specified in the input form. The site search uses the TRANSFAC® library of positional weight matrices (PWMs), with the profile vertebrate_non_redundant_minSUM as the default. Any other TRANSFAC® profile or a user-specific profile can be chosen in the input form.
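The default promoter definition can be illustrated with a short Python sketch. It is not part of the workflow itself: the function name `promoter_window` and the simple offset arithmetic (1-based coordinates, TSS counted as offset 0) are assumptions made for illustration, and the platform may handle coordinates differently.

```python
def promoter_window(tss, strand, start=-1000, end=100):
    """Return (left, right) genomic coordinates of a promoter window.

    tss        : 1-based genomic position of the transcription start site
    strand     : "+" or "-"
    start, end : offsets relative to the TSS in bp (workflow defaults)

    Hypothetical helper: a plain-offset convention is assumed here.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        left, right = tss + start, tss + end
    else:
        # On the minus strand, upstream lies at higher genomic coordinates.
        left, right = tss - end, tss - start
    # Clamp to the beginning of the chromosome.
    return max(left, 1), right


# Illustrative TSS position, not taken from any real annotation.
print(promoter_window(5000, "+"))
print(promoter_window(5000, "-"))
```

Changing `start` and `end` corresponds to the "Start of promoter" and "End of promoter" parameters of the input form.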
In the same step, the frequencies of putative TFBSs are compared between the Yes set and the No set to identify sites that are enriched in the Yes set relative to the No set.
The output of this step is a list of PWMs whose hits are enriched in the Yes set relative to the No set.
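The idea of comparing site frequencies between the two sets can be sketched in Python as follows. This is only an illustration of the principle: the statistic that the “Site search on gene set” analysis actually uses is not described here, so a plain ratio of site densities (sites per 1000 bp) is assumed. All function names, matrix names and counts below are hypothetical.

```python
def site_density(site_count, total_bp):
    """Predicted sites per 1000 bp of scanned promoter sequence."""
    return 1000.0 * site_count / total_bp


def enriched_matrices(counts, yes_bp, no_bp, min_ratio=1.0):
    """Return (matrix, ratio) pairs whose Yes/No density ratio exceeds min_ratio.

    counts : dict mapping matrix name -> (sites in Yes set, sites in No set)
    yes_bp, no_bp : total promoter length scanned in each set, in bp
    Matrices without any hit in the No set are skipped to avoid dividing by zero.
    """
    result = []
    for matrix, (yes_sites, no_sites) in counts.items():
        if no_sites == 0:
            continue
        ratio = site_density(yes_sites, yes_bp) / site_density(no_sites, no_bp)
        if ratio > min_ratio:
            result.append((matrix, ratio))
    # Strongest enrichment first.
    return sorted(result, key=lambda item: item[1], reverse=True)


# Made-up counts for two placeholder matrices.
toy_counts = {"matrix_A": (30, 40), "matrix_B": (10, 80)}
print(enriched_matrices(toy_counts, yes_bp=20000, no_bp=80000))
```

A real analysis would also attach a significance estimate to each ratio; that part is omitted from this sketch.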
Next, the list of PWMs is converted into a table of transcription factors with GeneWays IDs using the “Matrices to molecules” analysis. The resulting table is subjected to the “Regulator search” analysis to identify master regulatory molecules in the GeneWays network. For each potential master regulator, FDR, Score, Z-score and Rank sum are calculated.
To select statistically significant master regulators, the results are filtered by Z-score > 1 and Score > 0.2 and then sorted by the Rank sum column.
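The filtering and sorting step can be sketched in Python as below. The thresholds come from the description above; the dictionary keys (`z_score`, `score`, `rank_sum`), the function name and the example rows are hypothetical, and ascending order of Rank sum (smaller is better) is an assumption, since the sort direction is not stated.

```python
def select_master_regulators(rows, min_z=1.0, min_score=0.2):
    """Keep rows with z_score > min_z and score > min_score, sorted by rank_sum.

    rows : list of dicts with the (hypothetical) keys
           "id", "z_score", "score" and "rank_sum"
    Ascending rank_sum order is assumed.
    """
    kept = [r for r in rows if r["z_score"] > min_z and r["score"] > min_score]
    return sorted(kept, key=lambda r: r["rank_sum"])


# Made-up rows that only demonstrate the filter.
toy_rows = [
    {"id": "mol_1", "z_score": 2.4, "score": 0.35, "rank_sum": 12},
    {"id": "mol_2", "z_score": 0.8, "score": 0.50, "rank_sum": 5},   # fails the Z-score filter
    {"id": "mol_3", "z_score": 1.7, "score": 0.10, "rank_sum": 7},   # fails the Score filter
    {"id": "mol_4", "z_score": 3.1, "score": 0.60, "rank_sum": 4},
]
for row in select_master_regulators(toy_rows):
    print(row["id"], row["rank_sum"])
```

Both thresholds are strict inequalities, so a row with a Z-score of exactly 1 or a Score of exactly 0.2 is dropped.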
The table of resulting master regulatory molecules is converted into a table with Ensembl Gene IDs using the “Convert table” analysis and then annotated with additional information (gene descriptions and gene symbols) via the “Annotate table” analysis.
Finally, the networks for the top three master regulatory molecules are visualized as diagrams in a hierarchical layout via the “Visualize results” method.
Parameters
- Input Yes gene set
- Species
- Input No gene set
- Profile
- Start of promoter: Position relative to TSS, bp
- End of promoter: Position relative to TSS, bp
- Results Folder: Folder to store results (will be created if it does not exist)