Difference between revisions of "Upstream analysis (TRANSFAC(R) and TRANSPATH(R)) (workflow)"


Latest revision as of 13:34, 30 May 2013

Workflow title: Upstream analysis (TRANSFAC(R) and TRANSPATH(R))
Provider: geneXplain GmbH

Workflow overview

Upstream-analysis-TRANSFAC-R-and-TRANSPATH-R-workflow-overview.png

Description

This workflow performs a complete upstream analysis: a search for putative transcription factor binding sites (TFBSs) in the promoters of the input gene set, followed by an analysis of the pathways upstream of the suggested transcription factors (TFs). The resulting master regulatory molecules can be considered as new targets and are candidates for further experimental validation.

Any gene or protein table can be submitted as input. Two tables are needed: the genes under study (the Yes set) and a background set (the No set).

In the first step, both input tables are converted into corresponding tables with Ensembl Gene IDs.

In the next step, TFBSs are searched for in the promoters of the specified gene sets. Promoters in this workflow are defined as sequences from -1000 to +100 relative to the transcription start sites (TSSs), as annotated in Ensembl.
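
The promoter definition above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the function name `promoter_window`, the `Promoter` tuple, and the convention that the TSS itself sits at offset 0 are assumptions made for illustration (the page does not state how the TSS position is counted).

```python
from typing import NamedTuple


class Promoter(NamedTuple):
    start: int  # leftmost genomic coordinate, inclusive
    end: int    # rightmost genomic coordinate, inclusive


def promoter_window(tss: int, strand: str,
                    upstream: int = -1000, downstream: int = 100) -> Promoter:
    """Genomic interval from `upstream` to `downstream` relative to the TSS.

    Assumption: the TSS is treated as offset 0. On the minus strand the
    window is mirrored, because "upstream" then means higher coordinates.
    """
    if strand == "+":
        return Promoter(tss + upstream, tss + downstream)
    if strand == "-":
        return Promoter(tss - downstream, tss - upstream)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


# Hypothetical TSS at position 5000 on each strand
plus = promoter_window(5000, "+")
minus = promoter_window(5000, "-")
```

The `upstream` and `downstream` defaults mirror the workflow's "Start of promoter" and "End of promoter" parameters, so other window sizes can be tried by passing different offsets.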

The site search uses the TRANSFAC® library of positional weight matrices (PWMs), specifically the profile vertebrate_non_redundant_minSUM.

In the same step, the frequencies of putative TFBSs are compared between the Yes set and the No set to identify sites that are overrepresented in the Yes set.

The output of this step is a list of PWMs whose hits are overrepresented in the Yes set versus the No set.
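
As a rough illustration of the Yes/No comparison, the sketch below ranks PWMs by the ratio of their per-promoter hit frequencies in the two sets. The page does not describe the statistic the workflow actually uses, so the plain frequency ratio, the function names, and the PWM identifiers (`M_A`, `M_B`, `M_C`) are all assumptions chosen for illustration.

```python
def yes_no_ratios(yes_counts: dict, no_counts: dict,
                  n_yes: int, n_no: int) -> dict:
    """Ratio of hits-per-promoter in the Yes set to that in the No set."""
    ratios = {}
    for pwm, yes_hits in yes_counts.items():
        yes_freq = yes_hits / n_yes
        no_freq = no_counts.get(pwm, 0) / n_no
        # A PWM with no hits in the background gets an infinite ratio.
        ratios[pwm] = float("inf") if no_freq == 0 else yes_freq / no_freq
    return ratios


def overrepresented(ratios: dict, min_ratio: float = 1.0) -> list:
    """PWMs whose ratio exceeds `min_ratio`, most enriched first."""
    kept = [pwm for pwm, ratio in ratios.items() if ratio > min_ratio]
    return sorted(kept, key=lambda pwm: ratios[pwm], reverse=True)


# Hypothetical total hit counts over 10 Yes promoters and 20 No promoters
yes_counts = {"M_A": 30, "M_B": 10, "M_C": 5}
no_counts = {"M_A": 20, "M_B": 40, "M_C": 10}
ratios = yes_no_ratios(yes_counts, no_counts, n_yes=10, n_no=20)
print(overrepresented(ratios))  # ['M_A']
```

Normalizing by the number of promoters keeps the comparison meaningful when the Yes and No sets differ in size; a real analysis would also attach a significance test to each ratio.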

Next, the list of PWMs is converted into a table of transcription factors with TRANSPATH® IDs, which are used to search for master regulatory molecules in the TRANSPATH® network. For each potential master regulator, FDR, Score, and Z-score are calculated.

The results are filtered by Z_Score>1 and Score>0.2 to select statistically significant master regulators.
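
The filter above amounts to keeping rows that pass both thresholds. A minimal sketch, assuming the results are available as a list of dictionaries with `Z_Score` and `Score` keys; the row layout, the function name, and the molecule names are hypothetical and not taken from the platform.

```python
def filter_master_regulators(rows: list,
                             min_z: float = 1.0,
                             min_score: float = 0.2) -> list:
    """Keep rows with Z_Score > min_z and Score > min_score (both strict)."""
    return [row for row in rows
            if row["Z_Score"] > min_z and row["Score"] > min_score]


# Hypothetical candidate master regulators
candidates = [
    {"name": "mol_1", "Score": 0.45, "Z_Score": 2.3},
    {"name": "mol_2", "Score": 0.15, "Z_Score": 3.0},  # fails Score > 0.2
    {"name": "mol_3", "Score": 0.60, "Z_Score": 0.8},  # fails Z_Score > 1
    {"name": "mol_4", "Score": 0.20, "Z_Score": 1.5},  # Score not above 0.2
]
selected = filter_master_regulators(candidates)
print([row["name"] for row in selected])  # ['mol_1']
```

Both comparisons are strict, matching the `>` in the filter expression, so a row sitting exactly on a threshold is dropped.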

The table with the resulting master regulatory molecules is converted into a table of Ensembl Gene IDs and annotated with additional information: gene descriptions and gene symbols.

Finally, networks for the three top master regulatory molecules are visualized as diagrams in the hierarchical layout.

The output is a new folder with several tables, including a summary of the predicted TFBSs, genomic tracks of the Yes and No promoters and sites, a table of candidate master regulators, and network diagrams for the three top candidates.

This workflow is available together with valid TRANSFAC® and TRANSPATH® licenses.

Parameters

Input Yes gene set
Species
Input No gene set
Profile
Start of promoter: position relative to TSS, bp
End of promoter: position relative to TSS, bp
Results Folder: folder to store results (will be created if it does not exist)