Upstream analysis (TRANSFAC(R) and GeneWays) (workflow)

From BioUML platform
Workflow title
Upstream analysis (TRANSFAC(R) and GeneWays)
Provider
geneXplain GmbH

Workflow overview

Upstream-analysis-TRANSFAC-R-and-GeneWays-workflow-overview.png

Description

This workflow is designed to perform a complete upstream analysis, including a search for putative transcription factor binding sites (TFBSs) in the promoters of the input gene set and an analysis of the pathways upstream of the suggested TFs. The resulting master regulatory molecules can be considered new targets and are candidates for further experimental validation.

As input, any gene or protein table can be submitted. The input consists of two tables: one with the genes under study (the Yes set, or experiment set) and one with a background set (the No set).

In the first step, both input tables are converted into corresponding tables with Ensembl Gene IDs using the “Convert table” analysis.

In the next step, the promoters of the specified gene sets are searched for TFBSs with the “Site search on gene set” analysis. By default, promoters in this workflow are defined as sequences from -1000 to +100 relative to the transcription start sites (TSSs) annotated in Ensembl. A different promoter region can be specified in the input form. The site search uses the TRANSFAC® library of positional weight matrices (PWMs), with the profile vertebrate_non_redundant_minSUM as the default. Any other TRANSFAC® profile or a user-specific profile can be chosen in the input form.
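The default promoter window can be illustrated with a small Python sketch. It is not part of the workflow; the function name `promoter_window` and the simple offset arithmetic (TSS coordinate plus offset, inclusive ends) are assumptions made for illustration only. The platform may use a different coordinate convention.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str,
                    start: int = -1000, end: int = 100) -> Tuple[int, int]:
    """Return (left, right) genomic coordinates of a promoter.

    `start` and `end` are offsets relative to the TSS, mirroring the
    "Start of promoter" and "End of promoter" parameters. The offsets
    are simply added to the TSS coordinate (an illustrative assumption).
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        return tss + start, tss + end
    if strand == "-":
        # On the reverse strand, upstream lies at higher coordinates,
        # so the window is mirrored around the TSS.
        return tss - end, tss - start
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(promoter_window(5000, "+"))
    print(promoter_window(5000, "-"))
```

The strand check matters because a gene on the reverse strand has its upstream region at higher genomic coordinates than its TSS.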

In the same step, the frequencies of putative TFBSs are compared between the Yes set and the No set to identify sites that are enriched in the Yes set.

The output of this step is a list of PWMs whose hits are enriched in the Yes set versus the No set.
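The idea of comparing site frequencies between the two sets can be sketched in Python as follows. This page does not state which statistic the “Site search on gene set” analysis actually uses, so the density ratio and the hypergeometric tail probability below are generic stand-ins, and all function names are hypothetical.

```python
from math import comb


def site_density(n_sites: int, total_bp: int) -> float:
    """Number of predicted sites per 1000 bp of searched sequence."""
    return 1000.0 * n_sites / total_bp


def enrichment_ratio(yes_sites: int, yes_bp: int,
                     no_sites: int, no_bp: int) -> float:
    """Ratio of site densities, Yes set over No set."""
    no_density = site_density(no_sites, no_bp)
    if no_density == 0:
        return float("inf")
    return site_density(yes_sites, yes_bp) / no_density


def hypergeom_pvalue(k: int, n_yes: int, K: int, N: int) -> float:
    """P(X >= k) under a hypergeometric null.

    N promoters in total (Yes plus No), K of them with at least one hit,
    n_yes promoters in the Yes set, k of which carry a hit.
    """
    upper = min(n_yes, K)
    tail = sum(comb(K, i) * comb(N - K, n_yes - i)
               for i in range(k, upper + 1))
    return tail / comb(N, n_yes)


if __name__ == "__main__":
    print(enrichment_ratio(30, 110_000, 20, 220_000))
    print(hypergeom_pvalue(2, 2, 2, 4))
```

A PWM would be reported as enriched when its ratio is well above 1 and the tail probability is small; the thresholds are left to the reader because the source does not give them.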

Next, the list of PWMs is converted into a table of transcription factors with GeneWays IDs using the “Matrices to molecules” analysis. The resulting table is subjected to the “Regulator search” analysis to identify master regulatory molecules in the GeneWays network. For each potential master regulator, the FDR, Score, Z-score and Rank sum are calculated.

The results are filtered by Z-score > 1 and Score > 0.2 and sorted by the Rank sum column to select statistically significant master regulators.
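The filtering and sorting step can be sketched in Python as follows. The dictionary keys (`z_score`, `score`, `rank_sum`), the function name, and the example molecule names are hypothetical. Ascending sort order (a lower rank sum is better) is also an assumption, since the sort direction is not stated here.

```python
from typing import Dict, List


def select_master_regulators(rows: List[Dict],
                             min_z: float = 1.0,
                             min_score: float = 0.2) -> List[Dict]:
    """Keep rows with z_score > min_z and score > min_score,
    then sort them by rank_sum (ascending, by assumption)."""
    kept = [r for r in rows
            if r["z_score"] > min_z and r["score"] > min_score]
    return sorted(kept, key=lambda r: r["rank_sum"])


if __name__ == "__main__":
    # Invented example rows, not real workflow output.
    table = [
        {"id": "mol_A", "z_score": 2.5, "score": 0.40, "rank_sum": 12},
        {"id": "mol_B", "z_score": 0.8, "score": 0.90, "rank_sum": 3},
        {"id": "mol_C", "z_score": 1.7, "score": 0.25, "rank_sum": 7},
        {"id": "mol_D", "z_score": 3.1, "score": 0.10, "rank_sum": 5},
    ]
    for row in select_master_regulators(table):
        print(row["id"], row["rank_sum"])
```

Both thresholds are strict inequalities, matching the filter as written, so a row with a Score of exactly 0.2 would be dropped.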

The table of resulting master regulatory molecules is converted into a table with Ensembl Gene IDs using the “Convert table” analysis and then annotated with additional information (gene descriptions and gene symbols) via the “Annotate table” analysis.

Finally, networks for the top three master regulatory molecules are visualized as diagrams in the hierarchical layout via the “Visualize results” method.

Parameters

Input Yes gene set
Species
Input No gene set
Profile
Start of promoter: Position relative to TSS, bp
End of promoter: Position relative to TSS, bp
Results Folder: Folder to store results (will be created if it does not exist)