Upstream analysis with feedback loop (TRANSFAC(R) and TRANSPATH(R)) (workflow)
- Workflow title
- Upstream analysis with feedback loop (TRANSFAC(R) and TRANSPATH(R))
- Provider
- geneXplain GmbH
Workflow overview
Description
This workflow performs a complete upstream analysis. It searches for putative transcription factor binding sites (TFBSs) in the promoters of the input gene set and analyzes the signaling pathways upstream of the suggested transcription factors (TFs). The resulting master regulatory molecules can be considered new targets and are candidates for further experimental validation.
Any gene or protein table can be submitted as input. Two sets are required: the genes under study (the “Yes” set) and a background set (the “No” set).
In the first step, both input tables are converted into corresponding tables of Ensembl Gene IDs.
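Conceptually, the ID conversion is a lookup from each input identifier to zero or more Ensembl Gene IDs. The sketch below assumes a ready-made mapping table (the dictionary and the function name `to_ensembl` are hypothetical; the platform uses its own ID-matching tables). It drops identifiers that cannot be mapped and removes duplicates while keeping the input order.

```python
from typing import Dict, Iterable, List


def to_ensembl(ids: Iterable[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Map input IDs to Ensembl Gene IDs (hypothetical helper).

    Unmapped IDs are dropped, and each Ensembl ID is reported once,
    in the order it is first encountered.
    """
    seen = set()
    result = []
    for identifier in ids:
        for ensembl_id in mapping.get(identifier, []):
            if ensembl_id not in seen:
                seen.add(ensembl_id)
                result.append(ensembl_id)
    return result


# Illustrative mapping table (assumption: symbols mapped to human Ensembl genes).
mapping = {"TP53": ["ENSG00000141510"], "MYC": ["ENSG00000136997"]}
print(to_ensembl(["TP53", "MYC", "TP53", "FOO1"], mapping))
```

The same lookup would be applied separately to the Yes and No tables.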
Next, TFBSs are searched for in the promoters of both gene sets. In this workflow, promoters are defined as the sequences from -1000 to +100 bp relative to the transcription start sites (TSSs) as annotated in Ensembl.
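Because the promoter window is defined relative to the TSS, its genomic coordinates depend on the gene's strand. The minimal sketch below assumes 1-based genomic coordinates and applies the offsets directly. It does not treat a missing position 0 specially, and it does not clip the window at chromosome ends. The function name `promoter_interval` is hypothetical.

```python
from typing import Tuple


def promoter_interval(tss: int, strand: str, start: int = -1000,
                      end: int = 100) -> Tuple[int, int]:
    """Return (low, high) genomic coordinates of the promoter window.

    `start` and `end` are positions relative to the TSS, as in the
    workflow's "Start of promoter" and "End of promoter" parameters.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        # Upstream means lower genomic coordinates on the plus strand.
        return tss + start, tss + end
    if strand == "-":
        # Upstream means higher genomic coordinates on the minus strand.
        return tss - end, tss - start
    raise ValueError(f"unknown strand: {strand!r}")


print(promoter_interval(5000, "+"))
print(promoter_interval(5000, "-"))
```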
The site search uses the TRANSFAC® library of position weight matrices (PWMs), specifically the profile vertebrate_non_redundant_minSUM.
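To illustrate what a PWM-based site search does, the sketch below scans both strands of a sequence and reports windows whose score reaches a threshold. It is a simplified stand-in and not the TRANSFAC® method. As far as we understand, the TRANSFAC® tools use the MATCH algorithm, which has core and matrix similarity scores, and a profile such as vertebrate_non_redundant_minSUM supplies per-matrix cut-offs chosen to minimize the sum of false-positive and false-negative rates. This sketch instead uses plain log2-odds scores against a uniform background with one user-chosen threshold. The names `log_odds` and `scan` and the toy count matrix are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def log_odds(counts: List[Dict[str, float]], pseudocount: float = 0.5,
             background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan(seq: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Report (forward-strand start, strand, score) for windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= threshold:
                # Convert minus-strand offsets back to forward-strand coordinates.
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, score))
    return hits


# Toy "TATA" matrix (hypothetical counts, not a TRANSFAC® matrix).
tata = log_odds([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
hits = scan("GGTATAGG", tata, threshold=5.0)
print(hits)
```

Because TATA is its own reverse complement, the toy example reports the same site once on each strand.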
In the same step, the frequencies of putative TFBSs in the Yes set are compared with those in the No set to identify sites that are overrepresented in the Yes set.
The output of this step is a list of PWMs whose hits are overrepresented in the Yes set relative to the No set.
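This description does not specify the statistic used for the Yes/No comparison. As an illustrative stand-in only, the sketch below computes a Yes/No frequency ratio and a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) on the number of promoters that carry at least one hit for a given PWM. The function names are hypothetical.

```python
from math import comb, inf


def overrepresentation_p(yes_hit: int, yes_total: int,
                         no_hit: int, no_total: int) -> float:
    """One-sided Fisher's exact P-value for more Yes promoters with a hit than expected."""
    total = yes_total + no_total
    hits = yes_hit + no_hit
    upper = min(hits, yes_total)
    # Sum hypergeometric probabilities P(X = k) for k >= observed Yes hits.
    # math.comb returns 0 when k > n, which covers impossible tables.
    tail = sum(comb(hits, k) * comb(total - hits, yes_total - k)
               for k in range(yes_hit, upper + 1))
    return tail / comb(total, yes_total)


def yes_no_ratio(yes_hit: int, yes_total: int, no_hit: int, no_total: int) -> float:
    """Fraction of Yes promoters with a hit divided by the same fraction in the No set."""
    if no_hit == 0:
        return inf if yes_hit > 0 else float("nan")
    return (yes_hit / yes_total) / (no_hit / no_total)


print(overrepresentation_p(3, 3, 0, 3))
```

A PWM would then be kept when its ratio exceeds 1 and its P-value passes a chosen cut-off.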
Next, the list of PWMs is converted into a table of transcription factors with TRANSPATH® IDs, which are used to search for master regulatory molecules in the TRANSPATH® network. For each potential master regulator, FDR, Score, and Z-score are calculated.
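The master regulator search looks upstream of the input TFs in a directed signaling network. The sketch below shows only the graph-traversal idea. For every TF, it runs a breadth-first search against the edge direction up to a maximum radius and counts how many input TFs each upstream node can reach. This count is a hypothetical "coverage" measure. It is not the workflow's Score, FDR, or Z-score, whose exact definitions are not given here. The toy network and all names are assumptions.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Iterable, List, Set


def build_reverse(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert edges so we can walk from a TF to its regulators."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].append(src)
    return reverse


def upstream_within(reverse: Dict[str, List[str]], target: str, max_radius: int) -> Set[str]:
    """Nodes that reach `target` in 1..max_radius steps (breadth-first search)."""
    depth = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if depth[node] == max_radius:
            continue  # do not expand beyond the search radius
        for parent in reverse.get(node, []):
            if parent not in depth:
                depth[parent] = depth[node] + 1
                queue.append(parent)
    depth.pop(target)
    return set(depth)


def coverage(graph: Dict[str, List[str]], tfs: Iterable[str], max_radius: int) -> Dict[str, int]:
    """Hypothetical measure: number of input TFs reachable from each upstream node."""
    reverse = build_reverse(graph)
    counts: Counter = Counter()
    for tf in tfs:
        counts.update(upstream_within(reverse, tf, max_radius))
    return dict(counts)


# Toy network: edges point from regulator to regulated molecule.
graph = {"K1": ["K2"], "K2": ["TF1", "TF2"], "K3": ["TF2"]}
print(coverage(graph, ["TF1", "TF2"], max_radius=2))
```

A real analysis would also compare such counts against a background or random expectation. That comparison is the role the Z-score and FDR play in the workflow.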
The results are filtered by Z-score > 1 and Score > 0.2 to select statistically significant master regulators.
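The filtering step and the later choice of three candidates for visualization can be sketched as follows. The column names `"Z-score"` and `"Score"` and the function name are assumptions. The description also does not say how the "top" candidates are ranked, so this sketch ranks by Score. Both thresholds use strict inequalities, as stated above.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def select_master_regulators(rows: List[Row], min_z: float = 1.0, min_score: float = 0.2,
                             top_n: int = 3) -> Tuple[List[Row], List[Row]]:
    """Keep rows with Z-score > min_z and Score > min_score; return (all kept, top_n)."""
    kept = [r for r in rows if r["Z-score"] > min_z and r["Score"] > min_score]
    # Assumption: "top" means highest Score.
    kept.sort(key=lambda r: r["Score"], reverse=True)
    return kept, kept[:top_n]


rows = [
    {"ID": "A", "Z-score": 2.5, "Score": 0.50},
    {"ID": "B", "Z-score": 0.8, "Score": 0.90},  # fails the Z-score filter
    {"ID": "C", "Z-score": 1.5, "Score": 0.15},  # fails the Score filter
    {"ID": "D", "Z-score": 3.0, "Score": 0.30},
    {"ID": "E", "Z-score": 1.1, "Score": 0.25},
    {"ID": "F", "Z-score": 1.2, "Score": 0.40},
]
kept, top = select_master_regulators(rows)
print([r["ID"] for r in top])
```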
The table of resulting master regulatory molecules is converted into a table of Ensembl Gene IDs and annotated with additional information, namely gene descriptions and gene symbols.
Finally, the networks of the top three master regulatory molecules are visualized as diagrams in a hierarchical layout.
The output is a new folder containing several tables, including a summary of the predicted TFBSs, genomic tracks of the Yes and No promoters and of the predicted sites, a table of candidate master regulators, and network diagrams for the top three candidates.
This workflow is available only with valid TRANSFAC® and TRANSPATH® licenses.
Parameters
- Input Yes gene set
- Expression
- Species
- Input No gene set
- Profile
- Start of promoter
- Position relative to TSS, bp
- End of promoter
- Position relative to TSS, bp
- Results Folder
- Folder to store results (created if it does not exist)