Difference between revisions of "Analyze any DNA sequence for site enrichment (GTRD) (workflow)"

From BioUML platform
(Automatic synchronization with BioUML)
 
(Automatic synchronization with BioUML)
 
(One intermediate revision by one user not shown)
Line 6:
 
[[File:Analyze-any-DNA-sequence-for-site-enrichment-GTRD-workflow-overview.png|400px]]
 
 
== Description ==
 
This workflow is designed to search for putative transcription factor binding sites, TFBS, on the promoters of an input gene set.
+
This workflow is designed to search for enriched transcription factor binding sites, TFBSs, in any input DNA sequence as compared to a background DNA sequence. With this workflow you can analyze sequences of any species and any genomic region.  
  
As input, any gene or protein table can be submitted. The input table contains genes under study, and it is called “Yes” set.
+
The input Yes and No sequences are subjected to ‘Site search on track’ method, using the default profile from '''GTRD database''' '''‘moderate threshold'''’.   The results of the site search on track are then subjected to ‘Site search result optimization’. 
  
At the first step, the input table is converted into a table with Ensembl Gene IDs.
+
The results folder consists of several tables and tracks. Summary table gives the TFBSs enriched in the Yes set as compared with the No set. The optimized tracks present those TFBSs that are over-represented in the Yes sequences versus the No sequences. Scores of the putative sites are optimized by the algorithm.
  
At the next step, promoters are analyzed for potential cis-regulatory sites. Promoters in this workflow are defined as sequences from -1000 to +100 relative to the transcription start sites, as they are annotated in Ensembl.
+
Transcription factors table aim at showing transcription factors linked to the identified site models (matrices). These are potential candidate regulators of genes in the input Yes set. They are supposed to regulate transcription of Yes-genes via the identified enriched TFBSs.
  
Site search is done with the help of the TRANSFAC® library of the  positional weight matrices, PWMs, namely with the profile vertebrate_non_redundant_minSUM.
+
 
  
At the same step, frequencies of putative TFBSs are compared between Yes set and No set to identify sites overrepresented in Yes set versus No set. Default No set in the workflow is a set of housekeeping genes for the corresponding species (as they are published in PMID: 19534766) 
+
 
 
+
The result of this step is a list of PWMs the hits of which are overrepresented in Yes set versus No set.
+
 
+
Next, the list of PWMs is converted into a table of transcription factors. Two tables are produced, with Ensembl Gene IDs and with Entrez IDs.
+
 
+
Finally,  both tables with transcription factors are annotated with additional information, gene description and gene symbols.
+
 
+
The output is a new folder with several tables, including a summary of the predicted TFBSs, genomic tracks of the Yes and No promoters  and sites, as well as the tables with transcription factors potentially regulating the genes in the Yes set.
+
 
+
This workflow is available together with a valid TRANSFAC® license.
+
  
 
== Parameters ==
 

Latest revision as of 16:34, 12 March 2019

Workflow title: Analyze any DNA sequence for site enrichment (GTRD)
Provider: geneXplain GmbH

Workflow overview

Analyze-any-DNA-sequence-for-site-enrichment-GTRD-workflow-overview.png

Description

This workflow searches for transcription factor binding sites (TFBSs) that are enriched in an input DNA sequence set compared with a background DNA sequence set. It can be applied to sequences of any species and any genomic region.

The input Yes (studied) and No (background) sequences are subjected to the ‘Site search on track’ method, using the default ‘moderate threshold’ profile from the GTRD database. The results of the site search are then passed to ‘Site search result optimization’.
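
As a rough illustration of what a matrix-based site search with a score threshold involves, here is a minimal Python sketch. It is not the BioUML implementation: the count matrix, the pseudocount, the uniform background, the cutoff, and the function names `to_log_odds` and `scan` are all assumptions made for this example, and only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 4 bp site (consensus TGAC); not a real GTRD model.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, cutoff):
    """Return (start, window, score) for every forward-strand window scoring >= cutoff."""
    hits = []
    width = len(pwm)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = to_log_odds(COUNTS)
    # The cutoff of 5.0 is an arbitrary stand-in for a profile threshold.
    for start, window, score in scan("AATGACGGTGACAA", pwm, cutoff=5.0):
        print(start, window, round(score, 2))
```

In this sketch a profile would correspond to a collection of such matrices, each with its own cutoff; a stricter or looser profile simply shifts those cutoffs.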

The results folder contains several tables and tracks. The summary table lists the TFBSs enriched in the Yes set compared with the No set. The optimized tracks show the TFBSs that are over-represented in the Yes sequences relative to the No sequences; the scores of the putative sites are optimized by the algorithm.
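
The idea of enrichment in the Yes set versus the No set can be sketched as follows. This page does not state which statistic the workflow uses, so the example below is only an assumption-laden illustration: it compares site densities per base pair and computes a one-sided binomial tail probability. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def enrichment(yes_hits, no_hits, yes_bp, no_bp):
    """Return (density ratio, one-sided p-value) for one site model.

    yes_hits / no_hits: number of predicted sites in the Yes / No sequences.
    yes_bp / no_bp: total length of the Yes / No sequences in base pairs.
    """
    yes_density = yes_hits / yes_bp
    no_density = no_hits / no_bp
    ratio = yes_density / no_density if no_density > 0 else float("inf")
    # Null model (an assumption): each site lands in the Yes set with a
    # probability proportional to the share of total sequence length.
    total_hits = yes_hits + no_hits
    p_yes = yes_bp / (yes_bp + no_bp)
    p_value = binomial_tail(yes_hits, total_hits, p_yes)
    return ratio, p_value


if __name__ == "__main__":
    # Made-up counts for one hypothetical matrix.
    ratio, p_value = enrichment(yes_hits=30, no_hits=10, yes_bp=10_000, no_bp=10_000)
    print(f"ratio={ratio:.2f}, p={p_value:.4g}")
```

A summary table in this spirit would hold one such ratio and p-value per site model, sorted so that the most over-represented models come first.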

The transcription factors table lists the transcription factors linked to the identified site models (matrices). These are candidate regulators of genes in the input Yes set and are presumed to regulate transcription of these genes via the identified enriched TFBSs.

 

 

Parameters

Input Yes sequence set: Select Yes sequence set
Input No sequence set: Select No sequence set
Profile: Select Profile
Results folder: Select Results folder