Search for enriched TFBSs (tracks) (analysis)

  • Analysis title - Search for enriched TFBSs (tracks)
  • Provider - geneXplain GmbH
  • Class - EnrichedTFBSFinderTx
  • Plugin - com.genexplain.analyses (geneXplain analyses)

Search for enriched TFBSs in a track

Parameters

  • Yes set - Study track / Track with intervals of interest
  • No set - Background track / Track of non-bound intervals
  • Sequence source - Choose a deployed sequence source from the pull-down list. Selecting Custom enables setting of a custom sequence collection
  • Sequence collection - Resource / Folder with sequences containing Yes and No intervals
  • Input motif profile - Profile of weight matrices
  • Output path - Path in workspace to store output table
  • Initial cut-off - Score cut-off at which the search for the optimal threshold starts, given as the frequency of predicted sites per base
  • Analyze multiple No sets - Draw the specified number of No set samples of the specified size and analyze each of them
  • Number of samples - Number of No set samples
  • Sample size - Number of No set sequences to sample
  • Site enrichment cutoff - Threshold for enrichment of sites in Yes set
  • Site FDR cutoff - Threshold for FDR of site enrichment in Yes set
  • Sequence enrichment cutoff - Threshold for enrichment of sequences with sites in Yes set
  • Sequence FDR cutoff - Threshold for FDR of the enrichment of sequences with sites in Yes set

Output

The output table contains the columns described below. Columns highlighted in bold are shown in the default view. The other columns can be added on demand via the Columns tab of the lower right panel (available when the output table is open).

  • Adj. site FE - Adjusted fold enrichment of sites in Yes set
  • Site FDR - FDR of site enrichment (Benjamini-Hochberg method)
  • Adj. seq FE - Adjusted fold enrichment of site-containing Yes sequences
  • Seq FDR - FDR of sequence enrichment (Benjamini-Hochberg method)
  • #Yes sites per 1K - Number of sites per 1000 scanned windows in Yes set
  • #No sites per 1K - Number of sites per 1000 scanned windows in No set
  • Site P-value - P-value of site enrichment (binomial test)
  • Site cutoff - Score cut-off with best site enrichment
  • %Yes seq - Percent of Yes sequences with at least one site
  • %No seq - Percent of No sequences with at least one site
  • Seq P-value - P-value of sequence enrichment (Fisher test)
  • Seq cutoff - Score cut-off with best sequence enrichment
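The two P-value columns name a binomial test and a Fisher test. The sketch below shows one plausible way to set these up with the Python standard library only. How the analysis itself arranges the counts is not stated on this page, so the null probability, the one-sided direction, and all function and argument names are assumptions made for illustration. The direct summation is suitable for small counts only.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def site_p_value(yes_sites, no_sites, yes_windows, no_windows):
    # Assumption: under the null hypothesis a site falls into the Yes set
    # with probability proportional to the number of scanned Yes windows.
    p_null = yes_windows / (yes_windows + no_windows)
    return binomial_upper_tail(yes_sites, yes_sites + no_sites, p_null)


def sequence_p_value(yes_hit, yes_miss, no_hit, no_miss):
    """One-sided Fisher exact test on the 2x2 table of sequence counts.

    yes_hit / no_hit: sequences with at least one site,
    yes_miss / no_miss: sequences without a site.
    """
    total = yes_hit + yes_miss + no_hit + no_miss
    yes_total = yes_hit + yes_miss
    hit_total = yes_hit + no_hit
    upper = min(yes_total, hit_total)
    # Hypergeometric tail: probability of seeing yes_hit or more
    # site-containing sequences among the Yes sequences by chance.
    favourable = sum(
        comb(hit_total, k) * comb(total - hit_total, yes_total - k)
        for k in range(yes_hit, upper + 1)
    )
    return favourable / comb(total, yes_total)
```

For example, `sequence_p_value(3, 0, 0, 3)` evaluates the most extreme table possible for three Yes and three No sequences.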

Description

This method searches for enriched transcription factor binding sites using a given set of position-specific frequency matrices (PFMs), e.g. as collected in TRANSFAC®.

The fold enrichment of sites (Site FE) and the fold enrichment of sequences with at least one site (Seq FE) are optimized and reported as statistically corrected odds ratios (99% confidence interval). The reported values correct for small site or sequence numbers, taking into account possible variability, and are therefore more suitable for ranking PFMs by their fold enrichment in the Yes sequences.
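To illustrate why a confidence-interval-based odds ratio is better suited for ranking than the raw ratio, the following sketch computes the lower bound of a 99% confidence interval of the odds ratio. The exact correction used by the analysis is not given on this page; the Woolf (log odds ratio) interval with a 0.5 continuity correction used here is a stand-in chosen for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_lower_bound(yes_hit, yes_miss, no_hit, no_miss, confidence=0.99):
    """Lower confidence bound of the odds ratio (assumed Woolf interval)."""
    # Adding 0.5 to every cell (Haldane-Anscombe correction) avoids
    # division by zero when a count is zero.
    a, b, c, d = (x + 0.5 for x in (yes_hit, yes_miss, no_hit, no_miss))
    log_odds_ratio = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided interval: 99% confidence leaves 0.5% in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.exp(log_odds_ratio - z * standard_error)


# Same proportions, ten times fewer observations: the bound drops
# sharply, so the PFM with little support is ranked lower.
well_supported = odds_ratio_lower_bound(30, 70, 10, 90)
poorly_supported = odds_ratio_lower_bound(3, 7, 1, 9)
```

With such a bound, a PFM with a high raw ratio but very few sites does not outrank a PFM with a moderate ratio and many sites.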

Furthermore, the algorithm seeks optimal score thresholds for each type of enrichment separately and reports False Discovery Rates (FDRs) in addition to uncorrected P-values.
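The output columns name the Benjamini-Hochberg method for the FDR values. A minimal sketch of that procedure is given below, assuming one uncorrected P-value per PFM; the function name is illustrative and the code is not taken from the analysis itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is
    # the minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A PFM would then pass an FDR cutoff if its adjusted value is at or below the chosen threshold.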

An initial (low, permissive) score threshold for optimization is estimated using sequences in the No set. The threshold is specified as a single parameter, the frequency of sites per base pair (see Expert options), thereby removing the need to compile a PFM profile.
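One way to turn a site frequency into a score threshold is to take an empirical quantile of the scores observed in the No set. The sketch below assumes that one score per scanned window is available as a plain list; both this input format and the function name are assumptions, since the page does not describe how the estimate is computed.

```python
import math


def initial_score_cutoff(no_scores, site_frequency):
    """Score at which about site_frequency of the No windows are sites."""
    # Number of No windows that may score at or above the cut-off.
    allowed_sites = math.floor(site_frequency * len(no_scores))
    if allowed_sites < 1:
        raise ValueError("No set too small for the requested frequency")
    ranked = sorted(no_scores, reverse=True)
    # The allowed_sites-th highest score; tied scores can let slightly
    # more windows pass than requested.
    return ranked[allowed_sites - 1]
```

For 40 distinct window scores and a frequency of 0.25, the function returns the tenth highest score.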

To ensure smooth performance, the routine imposes some limits on the input. Yes and No sequence sets must comprise at most 10 million bases, and PFMs are expected to comprise at least 4 positions. Finally, the initial frequency cut-off should have 10-fold support by the No set: for a cut-off of 0.001, a No set of 1000 bases would be too small (about 1 expected site), whereas 10000 bases (about 10 expected sites) would be just at the limit for that parameter.
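The limits above can be checked before submitting an analysis. The sketch below reads "10-fold support" as at least 10 expected sites in the No set (cut-off frequency times number of bases) and applies the 10 million base limit to each set separately; both readings, as well as all names, are assumptions.

```python
MAX_BASES = 10_000_000   # upper limit on sequence set size
MIN_PFM_POSITIONS = 4    # minimum PFM length
MIN_EXPECTED_SITES = 10  # assumed meaning of "10-fold support"


def input_problems(yes_bases, no_bases, pfm_lengths, initial_cutoff):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    for label, bases in (("Yes", yes_bases), ("No", no_bases)):
        if bases > MAX_BASES:
            problems.append(f"{label} set exceeds {MAX_BASES} bases")
    if any(length < MIN_PFM_POSITIONS for length in pfm_lengths):
        problems.append(f"PFM shorter than {MIN_PFM_POSITIONS} positions")
    # Expected number of sites in the No set at the initial cut-off.
    if initial_cutoff * no_bases < MIN_EXPECTED_SITES:
        problems.append("No set too small for the initial cut-off")
    return problems
```

With a cut-off of 0.001, a No set of 1000 bases is reported as too small, while a No set of 10000 bases passes.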

To handle incidental enrichment of biologically meaningless PFMs in some combinations of Yes and No sets, the program can draw a specified number of samples from a sufficiently large No sequence set and carry out the enrichment analysis for each No sample (option "Analyze multiple No sets"). A summary output is then prepared that shows, for each matrix, the number of No samples for which it satisfied the given thresholds.
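The resampling scheme can be sketched as follows. The enrichment analysis itself is represented by a hypothetical callback, passes_thresholds, which stands for a full run against one No sample followed by the enrichment and FDR cutoffs; drawing without replacement and the fixed seed are further assumptions.

```python
import random


def count_passing_samples(no_sequences, number_of_samples, sample_size,
                          matrices, passes_thresholds, seed=0):
    """Count, per matrix, the No samples for which it met the thresholds."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    counts = {matrix: 0 for matrix in matrices}
    for _ in range(number_of_samples):
        # Draw one No sample without replacement.
        no_sample = rng.sample(no_sequences, sample_size)
        for matrix in matrices:
            if passes_thresholds(matrix, no_sample):
                counts[matrix] += 1
    return counts
```

A matrix that passes with most or all No samples is less likely to owe its enrichment to one particular background set.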
