Search for enriched TFBSs (tracks) (analysis)
- Analysis title
- Search for enriched TFBSs (tracks)
- Provider
- geneXplain GmbH
- Class
EnrichedTFBSFinderTx
- Plugin
- com.genexplain.analyses (geneXplain analyses)
Search for enriched TFBSs in a track
Parameters
- Yes set - Study track / Track with intervals of interest
- No set - Background track / Track of non-bound intervals
- Sequence source - Choose a deployed sequence source from the pull-down list. Selecting Custom enables setting of a custom sequence collection
- Sequence collection - Resource / Folder with sequences containing Yes and No intervals
- Input motif profile - Profile of weight matrices
- Output path - Path in workspace to store output table
- Initial cut-off - Score cut-off at which the search for the optimal threshold starts, given as frequency of predicted sites per base
- Analyze multiple No sets - Draw the specified number of No set samples of the specified size and analyze each of them
- Number of samples - Number of No set samples
- Sample size - Number of No set sequences to sample
- Site enrichment cutoff - Threshold for enrichment of sites in Yes set
- Site FDR cutoff - Threshold for FDR of site enrichment in Yes set
- Sequence enrichment cutoff - Threshold for enrichment of Yes sequences with sites in Yes set
- Sequence FDR cutoff - Threshold for FDR of Yes sequences with sites in Yes set
Output
The output contains the columns described below. Columns highlighted in bold are shown in the default view. The other columns can be included on demand via the Columns tab of the lower right panel (available with opened output table).
- Adj. site FE
- Adjusted fold enrichment of sites in Yes set
- Site FDR
- FDR of site enrichment (Benjamini-Hochberg method)
- Adj. seq FE
- Adjusted fold enrichment of Yes sequences containing at least one site
- Seq FDR
- FDR of sequence enrichment (Benjamini-Hochberg method)
- #Yes sites per 1K
- Number of sites per 1000 scanned windows in Yes set
- #No sites per 1K
- Number of sites per 1000 scanned windows in No set
- Site P-value
- P-value of site enrichment (binomial test)
- Site cutoff
- Score cut-off with best site enrichment
- %Yes seq
- Percent Yes sequences with at least one site
- %No seq
- Percent No sequences with at least one site
- Seq P-value
- P-value of sequence enrichment (Fisher test)
- Seq cutoff
- Score cut-off with best sequence enrichment
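The Site FDR and Seq FDR columns are described as Benjamini-Hochberg corrected. As a rough illustration of that correction, and not the platform's own code, the following Python sketch adjusts a list of P-values. The function name `benjamini_hochberg` is hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values, sorted from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that the adjusted
    # values never increase when the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # One P-value per matrix (made-up numbers for illustration only).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A matrix would then pass the "Site FDR cutoff" or "Sequence FDR cutoff" parameter if its adjusted value lies below the chosen threshold.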
Description
This method searches for enriched transcription factor binding sites using a given set of position-specific frequency matrices (PFMs), e.g. those collected in TRANSFAC(R).
The fold enrichment of sites (Site FE) and of sequences with at least one site (Seq FE) is optimized and reported as a statistically corrected odds ratio (99% confidence interval). The reported values correct for small site or sequence numbers by taking possible variability into account, and are therefore better suited for ranking PFMs by their fold enrichment in the Yes set.
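The text does not state how the corrected odds ratio is derived from the 99% confidence interval. One common way to obtain a conservative value is to report the lower bound of the interval on the log scale. The sketch below illustrates that idea under this assumption. The 0.5 continuity correction and the function name `odds_ratio_lower_bound` are also assumptions, not documented behaviour of the analysis.

```python
import math
from statistics import NormalDist


def odds_ratio_lower_bound(yes_hit: int, yes_miss: int,
                           no_hit: int, no_miss: int,
                           confidence: float = 0.99) -> float:
    """Lower bound of the confidence interval of an odds ratio.

    Assumption: log-scale (Woolf) interval with 0.5 added to every
    cell so that zero counts do not cause a division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (yes_hit, yes_miss, no_hit, no_miss))
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided interval: 99% confidence uses the 0.995 normal quantile.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.exp(log_or - z * std_err)


if __name__ == "__main__":
    # Same raw odds ratio (about 2.67), very different amounts of data.
    print(odds_ratio_lower_bound(4, 6, 2, 8))
    print(odds_ratio_lower_bound(400, 600, 200, 800))
```

With few sequences the lower bound falls below 1, whereas the same proportions with many sequences stay close to the raw odds ratio. This is the kind of behaviour that makes such a value useful for ranking.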
Furthermore, the algorithm seeks optimal score thresholds for each type of enrichment separately and reports False Discovery Rates (FDRs) in addition to uncorrected P-values.
An initial (low, permissive) score threshold for optimization is estimated using sequences in the No set. The threshold is specified as a single parameter, the frequency of sites per base pair (see Expert options), which removes the need to compile a PFM profile.
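How the initial threshold is estimated from the No set is not spelled out here. A plausible reading is that, for each PFM, the score cut-off is chosen so that the requested fraction of scanned No-set positions scores at or above it. The following sketch shows that quantile-style estimate. The function name `initial_score_cutoff` and the treatment of every scanned window as one position are assumptions.

```python
from typing import Sequence


def initial_score_cutoff(no_scores: Sequence[float],
                         site_frequency: float) -> float:
    """Score at which roughly `site_frequency` of the No-set windows match.

    `no_scores` holds one PFM score per scanned window of the No set.
    """
    if not no_scores:
        raise ValueError("the No set yielded no scored windows")
    # Number of windows that should count as sites at the initial cut-off.
    n_sites = max(1, int(site_frequency * len(no_scores)))
    ranked = sorted(no_scores, reverse=True)
    # The n-th best score is the most permissive cut-off that still
    # yields the requested number of sites.
    return ranked[n_sites - 1]


if __name__ == "__main__":
    scores = [0.5, 0.9, 0.1, 0.7, 0.3, 1.0, 0.2, 0.8, 0.4, 0.6]
    print(initial_score_cutoff(scores, 0.2))
```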
To ensure smooth performance, the routine imposes some limits on the input. Yes and No sequence sets must comprise at most 10 million bases, and PFMs are expected to comprise at least 4 positions. Finally, the initial frequency cut-off should have 10-fold support by the No set, i.e. about 10 sites should be expected in the No set at that frequency. For example, with a cut-off of 0.001 a No set of 1000 bases would be too small, whereas 10000 bases would be just at the limit for that parameter.
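These limits can be checked before an analysis is started. The sketch below is a hypothetical pre-flight check and not part of the platform. It assumes that the 10 million base limit applies to each set separately and that "10-fold support" means an expected count of at least 10 sites, i.e. cut-off times No-set bases.

```python
from typing import List

MAX_BASES = 10_000_000   # limit on sequence set size stated in the text
MIN_PFM_LENGTH = 4       # minimal number of PFM positions
MIN_SUPPORT = 10         # expected No-set sites at the initial cut-off


def check_input(yes_bases: int, no_bases: int,
                pfm_lengths: List[int],
                initial_cutoff: float) -> List[str]:
    """Return a list of violated limits (empty if the input is fine)."""
    problems = []
    # Assumption: the size limit is applied to each set on its own.
    if yes_bases > MAX_BASES:
        problems.append("Yes set exceeds 10 million bases")
    if no_bases > MAX_BASES:
        problems.append("No set exceeds 10 million bases")
    if any(length < MIN_PFM_LENGTH for length in pfm_lengths):
        problems.append("at least one PFM has fewer than 4 positions")
    # Small tolerance so that the exact limit (e.g. 0.001 * 10000) passes.
    if initial_cutoff * no_bases + 1e-9 < MIN_SUPPORT:
        problems.append("initial cut-off lacks 10-fold support by the No set")
    return problems


if __name__ == "__main__":
    print(check_input(5000, 10000, [8, 12], 0.001))  # just at the limit
    print(check_input(5000, 1000, [8, 12], 0.001))   # No set too small
```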
To handle incidental enrichment of biologically meaningless PFMs in some combinations of Yes and No sets, the program can draw a specified number of samples from a sufficiently large No sequence set and carry out the enrichment analysis for each No sample (option "Analyze multiple No sets"). A summary output is then prepared that shows, for each matrix, with how many No sets it satisfied the given thresholds.
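The resampling scheme can be pictured as a simple loop over No-set samples. The sketch below is illustrative only. The enrichment analysis itself is represented by a caller-supplied function, `passing_matrices`, which is a hypothetical stand-in that returns the names of the matrices meeting the enrichment and FDR cut-offs for one No sample. Sampling without replacement within a sample is also an assumption.

```python
import random
from collections import Counter
from typing import Callable, Dict, Sequence, Set


def count_passing_no_samples(
        no_sequences: Sequence[str],
        number_of_samples: int,
        sample_size: int,
        passing_matrices: Callable[[Sequence[str]], Set[str]],
        seed: int = 0) -> Dict[str, int]:
    """Count, per matrix, in how many No samples it met the thresholds."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    pool = list(no_sequences)
    counts: Counter = Counter()
    for _ in range(number_of_samples):
        # Draw one No sample (without replacement within the sample).
        sample = rng.sample(pool, sample_size)
        # Run the (stand-in) enrichment analysis against this sample.
        counts.update(passing_matrices(sample))
    return dict(counts)


if __name__ == "__main__":
    # Toy stand-in: "M1" always passes, "M2" only if sequence "x" is drawn.
    def toy_analysis(sample: Sequence[str]) -> Set[str]:
        return {"M1"} | ({"M2"} if "x" in sample else set())

    print(count_passing_no_samples(["a", "b", "c", "x"], 5, 2, toy_analysis))
```

A matrix that passes with only a few of the No samples is more likely to reflect a chance property of a particular Yes and No combination than a matrix that passes with all of them.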