Site prediction


Predicts binding sites of a given transcription factor (TF) in the whole genome, in a given chromosome fragment, or in a ChIP-Seq dataset. Sites are predicted by different Position Weight Matrix (PWM) methods.

Description

The following 6 PWM methods (models) are available:

  1. Given HOCOMOCO site models (Kulakovskiy et al, 2016). These models are available in the HOCOMOCO database and are located at "databases/HOCOMOCO v11/Data/PWM_HUMAN_mono_pval=0.0001";
  2. MATCH models (Kel et al, 2003);
  3. Additive IPS (Individual Probability Score) models, or briefly, IPS models (Volkova et al, 2018);
  4. Multiplicative IPS models. These models can be reduced to equivalent additive IPS models by taking logarithms of matrix elements;
  5. Common additive models;
  6. Common multiplicative models.

To define the common additive and multiplicative models, let MAT = (mij), i ∈ {A,C,G,T}, j=1,...,l, denote the given frequency matrix, where l is the length of the sites. This analysis uses the HOCOMOCO frequency matrices available in the HOCOMOCO database (Kulakovskiy et al, 2016). They are located at "databases/HOCOMOCO v11/Data/PCM_HUMAN_mono/". To test an arbitrary DNA fragment S=(s1,...,sl), the common additive score x is computed in the standard way:

x = Σj=1,...,l score(j),

where the values score(j), j=1,…,l, are determined as follows:

score(j) = {mAj, if sj=A; mCj, if sj=C; mGj, if sj=G; mTj, if sj=T}, j=1,…,l

The common multiplicative score y is determined by the formula:

y = ∏j=1,...,l score(j).
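
As an illustration, the two scores can be computed as in the following minimal Python sketch. The toy matrix `MAT` and the function names `position_scores`, `additive_score` and `multiplicative_score` are hypothetical; they are not part of BioUML or HOCOMOCO, and the matrix values are made up for the example.

```python
from math import prod

# Toy frequency matrix (hypothetical values): one row per nucleotide,
# one column per site position j = 1..l (here l = 3).
MAT = {
    "A": [8.0, 1.0, 2.0],
    "C": [1.0, 7.0, 2.0],
    "G": [1.0, 1.0, 4.0],
    "T": [2.0, 3.0, 4.0],
}

def position_scores(mat, fragment):
    """Return score(j) = m[s_j][j] for each position j of the fragment."""
    length = len(mat["A"])
    if len(fragment) != length:
        raise ValueError("fragment length must equal the matrix length")
    return [mat[base][j] for j, base in enumerate(fragment)]

def additive_score(mat, fragment):
    """Common additive score x: sum of the position scores."""
    return sum(position_scores(mat, fragment))

def multiplicative_score(mat, fragment):
    """Common multiplicative score y: product of the position scores."""
    return prod(position_scores(mat, fragment))

x = additive_score(MAT, "ACG")        # 8 + 7 + 4
y = multiplicative_score(MAT, "ACG")  # 8 * 7 * 4
```

The sketch assumes the fragment contains only the letters A, C, G and T; any other character would raise a `KeyError`.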

If the calculated score (x or y) exceeds the pre-specified threshold, the tested DNA fragment S is declared a predicted site. Note that a common multiplicative model can be converted to an equivalent additive model by taking logarithms of the matrix elements, i.e.

ln(y) = Σj=1,...,l score*(j),

where the values score*(j), j=1,…,l, are determined as follows:

score*(j) = {ln(mAj), if sj=A; ln(mCj), if sj=C; ln(mGj), if sj=G; ln(mTj), if sj=T}.
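
Because the logarithm turns a product into a sum, comparing the multiplicative score with a threshold t is equivalent to comparing the sum of the values score*(j) with ln(t). The following Python sketch illustrates this with a simple sliding-window scan. The matrix values and the names `log_matrix` and `scan_additive` are hypothetical; only the forward strand is scanned, and this is not a description of how BioUML implements the analysis.

```python
from math import isclose, log

# Toy frequency matrix (hypothetical values). All entries are positive,
# because ln(0) is undefined; real count matrices with zeros would need
# pseudocounts before this transformation.
MAT = {
    "A": [8.0, 1.0, 2.0],
    "C": [1.0, 7.0, 2.0],
    "G": [1.0, 1.0, 4.0],
    "T": [2.0, 3.0, 4.0],
}

def log_matrix(mat):
    """Replace every matrix element m by ln(m)."""
    return {base: [log(value) for value in row] for base, row in mat.items()}

def scan_additive(mat, sequence, threshold):
    """Slide a window of the matrix length along the forward strand and
    return (start, fragment, score) for every window whose additive score
    exceeds the threshold. Start positions are 0-based."""
    length = len(mat["A"])
    hits = []
    for start in range(len(sequence) - length + 1):
        fragment = sequence[start:start + length]
        score = sum(mat[base][j] for j, base in enumerate(fragment))
        if score > threshold:
            hits.append((start, fragment, score))
    return hits

# A multiplicative threshold of 100 corresponds to ln(100) on the log scale.
hits = scan_additive(log_matrix(MAT), "TACGT", log(100.0))
```

In this toy example only the window "ACG" passes: its multiplicative score is 8 * 7 * 4 = 224, which exceeds 100, while the other two windows both score 4.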


References

Kulakovskiy,I.V., Vorontsov,I.E., Yevshin,I.S., Soboleva,A.V., Kasianov,A.S., Ashoor,H., Ba-Alawi,W., Bajic,V.B., Medvedeva,Y.A., Kolpakov,F.A. et al. (2016) HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res., 44, D116–D125.

Kel,A.E., Gößling,E., Reuter,I., Cheremushkin,E., Kel-Margoulis,O.V. and Wingender,E. (2003) MATCH™: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res., 31, 3576–3579.

Volkova,O.A., Kondrakhin,Y.V., Kashapov,T.A. and Sharipov,R.N. (2018) Comparative analysis of protein-coding and long non-coding transcripts based on RNA sequence features. J. Bioinform. Comput. Biol., 16, 1840013. doi: 10.1142/S0219720018400139.


Analysis Parameters:

  • Sequence set type – Select type of sequences;
    • Available sequence types: 1) Whole genome 2) Chromosome fragment 3) ChIP-Seq peaks from given track
  • Sequences collection – Select a source of nucleotide sequences
    • Sequences source – Select database to get sequences from or 'Custom' to specify sequences location manually
    • Sequence collection – Specify path to folder containing sequences if 'Custom' sequences source is selected
  • If Sequence set type = Chromosome fragment
    • Chromosome name – Select chromosome name
    • Start position – Type start position of chromosome fragment
    • Finish position – Type finish position of chromosome fragment
  • If Sequence set type = ChIP-Seq peaks from given track
    • Path to track – Select the path to a track with a ChIP-Seq dataset. For example, a track from the GTRD database can be selected, e.g. Path to track = databases/GTRD/Data/peaks/gem/PEAKS033057
  • Site name – Type the name of the predicted sites
  • Prediction models – Define prediction models. User can define several prediction models.
    • modelName – Type model name
    • siteType – Select site type;
      • Available site types: 1) Given site model 2) IPS model 3) Multiplicative IPS model 4) Common additive model 5) Common multiplicative model 6) MATCH model
    • If siteType = Given site model
      • modelPath – Input the path to the given site prediction model. In particular, the user can select a site model from the HOCOMOCO database, such as "databases/HOCOMOCO v11/Data/PWM_HUMAN_mono_pval=0.0001/CEBPA_HUMAN.H11MO.0.A"
    • If siteType ≠ Given site model
      • matrixPath – Input the path to the given frequency matrix. In particular, the user can select a frequency matrix from the HOCOMOCO database, such as "databases/HOCOMOCO v11/Data/PCM_HUMAN_mono/CEBPA_HUMAN.H11MO.0.A"
      • threshold – Type the threshold
  • The output track name – Type the output track name
  • Path to output folder – Path to the output folder