Method, code, references |
Input data |
Algorithm |
Accuracy |
Comment
|
INVOKE (R script)[1]
https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE
|
Input:
- TF-gene scores (calculated by TEPIC)
- open chromatin data (DNaseI-seq, NOMe-seq)
- PWMs (JASPAR, HOCOMOCO, UniPROBE)
- expression data (RNA-seq)
Output:
- regression coefficients for each TF
- model performance: Pearson correlation, Spearman correlation, and MSE
- boxplot showing model performance
- heatmap (top 10 positive and negative coefficients)
- scatter plots of predicted versus measured gene expression
|
INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
|
|
|
[2]
|
Input:
- ChIP-seq data
- expression data (RNA-seq)
Output:
- log-linear regression model
- principal components with weights of corresponding TFs
|
- for each TF and each gene, compute a TF association strength (TFAS): the weighted sum of the corresponding ChIP-seq signal strengths, where the weights reflect the proximity of the signal to the gene
- apply principal component analysis (PCA) to extract uncorrelated characteristic patterns in the TFAS vectors
- the TFAS matrix A is centered and standardized, then decomposed by singular value decomposition (SVD)
- regression-based component selection
- gene expression is modeled with a log-linear regression
|
mouse ESCs: r = 0.806, R² = 0.65, CV-R² = 0.64
|
|
|
|
|
|
|
|
|
|
|
|
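The INVOKE row describes regularised linear regression of gene expression on TF-gene scores. The following is a minimal Python sketch of that idea, not the INVOKE R script itself: it fits an elastic net by coordinate descent on synthetic data. The function names, the penalty parameterisation (`lam`, `alpha`), and the synthetic score matrix are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t (the Lasso proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
        (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    alpha=1 gives Lasso, alpha=0 gives Ridge.
    Assumes standardized columns in X and a centered y."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n           # correlation with partial residual
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

# Synthetic stand-in for a genes x TFs score matrix (hypothetical data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 5))
true_coef = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
expr = scores @ true_coef + 0.1 * rng.normal(size=200)

X = (scores - scores.mean(axis=0)) / scores.std(axis=0)
y = expr - expr.mean()

coef = elastic_net(X, y, lam=0.05, alpha=0.9)
pred = X @ coef
pearson = np.corrcoef(pred, y)[0, 1]            # model performance, as in the table
mse = np.mean((pred - y) ** 2)
```

The non-zero entries of `coef` play the role of the regression coefficients reported per TF; in practice `lam` and `alpha` would be chosen by cross-validation rather than fixed as here.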
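The second row's algorithm (TFAS, standardization, SVD, log-linear regression) can be sketched in a few lines of Python. This is an illustrative sketch only: the exponential distance weighting and its scale `d0` are assumptions, the data are synthetic, and the sketch keeps the leading principal components instead of performing the regression-based component selection listed in the table.

```python
import numpy as np

def tfas(peak_signal, peak_dist, d0=5000.0):
    """TF association strength of one TF with one gene: a weighted sum of
    ChIP-seq peak signals, with weights decaying with distance to the gene.
    The exponential weight and d0 are assumptions of this sketch."""
    weights = np.exp(-np.abs(peak_dist) / d0)
    return float(np.sum(peak_signal * weights))

def pc_regression(A, log_expr, n_components):
    """Center and standardize the TFAS matrix A (genes x TFs), decompose it
    by SVD, and regress log expression on the leading principal components."""
    Z = (A - A.mean(axis=0)) / A.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_components] * s[:n_components]       # component scores per gene
    design = np.column_stack([np.ones(len(pcs)), pcs])  # intercept + components
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((log_expr - fitted) ** 2)
    ss_tot = np.sum((log_expr - log_expr.mean()) ** 2)
    return coef, Vt[:n_components], 1.0 - ss_res / ss_tot

# One gene with two peaks: at the gene itself and 5 kb away.
example_tfas = tfas(np.array([10.0, 4.0]), np.array([0.0, 5000.0]))

# Synthetic TFAS matrix and log expression (hypothetical data).
rng = np.random.default_rng(1)
A = rng.gamma(2.0, 1.0, size=(300, 4))
log_expr = 0.8 * A[:, 0] - 0.5 * A[:, 2] + 0.1 * rng.normal(size=300)

coef, loadings, r2 = pc_regression(A, log_expr, n_components=4)
```

Each row of `loadings` gives the weights of the TFs in one principal component, which corresponds to the "principal components with weights of corresponding TFs" output in the table.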
References
- [1] PMID 27899623 [Schmidt217]
- [2] PMID 19995984 [Ouyang2007]