Difference between revisions of "GTRD Workflow"
Ivan Yevshin (Talk | contribs) |
Revision as of 13:21, 10 October 2016
ChIP-seq experiment information was collected in a semi-automated way from the literature, GEO and ENCODE.
Raw ChIP-seq data in the form of fastq and SRA files were fetched from ENCODE and SRA databases.
Sequenced reads were aligned using the Bowtie2 [1] aligner.
ChIP-seq peaks were called using four different methods: MACS [2], SISSRS [3], GEM [4] and PICS [5].
Peaks computed for the same transcription factor and peak calling method, but different experiment conditions (e.g., cell line, treatment, etc.) were joined into clusters.
Clusters for the same TF revealed by different peak calling methods were joined into metaclusters. Metaclusters represent a non-redundant set of transcription factor binding sites.
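The text does not say how peaks are joined, so the following is only a minimal sketch under one loudly stated assumption: peaks on the same chromosome that overlap (or touch) are merged into a single interval. The function name merge_peaks and the three-column, tab-separated input (chromosome, start, end) are hypothetical and are not taken from the GTRD pipeline.

```shell
# Hypothetical sketch: merge overlapping peaks (chrom, start, end) into clusters.
# Assumption: "joined" means merging intervals that overlap or touch on the
# same chromosome; the actual GTRD rule may differ.
merge_peaks() {
  # Sort by chromosome, then numerically by start coordinate.
  sort -k1,1 -k2,2n "$@" | awk 'BEGIN { OFS = "\t" }
    # New chromosome, or the peak starts after the current cluster ends:
    # emit the finished cluster and start a new one.
    $1 != chrom || $2 + 0 > end {
      if (chrom != "") print chrom, start, end
      chrom = $1; start = $2 + 0; end = $3 + 0
      next
    }
    # Otherwise the peak belongs to the current cluster; extend it if needed.
    $3 + 0 > end { end = $3 + 0 }
    END { if (chrom != "") print chrom, start, end }'
}
```

Under this assumption, the function would be run once per group, for example on the concatenated peak files of one transcription factor and one peak caller to obtain clusters, and then on the cluster files of the different peak callers to obtain metaclusters.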
Bowtie2
We use Bowtie2 version 2.2.3 to align ChIP-seq reads to the reference genomes of human (GRCh38) and mouse (GRCm38).
Bowtie2 was run with the following parameters:
bowtie2 -x $genome -U $fastq_files -p 8 --mm --seed 0
The resulting alignments were converted to BAM files, then sorted and indexed using samtools version 1.0.
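The exact samtools invocations are not given here. As a hedged illustration, the conversion, sorting and indexing steps could look like the commands printed by the dry-run helper below. The function name, the file names and the intermediate ".unsorted.bam" suffix are hypothetical. The sketch also assumes the legacy `samtools sort <in.bam> <out.prefix>` form used by samtools releases before 1.3.

```shell
# Dry-run sketch: print the samtools steps for one alignment instead of running them.
# Assumption: legacy "samtools sort <in.bam> <out.prefix>" syntax (before 1.3);
# newer releases take "-o <out.bam>" instead.
sam_to_sorted_bam_cmds() (
  sam=$1      # hypothetical input: SAM file produced by the aligner
  prefix=$2   # hypothetical output prefix for the sorted BAM file
  # 1. Convert SAM to BAM.
  echo "samtools view -bS $sam > $prefix.unsorted.bam"
  # 2. Sort by coordinate; the legacy form writes $prefix.bam.
  echo "samtools sort $prefix.unsorted.bam $prefix"
  # 3. Index the sorted BAM file.
  echo "samtools index $prefix.bam"
)
```

Calling `sam_to_sorted_bam_cmds sample.sam sample` prints the three commands so they can be reviewed before being executed.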
MACS
MACS version 1.4.2 was used for peak calling with the following parameters:
macs14 -f BAM -g $species -n $peaks -t $alignment_bam
or, if a control experiment was available:
macs14 -f BAM -g $species -n $peaks -t $alignment_bam -c $control_bam
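The two MACS variants differ only in the optional control argument. A minimal sketch of how a wrapper might choose between them is shown below; it only prints the command line rather than running MACS. The function name macs_cmd and the example file names are hypothetical, and the sketch assumes the input format is passed with the `-f` option of MACS 1.4.

```shell
# Sketch: assemble the macs14 command line, adding -c only when a control exists.
# The function name and argument order are illustrative assumptions.
macs_cmd() (
  species=$1         # genome size shortcut passed to -g
  peaks=$2           # experiment name passed to -n
  alignment_bam=$3   # treatment alignments
  control_bam=${4:-} # optional control alignments; empty if not available
  cmd="macs14 -f BAM -g $species -n $peaks -t $alignment_bam"
  if [ -n "$control_bam" ]; then
    cmd="$cmd -c $control_bam"
  fi
  printf '%s\n' "$cmd"
)
```

For example, `macs_cmd hs peaks chip.bam` prints the command without a control, and `macs_cmd hs peaks chip.bam ctrl.bam` appends `-c ctrl.bam`. The same pattern would apply to the optional control arguments of the other peak callers.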
SISSRS
SISSRS requires alignments in BED format, so BAM files were converted to BED files using bedtools version 2:
bamToBed -i $input_bam > $output_bed
SISSRS version 1.4 was used for peak calling with the following parameters:
sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs
or, if a control experiment was available:
sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs -b $control_bed
GEM
GEM version 2.5 was used with the following parameters:
java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
    --g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam
or, if a control experiment was available:
java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
    --g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam --ctrl $control
For large datasets, the maximum Java heap size was raised by setting -Xmx24G instead of -Xmx4G.
PICS
For peak calling with the PICS method, we use R version 3.2.0 and PICS version 2.12.0. We use the following custom R script:
References
1. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
2. Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) vol. 9 (9) pp. R137.
3. Leelavati Narlikar, Raja Jothi. ChIP-Seq data analysis: identification of protein-DNA binding sites with SISSRs peak-finder. Methods in Molecular Biology, 802:305-22, 2012.
4. Yuchun Guo, Shaun Mahony & David K Gifford. High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology 8(8): e1002638. 2012.
5. Zhang X, Robertson G, Krzywinski M, Ning K, Droit A, Jones S and Gottardo R. “PICS: Probabilistic Inference for ChIP-seq.” Biometrics, 66. 2010.