GTRD Workflow
Information on ChIP-seq experiments was collected in a semi-automated way from the literature, GEO and ENCODE.
Raw ChIP-seq data in the form of fastq and SRA files were fetched from the ENCODE and SRA databases.
Sequenced reads were aligned using the Bowtie2 aligner.
ChIP-seq peaks were called using four different methods: MACS [1], SISSRs [2], GEM [3] and PICS [4].
Bowtie2
We use bowtie2 version 2.2.3 for ChIP-seq read alignment to the reference genomes of human (GRCh38) and mouse (GRCm38).
Bowtie2 was run with the following parameters:
bowtie2 -x $genome -U $fastq_files -p 8 --mm --seed 0
The resulting alignments were converted to BAM files, then sorted and indexed using samtools version 1.0.
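The exact samtools invocations are not recorded on this page. As a rough sketch of the convert, sort and index steps, the helper below prints one plausible command sequence for a sample. The function name `post_alignment_cmds`, the file-naming scheme and the choice of the `-o`/`-T` sort syntax are all assumptions, and option syntax differs between samtools releases, so check it against the installed version before use.

```shell
# Sketch only: prints the post-alignment commands for one sample prefix.
# The names <prefix>.sam, <prefix>.bam and <prefix>.sorted.bam are assumed,
# not taken from the GTRD pipeline.
post_alignment_cmds() {
    prefix=$1
    # Convert the SAM output of bowtie2 to BAM
    echo "samtools view -b -o ${prefix}.bam ${prefix}.sam"
    # Sort by coordinate; -T sets the prefix for temporary files
    echo "samtools sort -T ${prefix}.tmp -o ${prefix}.sorted.bam ${prefix}.bam"
    # Build the .bai index next to the sorted BAM
    echo "samtools index ${prefix}.sorted.bam"
}

post_alignment_cmds sample1
```

Printing the commands instead of running them makes it easy to review them, or to pipe them into a job scheduler.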
MACS
MACS version 1.4.2 was used for peak calling with the following parameters:
macs14 -f BAM -g $species -n $peaks -t $alignment_bam
or if control experiment was available:
macs14 -f BAM -g $species -n $peaks -t $alignment_bam -c $control_bam
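The two MACS variants differ only in the optional `-c` argument, so a wrapper can assemble the command line and append the control only when one exists. This is a minimal sketch, not the GTRD implementation: `build_macs_cmd` is a hypothetical helper name, and it assumes file names without spaces.

```shell
# Sketch: assemble the macs14 command line, adding -c only when a control
# BAM is supplied. build_macs_cmd is a hypothetical helper name.
build_macs_cmd() {
    species=$1; peaks=$2; alignment_bam=$3; control_bam=${4:-}
    cmd="macs14 -f BAM -g $species -n $peaks -t $alignment_bam"
    if [ -n "$control_bam" ]; then
        # Control experiment available: pass it as the background sample
        cmd="$cmd -c $control_bam"
    fi
    printf '%s\n' "$cmd"
}

build_macs_cmd hs exp1_peaks exp1.bam
build_macs_cmd hs exp1_peaks exp1.bam ctrl1.bam
```

The printed line can be checked and then executed, for example with `sh -c "$(build_macs_cmd hs exp1_peaks exp1.bam)"`.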
SISSRs
SISSRs requires alignments in BED format, so BAM files were converted to BED files using bedtools version 2:
bamToBed -i $input_bam > $output_bed
SISSRs version 1.4 was used for peak calling with the following parameters:
sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs
or if control experiment was available:
sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs -b $control_bed
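Because SISSRs reads BED input, a control alignment has to go through the same BAM to BED conversion as the experiment before it can be passed with `-b`. The sketch below prints the full command sequence for one experiment. `sissrs_cmds` and the rule for deriving BED file names are hypothetical; the genome size is copied from the commands above.

```shell
# Sketch: print the BED conversion and SISSRs commands for one experiment.
# sissrs_cmds and the .bam -> .bed naming rule are assumptions.
sissrs_cmds() {
    alignment_bam=$1; peaks=$2; control_bam=${3:-}
    alignment_bed="${alignment_bam%.bam}.bed"
    echo "bamToBed -i $alignment_bam > $alignment_bed"
    cmd="sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs"
    if [ -n "$control_bam" ]; then
        control_bed="${control_bam%.bam}.bed"
        # The control is converted as well, since -b is given a BED file
        echo "bamToBed -i $control_bam > $control_bed"
        cmd="$cmd -b $control_bed"
    fi
    echo "$cmd"
}

sissrs_cmds exp1.bam exp1 ctrl1.bam
```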
GEM
GEM version 2.5 was used with the following parameters:
java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
--g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam
or if control experiment was available:
java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
--g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam --ctrl $control
For large datasets, the -Xmx24G parameter was set instead.
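The page does not say what counts as a large dataset. As an illustration only, the heap flag could be chosen from the size of the input file. In the sketch below, `gem_heap_flag` is a hypothetical helper and the 5 GB default cut-off is a placeholder, not a value used by GTRD.

```shell
# Sketch: choose the Java heap flag for GEM from the input size in bytes.
# The default cut-off of 5000000000 bytes is an invented placeholder;
# the source does not define "large".
gem_heap_flag() {
    size_bytes=$1
    threshold_bytes=${2:-5000000000}
    if [ "$size_bytes" -gt "$threshold_bytes" ]; then
        echo "-Xmx24G"
    else
        echo "-Xmx4G"
    fi
}

gem_heap_flag 1200000000    # input below the placeholder cut-off
gem_heap_flag 9000000000    # input above the placeholder cut-off
```

In practice the size could be taken from the alignment file, for example `gem_heap_flag $(wc -c < "$bam")`, and the result substituted for the `-Xmx4G` argument of the java command.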
PICS
For peak calling with the PICS method we use R version 3.2.0 and PICS version 2.12.0. We use the following custom R script:
References
[1] Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 9(9):R137, 2008.
[2] Leelavati Narlikar, Raja Jothi. ChIP-Seq data analysis: identification of protein-DNA binding sites with SISSRs peak-finder. Methods in Molecular Biology, 802:305-22, 2012.
[3] Yuchun Guo, Shaun Mahony, David K Gifford. High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology, 8(8):e1002638, 2012.
[4] Zhang X, Robertson G, Krzywinski M, Ning K, Droit A, Jones S, Gottardo R. PICS: Probabilistic Inference for ChIP-seq. Biometrics, 66, 2010.