GTRD Workflow

From BioUML platform
Revision as of 17:20, 1 July 2016 by Ivan Yevshin (Talk | contribs)

Information about ChIP-seq experiments was collected in a semi-automated way from the literature, GEO and ENCODE.

Raw ChIP-seq data in the form of fastq and SRA files were fetched from the ENCODE and SRA databases.

Sequenced reads were aligned using the Bowtie2 [1] aligner.

ChIP-seq peaks were called using four different methods: MACS [2], SISSRS [3], GEM [4] and PICS [5].

Bowtie2

Bowtie2 version 2.2.3 was used to align ChIP-seq reads to the human (GRCh38) and mouse (GRCm38) reference genomes.

Bowtie2 was run with the following parameters:

bowtie2 -x $genome -U $fastq_files -p 8 --mm --seed 0

The resulting alignments were converted to bam files, then sorted and indexed using samtools version 1.0.
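The exact samtools commands are not recorded here. The following is a minimal sketch of the conversion, sorting and indexing steps. It assumes the option syntax of recent samtools releases (version 1.0 may expect different arguments); the helper name and the file layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name and output file names are illustrative assumptions.
# Usage: align_and_index GENOME_INDEX READS.fastq OUT_PREFIX
align_and_index() {
    genome=$1
    fastq=$2
    prefix=$3

    # Align with the parameters listed above; bowtie2 writes SAM to stdout,
    # which samtools converts to BAM on the fly.
    bowtie2 -x "$genome" -U "$fastq" -p 8 --mm --seed 0 |
        samtools view -b -o "$prefix.unsorted.bam" - || return 1

    # Sort by coordinate, then build the index for the sorted file.
    samtools sort -o "$prefix.bam" "$prefix.unsorted.bam" || return 1
    samtools index "$prefix.bam" || return 1

    # The unsorted intermediate is no longer needed.
    rm -f "$prefix.unsorted.bam"
}
```

A call such as `align_and_index "$genome" "$fastq_files" sample1` would leave `sample1.bam` and its index next to each other, which is the form the peak callers below consume.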

MACS

MACS version 1.4.2 was used for peak calling with the following parameters:

macs14 -f BAM -g $species -n $peaks -t $alignment_bam

or, if a control experiment was available:

macs14 -f BAM -g $species -n $peaks -t $alignment_bam -c $control_bam
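The two variants differ only in the -c option, so they can be wrapped in one helper. This is a sketch rather than the script actually used for GTRD; the function name `call_macs` and its argument order are assumptions, and `-f` is MACS's input format option.

```shell
#!/bin/sh
# Hypothetical wrapper around the two MACS invocations listed above.
# Usage: call_macs SPECIES PEAKS_NAME ALIGNMENT.bam [CONTROL.bam]
call_macs() {
    species=$1      # value passed to -g (genome size or a MACS shortcut)
    name=$2         # experiment name used as the output prefix
    treatment=$3    # ChIP alignment
    control=${4:-}  # empty when no control experiment is available

    if [ -n "$control" ]; then
        macs14 -f BAM -g "$species" -n "$name" -t "$treatment" -c "$control"
    else
        macs14 -f BAM -g "$species" -n "$name" -t "$treatment"
    fi
}
```

Omitting the fourth argument selects the control-free form, so the same call site works for both kinds of experiment.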

SISSRS

SISSRS requires alignments in bed format, so bam files were converted to bed files using bedtools version 2:

bamToBed -i $input_bam > $output_bed

SISSRS version 1.4 was used for peak calling with the following parameters:

sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs

or, if a control experiment was available:

sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs -b $control_bed
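The conversion and peak-calling steps above can be chained as follows. This is a minimal sketch with assumed names: `call_sissrs`, the `SISSRS` variable and the intermediate `.bed` file names are illustrative and do not come from the GTRD pipeline.

```shell
#!/bin/sh
# Hypothetical wrapper around the SISSRS steps listed above.
# Usage: call_sissrs ALIGNMENT.bam PEAKS_PREFIX [CONTROL.bam]
SISSRS=${SISSRS:-sissrs.pl}   # location of the SISSRS script (assumed to be on PATH)

call_sissrs() {
    bam=$1
    peaks=$2
    control_bam=${3:-}   # empty when no control experiment is available

    # SISSRS reads bed input, so convert the treatment alignment first.
    bamToBed -i "$bam" > "$peaks.bed" || return 1

    if [ -n "$control_bam" ]; then
        # The control alignment needs the same conversion before it is passed to -b.
        bamToBed -i "$control_bam" > "$peaks.control.bed" || return 1
        "$SISSRS" -i "$peaks.bed" -s 3000000000 -o "$peaks.sissrs" -b "$peaks.control.bed"
    else
        "$SISSRS" -i "$peaks.bed" -s 3000000000 -o "$peaks.sissrs"
    fi
}
```

Keeping the script location in a variable makes it possible to point the wrapper at a specific installation without editing the function.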

GEM

GEM version 2.5 was used with the following parameters:

java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
  --g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam

or, if a control experiment was available:

java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
  --g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam --ctrl $control

For large datasets, the Java heap limit was raised by setting -Xmx24G instead of -Xmx4G.
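The GEM variants differ only in the heap size and the optional --ctrl argument, which a small wrapper can make explicit. This is a sketch under assumptions: the function name `call_gem`, its argument order and the `GEM_DIR` variable are hypothetical, and the page does not say what counts as a large dataset, so the caller chooses the heap size.

```shell
#!/bin/sh
# Hypothetical wrapper; the argument order and the heap parameter are assumptions.
# Usage: call_gem SPECIES ALIGNMENT.bam PEAKS_PREFIX [CONTROL.bam] [HEAP]
GEM_DIR=/srv/local-main/tools/gem   # installation path taken from the commands above

call_gem() {
    species=$1
    bam=$2
    peaks=$3
    control=${4:-}   # empty string means no control experiment
    heap=${5:-4G}    # default heap; pass 24G for large datasets

    # Build the argument list once, then append --ctrl only when needed.
    set -- java "-Xmx$heap" -XX:+UseSerialGC -jar "$GEM_DIR/gem.jar" \
        --d "$GEM_DIR/Read_Distribution_default.txt" \
        --g "$GEM_DIR/$species.chrom.sizes" \
        --s 2000000000 --f SAM --t 1 --out "$peaks" --expt "$bam"
    if [ -n "$control" ]; then
        set -- "$@" --ctrl "$control"
    fi

    "$@"
}
```

For example, `call_gem "$species" "$bam" "$peaks" "" 24G` runs the control-free form with the larger heap.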

PICS

For peak calling with the PICS method, we use R version 3.2.0 and PICS version 2.12.0, with the following custom R script:

Gtrd-workflow-pics-script.png

References

  1. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
  2. Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology. 2008, 9(9):R137.
  3. Narlikar L, Jothi R. ChIP-Seq data analysis: identification of protein-DNA binding sites with SISSRs peak-finder. Methods in Molecular Biology. 2012, 802:305-22.
  4. Guo Y, Mahony S, Gifford DK. High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology. 2012, 8(8):e1002638.
  5. Zhang X, Robertson G, Krzywinski M, Ning K, Droit A, Jones S, Gottardo R. PICS: Probabilistic Inference for ChIP-seq. Biometrics. 2010, 66.