GTRD Workflow

From BioUML platform
Revision as of 13:21, 10 October 2016 by Ivan Yevshin (Talk | contribs)

Gtrd-workflow.png

Information about ChIP-seq experiments was collected in a semi-automated way from the literature, GEO and ENCODE.

Raw ChIP-seq data in the form of fastq and SRA files were fetched from the ENCODE and SRA databases.

Sequenced reads were aligned using the Bowtie2 [1] aligner.

ChIP-seq peaks were called using 4 different methods: MACS [2], SISSRS [3], GEM [4] and PICS [5].

Peaks computed for the same transcription factor (TF) and peak calling method, but under different experimental conditions (e.g., cell line or treatment), were joined into clusters.

Clusters for the same TF produced by different peak calling methods were joined into metaclusters. Metaclusters represent a non-redundant set of transcription factor binding sites.
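
The joining step can be illustrated with a simple interval merge. The page does not state the exact rule GTRD uses to join peaks, so the sketch below assumes that peaks on the same chromosome are joined whenever they overlap or touch, which is only one plausible rule. The helper name merge_peaks and the three-column input (chromosome, start, end) are hypothetical. The same idea would apply to both steps: merging peaks from different experiments into clusters, and merging clusters from different peak callers into metaclusters.

```shell
# Sketch: merge overlapping peaks (BED-like: chrom, start, end) into clusters.
# merge_peaks is a hypothetical helper; overlap-based merging is an assumption,
# not the documented GTRD clustering rule.
# Output columns: chrom, cluster start, cluster end, number of merged peaks.
merge_peaks() {
    sort -k1,1 -k2,2n "$@" |
    awk 'BEGIN { OFS = "\t" }
         # Start a new cluster on a new chromosome or when there is a gap.
         NR == 1 || $1 != chrom || $2 + 0 > cend {
             if (NR > 1) print chrom, cstart, cend, n
             chrom = $1; cstart = $2 + 0; cend = $3 + 0; n = 1
             next
         }
         # Otherwise extend the current cluster.
         {
             if ($3 + 0 > cend) cend = $3 + 0
             n++
         }
         END { if (NR > 0) print chrom, cstart, cend, n }'
}
```

For example, merge_peaks experiment1.bed experiment2.bed would read both (hypothetical) peak files and print one line per merged region.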

Bowtie2

We use bowtie2 version 2.2.3 for ChIP-seq read alignment to the reference genomes of human (GRCh38) and mouse (GRCm38).

Bowtie2 was run with the following parameters:

bowtie2 -x $genome -U $fastq_files -p 8 --mm --seed 0

The resulting alignments were converted to BAM files, then sorted and indexed using samtools version 1.0.
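
The alignment and post-processing steps could be chained roughly as follows. This is a minimal sketch, not the actual GTRD script: the function names run and align_and_sort, the DRY_RUN switch and the output file names are hypothetical. The exact samtools sort arguments differ between samtools releases, so check them against the installed version.

```shell
# Sketch: align reads with Bowtie2, then convert, sort and index with samtools.
# run() prints the command when DRY_RUN=1 and executes it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: align_and_sort <bowtie2 index> <fastq file> <output prefix>
# (hypothetical helper; file naming is an assumption)
align_and_sort() {
    genome=$1; fastq=$2; prefix=$3
    # Same Bowtie2 options as above; -S writes the SAM output to a file.
    run bowtie2 -x "$genome" -U "$fastq" -p 8 --mm --seed 0 -S "$prefix.sam" &&
    # Convert SAM to BAM.
    run samtools view -b -o "$prefix.unsorted.bam" "$prefix.sam" &&
    # Sort by coordinate; -T sets the prefix for temporary files.
    run samtools sort -T "$prefix.tmp" -o "$prefix.bam" "$prefix.unsorted.bam" &&
    # Build the .bai index for the sorted BAM.
    run samtools index "$prefix.bam"
}
```

Calling DRY_RUN=1 align_and_sort hg38 reads.fastq sample prints the four commands without executing them, which is a convenient way to check the file names before a long run.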

MACS

MACS version 1.4.2 was used for peak calling with the following parameters:

macs14 -f BAM -g $species -n $peaks -t $alignment_bam

or, if a control experiment was available:

macs14 -f BAM -g $species -n $peaks -t $alignment_bam -c $control_bam
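
The two variants differ only in the optional -c argument, so they can be folded into one wrapper. The sketch below is illustrative: call_macs and the DRY_RUN switch are hypothetical names, and the wrapper is not part of GTRD. The same pattern would work for the optional control arguments of the other peak callers.

```shell
# Sketch: build the macs14 command, adding -c only when a control BAM is given.
# call_macs is a hypothetical helper; DRY_RUN=1 prints the command instead of
# running it.
# Usage: call_macs <species> <output name> <treatment BAM> [control BAM]
call_macs() {
    species=$1; name=$2; treatment=$3; control=${4:-}
    # Reuse the positional parameters as the argument list of the command.
    set -- macs14 -f BAM -g "$species" -n "$name" -t "$treatment"
    if [ -n "$control" ]; then
        set -- "$@" -c "$control"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 call_macs hs peaks chip.bam input.bam prints the command with the control, while omitting the last argument prints the command without -c.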

SISSRS

SISSRS requires alignments in BED format, so the BAM files were converted to BED files using bedtools version 2:

bamToBed -i $input_bam > $output_bed

SISSRS version 1.4 was used for peak calling with the following parameters:

sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs

or, if a control experiment was available:

sissrs.pl -i $alignment_bed -s 3000000000 -o $peaks.sissrs -b $control_bed

GEM

GEM version 2.5 was used with the following parameters:

java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
--g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam

or, if a control experiment was available:

java -Xmx4G -XX:+UseSerialGC -jar /srv/local-main/tools/gem/gem.jar --d /srv/local-main/tools/gem/Read_Distribution_default.txt \
--g /srv/local-main/tools/gem/$species.chrom.sizes --s 2000000000 --f SAM --t 1 --out $peaks --expt $bam --ctrl $control

For large datasets, the -Xmx24G parameter was set instead of -Xmx4G.

PICS

For peak calling with the PICS method, we used R version 3.2.0 and PICS version 2.12.0 with the following custom R script:

Gtrd-workflow-pics-script.png

References

  1. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
  2. Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology. 2008, 9(9):R137.
  3. Narlikar L, Jothi R. ChIP-Seq data analysis: identification of protein-DNA binding sites with SISSRs peak-finder. Methods in Molecular Biology. 2012, 802:305-322.
  4. Guo Y, Mahony S, Gifford DK. High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology. 2012, 8(8):e1002638.
  5. Zhang X, Robertson G, Krzywinski M, Ning K, Droit A, Jones S, Gottardo R. PICS: Probabilistic Inference for ChIP-seq. Biometrics. 2010, 66.