Quantification of RNA-seq with Cufflinks (with de-novo assembly) for FASTQ files (workflow)

Workflow title: Quantification of RNA-seq with Cufflinks (with de-novo assembly) for FASTQ files
Provider: geneXplain GmbH

Workflow overview

Quantification-of-RNA-seq-with-Cufflinks-with-de-novo-assembly-for-FASTQ-files-workflow-overview.png

Description

This workflow offers the ability to discover new genes and transcripts (splice variants) and measure transcript expression in a single assay from RNA-seq data.

The underlying protocol is described in Trapnell et al., “Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks”, Nat. Protoc. 7:562-578, 2012.

The first step of the workflow is read alignment with TopHat (http://tophat.cbcb.umd.edu/). TopHat aligns the reads to the genome and discovers transcript splice sites. Its output consists of tables and tracks with insertions, deletions, splice junctions and the alignments themselves.

These output files are passed to Cufflinks, which uses the alignments of the reads to the genome to assemble the reads into transcripts. Cufflinks produces one output track, Assembled transcripts, and two output tables, Gene expression and Transcript expression.
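The first two steps can be sketched as command lines built in Python. This is a minimal sketch, not the way the BioUML workflow invokes the tools: the function names, the index prefix and the directory names are hypothetical, and it assumes the standalone `tophat` and `cufflinks` programs, where TopHat writes its alignments to `accepted_hits.bam` in its output directory. No reference annotation is passed to Cufflinks, which corresponds to de-novo assembly.

```python
from pathlib import Path


def tophat_command(index_prefix, fastq_files, out_dir, threads=1):
    """Build an argv list that aligns FASTQ reads with TopHat.

    Several read files of one sample are passed as a comma-separated list.
    """
    reads = ",".join(str(f) for f in fastq_files)
    return ["tophat", "-p", str(threads), "-o", str(out_dir),
            str(index_prefix), reads]


def cufflinks_command(tophat_out_dir, out_dir, threads=1):
    """Build an argv list that assembles transcripts from TopHat alignments.

    No annotation file is given, so the assembly is de novo.
    """
    bam = (Path(tophat_out_dir) / "accepted_hits.bam").as_posix()
    return ["cufflinks", "-p", str(threads), "-o", str(out_dir), bam]


if __name__ == "__main__":
    # Hypothetical sample: one experiment FASTQ file, genome index "genome".
    print(" ".join(tophat_command("genome", ["exp_1.fastq"], "exp_1_thout")))
    print(" ".join(cufflinks_command("exp_1_thout", "exp_1_clout")))
```

The returned lists can be handed to `subprocess.run` once the tools are installed; building them separately keeps the sketch testable without running an alignment.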

In the current workflow the transcripts are assembled “de-novo”: the exon-intron structure is reconstructed from the aligned reads alone, without a reference annotation, so no known gene or transcript names are assigned. Each transcript is identified only by its tracking_id, such as Cuff.1.1. This makes it possible to find new transcripts that have not yet been discovered and annotated in the reference genome.
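To illustrate how such identifiers can be handled downstream, the following sketch splits a tracking_id into its parts. It assumes the usual Cufflinks convention that an identifier has the form prefix.gene.isoform, so that Cuff.1.1 is isoform 1 of gene Cuff.1; the names `TrackingId` and `parse_tracking_id` are hypothetical helpers, not part of Cufflinks or BioUML.

```python
import re
from typing import NamedTuple


class TrackingId(NamedTuple):
    gene_id: str         # e.g. "Cuff.1"
    gene_number: int     # e.g. 1
    isoform_number: int  # e.g. 1


# Accept both "Cuff" and "CUFF" as the prefix (assumed naming convention).
_PATTERN = re.compile(r"^(cuff\.(\d+))\.(\d+)$", re.IGNORECASE)


def parse_tracking_id(tracking_id):
    """Split a de-novo transcript id such as 'Cuff.1.1' into gene and isoform."""
    match = _PATTERN.match(tracking_id.strip())
    if match is None:
        raise ValueError(f"not a de-novo tracking_id: {tracking_id!r}")
    return TrackingId(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    print(parse_tracking_id("Cuff.1.1"))
```

Grouping transcripts by `gene_id` then gives the set of splice variants assembled for each de-novo gene.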

In the next step the output of Cufflinks is passed to Cuffmerge, which is essentially a ‘meta-assembler’: it treats the assembled transfrags the way Cufflinks treats reads, merging them together parsimoniously. The output is a Merged assembly track.
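Outside the platform, the standalone `cuffmerge` program takes a text file that lists one assembly GTF per line. The sketch below builds that list and the matching command line; the helper names, the file names and the `assemblies.txt` manifest name are assumptions for illustration, and the platform may wire the step differently.

```python
def assembly_list_text(gtf_paths):
    """Return the manifest content for cuffmerge: one GTF path per line."""
    paths = [str(p) for p in gtf_paths]
    if not paths:
        raise ValueError("at least one assembly GTF is required")
    return "".join(path + "\n" for path in paths)


def cuffmerge_command(manifest_path, genome_fasta, out_dir):
    """Build an argv list that merges the listed assemblies with cuffmerge."""
    return ["cuffmerge", "-o", str(out_dir), "-s", str(genome_fasta),
            str(manifest_path)]


if __name__ == "__main__":
    # Hypothetical per-sample Cufflinks outputs.
    gtfs = ["exp_1_clout/transcripts.gtf", "ctrl_1_clout/transcripts.gtf"]
    print(assembly_list_text(gtfs), end="")
    print(" ".join(cuffmerge_command("assemblies.txt", "genome.fa", "merged_asm")))
```

Writing the returned text to `assemblies.txt` before running the command completes the step.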

Differential expression analysis is performed by Cuffdiff, part of the Cufflinks package (http://cufflinks.cbcb.umd.edu/), which calculates expression in two or more samples and tests the statistical significance of each observed change in expression between them.
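A comparison of experiment against control could be expressed for the standalone `cuffdiff` program roughly as follows. This is a sketch under assumptions: the function name, file names and condition labels are hypothetical. Replicates of one condition are joined with commas, while the two conditions are passed as separate arguments.

```python
def cuffdiff_command(merged_gtf, experiment_bams, control_bams, out_dir,
                     labels=("experiment", "control")):
    """Build an argv list that compares two conditions with cuffdiff."""
    if not experiment_bams or not control_bams:
        raise ValueError("each condition needs at least one alignment file")
    return [
        "cuffdiff",
        "-o", str(out_dir),
        "-L", ",".join(labels),                    # condition labels
        str(merged_gtf),                           # merged assembly from Cuffmerge
        ",".join(str(b) for b in experiment_bams),  # replicates of condition 1
        ",".join(str(b) for b in control_bams),     # replicates of condition 2
    ]


if __name__ == "__main__":
    print(" ".join(cuffdiff_command(
        "merged_asm/merged.gtf",
        ["exp_1_thout/accepted_hits.bam", "exp_2_thout/accepted_hits.bam"],
        ["ctrl_1_thout/accepted_hits.bam"],
        "diff_out",
    )))
```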

The output is a folder containing the differential expression results, including the genes found to be differentially expressed.
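As an illustration of how such results can be read, the sketch below extracts the significant genes from a tab-separated table laid out like the `gene_exp.diff` file written by standalone Cuffdiff. Whether the tables in the platform's output folder use exactly these column names is an assumption, and the two data rows are invented purely to make the example runnable.

```python
import csv
import io

# Column layout of Cuffdiff's gene_exp.diff (assumed for the platform table).
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)", "test_stat",
          "p_value", "q_value", "significant"]

# Invented rows, for illustration only.
ROWS = [
    ["XLOC_000001", "XLOC_000001", "-", "chr1:100-900", "control",
     "experiment", "OK", "10.0", "40.0", "2.0", "4.1", "0.0001", "0.002", "yes"],
    ["XLOC_000002", "XLOC_000002", "-", "chr1:2000-3500", "control",
     "experiment", "OK", "12.0", "13.0", "0.115", "0.3", "0.7", "0.9", "no"],
]
EXAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def significant_genes(diff_file):
    """Return (gene_id, log2 fold change) for rows flagged as significant."""
    reader = csv.DictReader(diff_file, delimiter="\t")
    return [(row["gene_id"], float(row["log2(fold_change)"]))
            for row in reader
            if row["significant"] == "yes"]


if __name__ == "__main__":
    print(significant_genes(io.StringIO(EXAMPLE)))
```

With a real file, `open(path, newline="")` can be passed in place of the `io.StringIO` object.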


Parameters

Experiment fastq files
Control fastq files
Output folder