Difference between revisions of "Hadoop"

From BioUML platform

Revision as of 00:44, 17 November 2013

This page or section is a stub. Please add more information here!

The open-source Apache Hadoop project, which implements the MapReduce programming model on top of a distributed file system (HDFS), has recently given bioinformatics researchers a way to achieve scalable, efficient and reliable computation on Linux clusters and on cloud computing services.

For a survey of MapReduce applications in bioinformatics, see "Survey of MapReduce frame operation in bioinformatics" [1].

List of Hadoop applications for NGS

Hadoop MapReduce-based approaches have become increasingly popular because they scale to the processing of large sequencing data sets [2].

SeqPig [2]
A library and a collection of tools to manipulate, analyze and query sequencing data sets in a scalable and simple manner. SeqPig scripts run on Apache Pig, a Hadoop-based distributed scripting engine that automatically parallelizes and distributes data-processing tasks.
http://sourceforge.net/projects/seqpig/
http://seqpig.sourceforge.net/ (manual)

BioPig [3]
BioPig is built on the Apache Hadoop MapReduce system and the Pig data-flow language.
https://sites.google.com/a/lbl.gov/biopig/

DistMap [4]
A modular, scalable and integrated workflow for mapping reads in the Hadoop distributed computing framework. It accepts reads in FASTQ format as input and produces mapped reads in SAM/BAM format. DistMap supports both paired-end and single-end reads, so it can map read data produced by different sequencing platforms.

DistMap currently supports nine mappers:
* BWA (http://bio-bwa.sourceforge.net/)
* GSNAP (http://research-pub.gene.com/gmap/)
* TopHat (http://tophat.cbcb.umd.edu/)
* Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
* Bowtie2 (http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.0.6/)
* SOAP (http://soap.genomics.org.cn/soapaligner.html)
* STAR (http://gingeraslab.cshl.edu/STAR/)
* Bismark (http://www.bioinformatics.babraham.ac.uk/projects/bismark/)
* BSMAP (http://code.google.com/p/bsmap/)

http://code.google.com/p/distmap/

http://code.google.com/p/distmap/wiki/Manual
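
DistMap takes FASTQ input, where each record spans four lines, while Hadoop's default text input format divides files at line boundaries and can hand the lines of one record to different map tasks. A common workaround, assumed here rather than taken from DistMap's documentation, is to flatten each record (or each mate pair) into a single tab-separated line before loading the reads into HDFS. The Python sketch below shows that preprocessing; `parse_fastq`, `to_single_line` and `pair_to_single_line` are hypothetical names, and the output layout is illustrative only.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Hypothetical preprocessing sketch; the one-line layout is illustrative, not DistMap's format.
Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from consecutive 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header.strip():
            continue  # skip blank lines between or after records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        tokens = header[1:].split()
        if not header.startswith("@") or not tokens or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        # Read name: first token after '@', with a trailing /1 or /2 mate suffix removed.
        name = tokens[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, seq, qual


def to_single_line(records: Iterable[Record]) -> Iterator[str]:
    """Single-end reads: one tab-separated line per record."""
    for name, seq, qual in records:
        yield f"{name}\t{seq}\t{qual}"


def pair_to_single_line(mate1: Iterable[Record], mate2: Iterable[Record]) -> Iterator[str]:
    """Paired-end reads: put both mates of a pair on one line, checking that names match."""
    for rec1, rec2 in zip_longest(mate1, mate2):
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of reads")
        if rec1[0] != rec2[0]:
            raise ValueError(f"mate names differ: {rec1[0]!r} vs {rec2[0]!r}")
        yield "\t".join((rec1[0], rec1[1], rec1[2], rec2[1], rec2[2]))


if __name__ == "__main__":
    r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]
    r2 = ["@read1/2\n", "TTGC\n", "+\n", "HHHH\n"]
    for line in pair_to_single_line(parse_fastq(r1), parse_fastq(r2)):
        print(repr(line))  # 'read1\tACGT\tIIII\tTTGC\tHHHH'
```

The parser groups lines by position rather than by looking for "@", because a FASTQ quality string may itself begin with "@". Once flattened, every map task receives only complete reads (or complete pairs) and can recover the fields by splitting each line on tabs.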

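To make the MapReduce model concrete, here is a single-process Python simulation (not Hadoop's Java API and not Hadoop Streaming) of a job that counts primary mapped alignments per reference sequence in SAM output such as DistMap produces. The map step emits a (reference name, 1) pair for each mapped record, the shuffle groups the values by key, and the reduce step sums each group; on a cluster, Hadoop would run the map step in parallel over input splits and perform the shuffle between the two phases. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical single-process simulation of a MapReduce job; not Hadoop's API.


def map_sam_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (reference_name, 1) for each primary mapped SAM record."""
    if not line.strip() or line.startswith("@"):
        return  # blank line or SAM header line: emit nothing
    fields = line.rstrip("\n").split("\t")
    flag, rname = int(fields[1]), fields[2]  # SAM columns 2 (FLAG) and 3 (RNAME)
    if flag & 0x4 or rname == "*":
        return  # unmapped segment
    if flag & (0x100 | 0x800):
        return  # secondary or supplementary alignment: skip so each mapped segment counts once
    yield rname, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group intermediate values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: groups[key] for key in sorted(groups)}


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts emitted for one reference name."""
    return key, sum(values)


def run_job(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in sam_lines for pair in map_sam_line(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t16\tchr2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
        "r4\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(run_job(sam))  # {'chr1': 2, 'chr2': 1}
```

Because summing is associative, a real Hadoop job could also run the reduce logic as a combiner on each mapper's output, shrinking the data sent through the shuffle.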


References

  1. Zou2013. PMID 23396756.
  2. Schumacher2013. PMID 24149054.
  3. Nordberg2013. PMID 24021384.
  4. Pandey2013. PMID 24009693.