Difference between revisions of "Track statistics (analysis)"

From BioUML platform
(Default analysis icon as separate file)
(Automatic synchronization with BioUML)
Line 1:
  ;Analysis title
- :[[File:Default-analysis-icon.png]] Track statistics
+ :[[File:BSA-Track-statistics-icon.png]] Track statistics
  ;Provider
  :[[Institute of Systems Biology]]

Revision as of 19:00, 13 February 2017

Analysis title: Track statistics
Provider: Institute of Systems Biology
Class: SequenceStatistics
Plugin: ru.biosoft.bsa (Bio-sequences analyses plugin)

Description

Gathers various statistics about a track or a FASTQ file.

Parameters:

  • Source – Whether to get input data from a track or from a FASTQ file
  • Input track – Track to process
  • FASTQ file – FASTQ file with reads to analyze
  • Quality encoding – Specifies how phred quality values are encoded in the FASTQ file. In most cases the system detects this value automatically. You can change it manually if auto-detection gives the wrong result.
  • CSFasta file – File containing reads in color space
  • Qual file – File containing corresponding quality values
  • Alignment – Whether to align sites on the left or on the right
  • Processors – List of methods to gather the statistics
    • Basic statistics – Gathers basic statistics like reads count and average read length
    • Quality per base – Distribution of phred quality score along the bases
    • Quality per sequence – Distribution of phred quality score among the sequences
    • Nucleotide content per base – Distribution of individual nucleotides along the bases
    • GC content per base – Distribution of GC along the bases
    • GC content per sequence – Draws a distribution of GC content among reads
    • N content per base – Distribution of 'N' along the bases
    • Sequence length distribution – Calculates the distribution of read lengths and outputs it as a table and as a chart
    • Duplicate sequences – Calculates the sequence duplication rate: how many sequences occur 2, 3 and so on times relative to unique sequences. This statistic is based on the first 200000 reads
    • Overrepresented sequences – Looks for sequences that appear in more than 0.1% of cases
    • Overrepresented K-mers – Searches for K-mers that are represented 3x per sequence or 5x per position
    • Overrepresented prefixes – Searches for read prefixes (starting from the read start) up to 15 bp long that are overrepresented in the set
  • Output path – Path to the output folder (will be created)
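
To make a few of the processors above more concrete, here is a minimal Python sketch of what such statistics compute. It is an illustration only and not the BioUML implementation: all function names are hypothetical, reads are assumed to be plain strings already loaded into memory, and the quality decoding assumes a Phred+33 encoding unless another offset is passed. The duplication function returns raw counts per duplication level rather than values relative to unique sequences.

```python
from collections import Counter


def basic_statistics(reads):
    """Return (read count, average read length)."""
    count = len(reads)
    average_length = sum(len(r) for r in reads) / count if count else 0.0
    return count, average_length


def gc_content_per_sequence(reads):
    """Return the GC fraction of each read (0.0 for an empty read)."""
    return [
        (r.count("G") + r.count("C")) / len(r) if r else 0.0
        for r in reads
    ]


def mean_quality_per_base(quality_strings, offset=33):
    """Mean phred score at each read position.

    offset=33 is an assumption (Phred+33); use 64 for Phred+64 data.
    """
    totals, counts = [], []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def duplication_levels(reads, limit=200000):
    """Map occurrence count -> number of distinct sequences with that count.

    Only the first `limit` reads are inspected, mirroring the 200000-read
    cap mentioned in the processor description.
    """
    occurrences = Counter(reads[:limit])
    return dict(Counter(occurrences.values()))


def overrepresented_sequences(reads, threshold=0.001):
    """Sequences that make up more than `threshold` (0.1%) of all reads."""
    total = len(reads)
    occurrences = Counter(reads)
    return {s: n for s, n in occurrences.items() if n / total > threshold}
```

For example, calling `basic_statistics(["ACGT", "ACGT", "GGCC", "AT"])` on four toy reads yields a count of 4 and an average length of 3.5, and `duplication_levels` on the same list reports one sequence seen twice and two sequences seen once.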