Track statistics (analysis)

From BioUML platform
Revision as of 18:15, 9 December 2020 by BioUML wiki Bot (Talk | contribs)

Analysis title
Track statistics
Provider
Institute of Systems Biology
Class
SequenceStatistics
Plugin
ru.biosoft.bsastats (Bio-sequences analyses plugin extension)

Description

Gathers various statistics about a track or a FASTQ file.

Parameters:

  • Source – Whether to get input data from a track or from a FASTQ file
  • Input track – Track to process
  • FASTQ file – FASTQ file with reads to analyze
  • Quality encoding – Specifies how Phred quality values are encoded in the FASTQ file. In most cases the system detects this value automatically. You may change it manually if auto-detection gives an incorrect result.
  • CSFasta file – File containing reads in color space
  • Qual file – File containing corresponding quality values
  • Alignment – Whether to align sites on left or right
  • Processors – List of methods to gather the statistics
    • Basic statistics – Gathers basic statistics like reads count and average read length
    • Quality per base – Distribution of phred quality score along the bases
    • Quality per sequence – Distribution of phred quality score among the sequences
    • Nucleotide content per base – Distribution of individual nucleotides along the bases
    • GC content per base – Distribution of GC along the bases
    • GC content per sequence – Draws a distribution of GC content among reads
    • N content per base – Distribution of 'N' along the bases
    • Sequence length distribution – Calculates the distribution of read lengths and outputs it as a table and as a chart
    • Duplicate sequences – Calculates the sequence duplication rate: how many sequences occur 2, 3 and so on times relative to unique sequences. This statistic is based on the first 200000 reads
    • Overrepresented sequences – Looks for sequences which appear in more than 0.1% of cases
    • Overrepresented K-mers – Searches for K-mers which are overrepresented 3x per sequence or 5x per position
    • Overrepresented prefixes – Searches for read prefixes (starting from the read start) up to 15 bp long which are overrepresented in the set
  • Output path – Path to the output folder (will be created)
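
To make several of the processors above more concrete, the following is a minimal Python sketch of what they compute. It is not the BioUML implementation. The function names (`read_fastq`, `guess_phred_offset`, `track_statistics`), the offset heuristic, the left-aligned per-base averaging and the in-memory example data are all assumptions made for illustration. Only the 200000-read limit for duplicates and the 0.1% threshold for overrepresented sequences come from the parameter descriptions.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (sequence, quality string) pairs from a 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def guess_phred_offset(quality_strings):
    """Common heuristic (an assumption, not BioUML's detector):
    characters below ';' (ASCII 59) only occur with the +33 offset."""
    lowest = min(min(map(ord, q)) for q in quality_strings if q)
    return 33 if lowest < 59 else 64


def track_statistics(reads, offset, dup_limit=200_000, over_threshold=0.001):
    """Compute a few of the listed statistics; assumes at least one non-empty read."""
    seqs = [s for s, _ in reads]
    n = len(seqs)
    # Basic statistics: read count and average read length
    basic = {"reads": n, "avg_length": sum(map(len, seqs)) / n}
    # Sequence length distribution: length -> number of reads
    lengths = Counter(len(s) for s in seqs)
    # Quality per base: mean Phred score per position, positions counted from the left
    sums, counts = Counter(), Counter()
    for _, qual in reads:
        for i, ch in enumerate(qual):
            sums[i] += ord(ch) - offset
            counts[i] += 1
    mean_quality = [sums[i] / counts[i] for i in sorted(counts)]
    # GC content per sequence, in percent
    gc = [100.0 * sum(c in "GC" for c in s) / len(s) for s in seqs]
    # Duplicate sequences: occurrence count -> number of distinct sequences,
    # based on the first dup_limit reads only
    duplication = Counter(Counter(seqs[:dup_limit]).values())
    # Overrepresented sequences: share of all reads above the threshold
    over = {s: c for s, c in Counter(seqs).items() if c / n > over_threshold}
    return {"basic": basic, "lengths": lengths, "mean_quality": mean_quality,
            "gc": gc, "duplication": duplication, "overrepresented": over}


# Hypothetical four-read input, used only to exercise the functions
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII5I\n@r3\nGGCCAA\n+\nIIIIII\n@r4\nATAT\n+\nI!II\n"
reads = list(read_fastq(StringIO(EXAMPLE)))
offset = guess_phred_offset(q for _, q in reads)
stats = track_statistics(reads, offset, over_threshold=0.4)
print(offset, stats["basic"], stats["mean_quality"])
```

The example raises the overrepresentation threshold to 0.4 because with only four reads every sequence would exceed 0.1%. Right alignment, as offered by the Alignment parameter, would index positions from the read end instead of the read start.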