Difference between revisions of "Track statistics (analysis)"

From BioUML platform
(Automatic synchronization with BioUML)
 
(13 intermediate revisions by one user not shown)
Line 1: Line 1:
Gather various statistics about track or FASTQ file
+ ;Analysis title
+ :[[File:NGS-utils-Track-statistics-icon.png]] Track statistics
+ ;Provider
+ :[[Institute of Systems Biology]]
+ ;Class
+ :{{Class|ru.biosoft.bsastats.SequenceStatistics}}
+ ;Plugin
+ :[[Ru.biosoft.bsastats (plugin)|ru.biosoft.bsastats (Bio-sequences analyses plugin extension)]]
+
+ ==== Description ====
+ Gather various statistics about track or FASTQ file.

==== Parameters: ====
Line 26: Line 36:

[[Category:Analyses]]
[[Category:BSA (analyses group)]]
+ [[Category:NGS utils (analyses group)]]
+ [[Category:ISB analyses]]
+ [[Category:Autogenerated pages]]

Latest revision as of 18:15, 9 December 2020

Analysis title
Track statistics
Provider
Institute of Systems Biology
Class
SequenceStatistics
Plugin
ru.biosoft.bsastats (Bio-sequences analyses plugin extension)

Description

Gathers various statistics about a track or a FASTQ file.

Parameters:

  • Source – Whether to read input data from a track or from FASTQ
  • Input track – Track to process
  • FASTQ file – FASTQ file with reads to analyze
  • Quality encoding – Specifies how Phred quality values are encoded in the FASTQ file. In most cases the system detects this value automatically; change it manually if auto-detection gives the wrong result.
  • CSFasta file – File containing reads in color space
  • Qual file – File containing corresponding quality values
  • Alignment – Whether to align sites on the left or on the right
  • Processors – List of methods to gather the statistics
    • Basic statistics – Gathers basic statistics such as read count and average read length
    • Quality per base – Distribution of Phred quality scores along the bases
    • Quality per sequence – Distribution of Phred quality scores among the sequences
    • Nucleotide content per base – Distribution of individual nucleotides along the bases
    • GC content per base – Distribution of GC along the bases
    • GC content per sequence – Draws a distribution of GC content among reads
    • N content per base – Distribution of 'N' along the bases
    • Sequence length distribution – Calculates the distribution of read lengths and outputs it as a table and as a chart
    • Duplicate sequences – Calculates the sequence duplication rate: how many sequences occur 2, 3, and so on times, relative to unique sequences. This statistic is based on the first 200000 reads.
    • Overrepresented sequences – Looks for sequences that appear in more than 0.1% of cases
    • Overrepresented K-mers – Searches for K-mers that are represented 3x per sequence or 5x per position
    • Overrepresented prefixes – Searches for read prefixes (starting from the read start) up to 15 bp long that are overrepresented in the set
  • Output path – Path to the output folder (will be created)
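
To make a few of the processors above more concrete, here is a minimal Python sketch of how basic statistics, per-read GC content, mean quality per base, the read length distribution, and duplication levels could be computed from a FASTQ stream. It is an illustration only and not the BioUML implementation. The function names (`read_fastq`, `gather_statistics`) are hypothetical, the input is assumed to use plain four-line FASTQ records, and the Phred+33 offset is an assumption standing in for the Quality encoding parameter. Positions are counted from the read start, which corresponds to left alignment.

```python
import io
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 encoding; other encodings use a different offset


def read_fastq(handle):
    """Yield (sequence, quality_string) pairs from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def gather_statistics(handle, offset=PHRED_OFFSET):
    """Hypothetical helper: collect several per-read and per-base statistics."""
    lengths = []
    gc_per_read = []
    qual_sums = []    # qual_sums[i]: sum of Phred scores at position i
    qual_counts = []  # qual_counts[i]: number of reads covering position i
    copies = Counter()
    for seq, qual in read_fastq(handle):
        upper = seq.upper()
        lengths.append(len(seq))
        gc = upper.count("G") + upper.count("C")
        gc_per_read.append(gc / len(seq) if seq else 0.0)
        for i, ch in enumerate(qual):
            if i == len(qual_sums):
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[i] += ord(ch) - offset
            qual_counts[i] += 1
        copies[upper] += 1
    n = len(lengths)
    return {
        "read_count": n,
        "average_length": sum(lengths) / n if n else 0.0,
        "gc_per_read": gc_per_read,
        "mean_quality_per_base": [s / c for s, c in zip(qual_sums, qual_counts)],
        "length_distribution": dict(Counter(lengths)),
        # how many distinct sequences occur exactly 1, 2, 3, ... times
        "duplication_levels": dict(Counter(copies.values())),
    }


example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nGGCCAA\n+\n!!!!!!\n"
)
stats = gather_statistics(example)
print(stats["read_count"], stats["length_distribution"], stats["duplication_levels"])
```

In this toy input two of the three reads are identical, so one sequence is counted at duplication level 2 and one at level 1. The sketch keeps every distinct sequence in memory, which would not scale to a large file; basing the statistic on a fixed number of leading reads, as the Duplicate sequences processor does, is one way to bound that cost.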