Track statistics (analysis)
From BioUML platform
Latest revision as of 18:15, 9 December 2020
- Analysis title: Track statistics
- Provider: Institute of Systems Biology
- Class: ru.biosoft.bsastats.SequenceStatistics
- Plugin: ru.biosoft.bsastats (Bio-sequences analyses plugin extension)
Description
Gathers various statistics about a track or a FASTQ file.
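The page does not describe how the analysis reads its input. As a rough illustration of gathering statistics from a FASTQ file, here is a minimal Python sketch, assuming an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines). The names `FastqRecord`, `read_fastq` and `basic_statistics` are hypothetical and are not part of BioUML; the statistics computed (read count and average read length) are the ones listed under *Basic statistics*.

```python
import io
from typing import Dict, Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record (assumption)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in " + header.strip())
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def basic_statistics(records: Iterable[FastqRecord]) -> Dict[str, float]:
    """Count reads and compute the average read length (hypothetical helper)."""
    count = 0
    total_length = 0
    for record in records:
        count += 1
        total_length += len(record.sequence)
    return {"reads": count, "average_length": total_length / count if count else 0.0}


example = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n"
print(basic_statistics(read_fastq(io.StringIO(example))))
# {'reads': 2, 'average_length': 5.0}
```

Reading records lazily with a generator keeps memory use flat even for large FASTQ files.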
Parameters:
- Source – Whether to take input data from a track or from a FASTQ file
- Input track – Track to process
- FASTQ file – FASTQ file with reads to analyze
- Quality encoding – Specifies how Phred quality values are encoded in the FASTQ file. In most cases the system detects this value automatically; you may change it manually if auto-detection gives an incorrect result.
- CSFasta file – File containing reads in color space
- Qual file – File containing corresponding quality values
- Alignment – Whether to align sites on the left or on the right
- Processors – List of methods used to gather the statistics:
  - Basic statistics – Gathers basic statistics such as the read count and the average read length
  - Quality per base – Distribution of Phred quality scores along the bases
  - Quality per sequence – Distribution of Phred quality scores among the sequences
  - Nucleotide content per base – Distribution of individual nucleotides along the bases
  - GC content per base – Distribution of GC along the bases
  - GC content per sequence – Distribution of GC content among reads
  - N content per base – Distribution of 'N' along the bases
  - Sequence length distribution – Calculates the distribution of read lengths and outputs it as a table and as a chart
  - Duplicate sequences – Calculates the sequence duplication rate: how many sequences occur 2, 3 or more times relative to unique sequences. This statistic is based on the first 200,000 reads
  - Overrepresented sequences – Looks for sequences that make up more than 0.1% of all reads
  - Overrepresented K-mers – Searches for k-mers that are overrepresented 3-fold per sequence or 5-fold per position
  - Overrepresented prefixes – Searches for read prefixes (counted from the read start) up to 15 bp long that are overrepresented in the set
- Output path – Path to the output folder (will be created)
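The *Quality encoding* parameter refers to how Phred scores are stored as ASCII characters: Phred+33 (Sanger, Illumina 1.8+) stores score Q as the character with code Q + 33, while Phred+64 (Illumina 1.3 to 1.7) uses Q + 64. BioUML's actual detection rule is not documented on this page; the sketch below uses a common heuristic, which is an assumption: Phred+64 encodes Q0 as '@' (ASCII 64), so any lower character means the file cannot be Phred+64. The function names are hypothetical.

```python
from typing import Iterable, List


def guess_phred_offset(quality_strings: Iterable[str]) -> int:
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ quality strings."""
    lowest = min((min(q) for q in quality_strings if q), default=None)
    if lowest is None:
        raise ValueError("no quality values to inspect")
    # Phred+64 encodes Q0 as '@' (ASCII 64), so any lower character implies Phred+33.
    # Solexa+64 (which allows negative scores) is not handled by this sketch.
    return 33 if ord(lowest) < 64 else 64


def decode_qualities(quality: str, offset: int) -> List[int]:
    """Convert a quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


print(guess_phred_offset(["IIIII#", "FFFF"]))  # 33: '#' is ASCII 35
print(guess_phred_offset(["hhhhB", "hhgg"]))   # 64: lowest is 'B' (ASCII 66)
print(decode_qualities("I#", 33))              # [40, 2]
```

A heuristic like this can fail: a Phred+33 file in which every score is at least 31 contains no character below '@' and would be reported as Phred+64, which is the situation where setting the encoding manually helps.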
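*Quality per base* summarizes Phred scores position by position along the reads. A minimal sketch, assuming Phred+33 input by default and reads of varying length, where each position is averaged only over the reads long enough to reach it. It reports just the mean score per position; the analysis itself may summarize the distribution differently, and `mean_quality_per_base` is a hypothetical name.

```python
from typing import Iterable, List


def mean_quality_per_base(quality_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (1st base, 2nd base, ...)."""
    sums: List[int] = []
    counts: List[int] = []
    for quality in quality_strings:
        for position, ch in enumerate(quality):
            if position == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[position] += ord(ch) - offset
            counts[position] += 1
    return [total / n for total, n in zip(sums, counts)]


print(mean_quality_per_base(["II#", "5I"]))  # [30.0, 40.0, 2.0]
```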
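*GC content per sequence* builds a histogram of per-read GC percentages. In this sketch the percentage is computed over called bases only, so 'N' does not count toward the denominator, and percentages are rounded to whole numbers to form the histogram bins. Both choices are assumptions rather than BioUML's documented rules, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def gc_percent(sequence: str) -> float:
    """GC percentage over called bases (A, C, G, T); 'N' and other symbols are ignored."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    gc = sum(1 for base in called if base in "GC")
    return 100.0 * gc / len(called)


def gc_content_per_sequence(sequences: Iterable[str]) -> Counter:
    """Histogram mapping rounded GC percentage to the number of reads."""
    return Counter(round(gc_percent(seq)) for seq in sequences)


histogram = gc_content_per_sequence(["GGCC", "ATAT", "ACGT", "GCNN"])
print(sorted(histogram.items()))  # [(0, 1), (50, 1), (100, 2)]
```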
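*Duplicate sequences* is described as counting how many sequences occur 2, 3 or more times relative to unique sequences, using the first 200,000 reads. The sketch below follows one reading of that description, which is an interpretation: count each distinct sequence among the first `limit` reads, group distinct sequences by how often they occur, and divide each group size by the number of sequences seen exactly once. The function name and the normalization are assumptions.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable


def duplication_levels(sequences: Iterable[str], limit: int = 200_000) -> Dict[int, float]:
    """Map duplication level k to (#distinct sequences seen k times) / (#seen once).

    Only the first `limit` reads are inspected, mirroring the 200,000-read cap
    mentioned in the parameter description.
    """
    occurrences = Counter(islice(sequences, limit))  # sequence -> times seen
    per_level = Counter(occurrences.values())        # k -> distinct sequences seen k times
    unique = per_level.get(1, 0)
    if unique == 0:
        return {}  # nothing occurs exactly once, so the relative values are undefined
    return {level: n / unique for level, n in sorted(per_level.items())}


reads = ["AAA", "AAA", "CCC", "GGG", "TTT", "TTT", "TTT"]
print(duplication_levels(reads))  # {1: 1.0, 2: 0.5, 3: 0.5}
```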
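*Overrepresented sequences* flags any sequence that makes up more than 0.1% of the reads. A minimal sketch using that threshold as the default and exact-match counting (an assumption); `overrepresented_sequences` is a hypothetical name. The example input is 998 distinct 5-mers plus two copies of an adapter-like 12-mer, so only the 12-mer exceeds 0.1% of the 1,000 reads.

```python
from collections import Counter
from itertools import islice, product
from typing import Iterable, List, Tuple


def overrepresented_sequences(
    sequences: Iterable[str], threshold: float = 0.001
) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences above `threshold` of all reads."""
    counts = Counter(sequences)
    total = sum(counts.values())
    hits = [(seq, n, n / total) for seq, n in counts.items() if n / total > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)  # most frequent first


# 998 distinct 5-mers plus two copies of an adapter-like 12-mer: 1,000 reads in total.
reads = ["".join(p) for p in islice(product("ACGT", repeat=5), 998)]
reads += ["GATCGGAAGAGC"] * 2
print(overrepresented_sequences(reads))  # [('GATCGGAAGAGC', 2, 0.002)]
```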