Difference between revisions of "Fasta format"
From BioUML platform
Revision as of 14:12, 7 May 2013
- File format title: Fasta format (*.fasta)
- Plugin: ru.biosoft.bsa (Bio-sequences analyses plugin)
FASTA format
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data, which usually do not exceed 80 characters. The header line begins with a ">" (greater-than) symbol and gives a name and comments for the sequence. Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed.
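The rules above can be sketched as a small reader. This is a minimal illustration in Python, not the parser used by the ru.biosoft.bsa plugin; the function name `parse_fasta` and the choice to raise `ValueError` on digits are assumptions made for the example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper: applies the rules described above
    (">" header line, lower-case mapped to upper-case, no digits).
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            if any(ch.isdigit() for ch in line):
                raise ValueError("numerical digits are not allowed in sequence data")
            chunks.append(line.upper())  # lower-case is mapped to upper-case
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For instance, `parse_fasta(">seq1 demo\nacgt\nNN-A\n")` yields a single record whose sequence lines have been joined and upper-cased.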
The supported nucleic acid codes
A → adenosine
C → cytidine
G → guanine
T → thymidine
U → uridine
R → G A (purine)
Y → T C (pyrimidine)
K → G T (keto)
M → A C (amino)
S → G C (strong)
W → A T (weak)
B → G T C
D → G A T
H → A C T
V → G C A
N → A G C T (any)
- → gap of indeterminate length
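The nucleic acid codes listed above map naturally onto a lookup table. The sketch below is one possible encoding in Python; the names `NUCLEOTIDE_CODES`, `possible_bases`, and `is_valid_nucleotide_sequence` are hypothetical, and treating the gap symbol as "no bases" is an assumption of this example.

```python
# Each code maps to the bases it can stand for, as listed in the table above.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}

GAP = "-"  # gap of indeterminate length


def possible_bases(code):
    """Return the set of bases a single code can represent.

    Assumption: a gap is reported as an empty set.
    """
    code = code.upper()
    if code == GAP:
        return set()
    return set(NUCLEOTIDE_CODES[code])


def is_valid_nucleotide_sequence(sequence):
    """Check that every character is a listed code or the gap symbol."""
    return all(ch == GAP or ch in NUCLEOTIDE_CODES for ch in sequence.upper())
```

Lower-case input is accepted here because the format maps it to upper-case before lookup.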
The supported amino acid codes (24 letter codes and 2 special symbols)
A alanine
B aspartate or asparagine
C cysteine
D aspartate
E glutamate
F phenylalanine
G glycine
H histidine
I isoleucine
K lysine
L leucine
M methionine
N asparagine
P proline
Q glutamine
R arginine
S serine
T threonine
U selenocysteine
V valine
W tryptophan
Y tyrosine
Z glutamate or glutamine
X any
* translation stop
- gap of indeterminate length
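A simple way to check a protein sequence against the codes listed above is to report every character outside that alphabet. The following Python sketch assumes exactly the codes shown in the list; `AMINO_ACID_CODES` and `invalid_residues` are illustrative names, not part of any BioUML API.

```python
# Letter codes from the list above, plus "*" (translation stop) and "-" (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence):
    """Return (position, character) pairs for characters outside the alphabet.

    Positions are 0-based. Lower-case letters are accepted because the
    format maps them to upper-case.
    """
    return [
        (index, ch)
        for index, ch in enumerate(sequence)
        if ch.upper() not in AMINO_ACID_CODES
    ]
```

An empty result means the sequence uses only supported codes; digits, for example, are always reported.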
Example
>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
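Writing a record in this layout amounts to emitting the ">" header line and then wrapping the sequence at a fixed width. The Python sketch below shows one way to do that; `format_fasta` is a hypothetical helper, and the default width of 70 is an arbitrary choice that stays under the usual 80-character limit.

```python
def format_fasta(header, sequence, width=70):
    """Return one FASTA record as text, wrapping the sequence at `width`.

    Assumption: the header is passed without the leading ">".
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    lines = [">" + header]
    # Slice the sequence into consecutive chunks of at most `width` characters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

Keeping the width as a parameter lets callers match whatever line length a downstream tool expects.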