Fasta format
From BioUML platform
Latest revision as of 11:20, 13 January 2014
- File format title: Fasta format (*.fasta)
- Element type: collection of sequences
- Plugin: ru.biosoft.bsa (Bio-sequences analyses plugin)
FASTA format
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. Sequence lines are usually no longer than 80 characters. The description (header) line begins with a ">" (greater-than) symbol, followed by a name and comments for the sequence. Sequences are expected to use the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed.
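The rules above (a ">" header line, following sequence lines, lower case mapped to upper case, no digits) can be illustrated with a small reader. This is a minimal sketch in Python; the function name `parse_fasta` is hypothetical and it is not the reader used by the ru.biosoft.bsa plugin. It assumes blank lines may be skipped and treats everything after ">" as the header text.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration, not the BioUML implementation.
    """
    records = []
    header = None
    chunks = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines between records are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # name and comments after '>'
            chunks = []
        else:
            if header is None:
                raise ValueError(f"line {line_no}: sequence data before any '>' header")
            if any(ch.isdigit() for ch in line):
                raise ValueError(f"line {line_no}: digits are not allowed in sequence data")
            # Lower-case letters are accepted and mapped to upper case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = parse_fasta(">seq1 demo\nacgt\nNNAC\n>seq2\nMKV*\n")
# records == [('seq1 demo', 'ACGTNNAC'), ('seq2', 'MKV*')]
```

Sequence lines belonging to one record are simply concatenated, so line length has no effect on the parsed result.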
The supported nucleic acid codes
- A → adenosine
- C → cytidine
- G → guanine
- T → thymidine
- U → uridine
- R → G A (purine)
- Y → T C (pyrimidine)
- K → G T (keto)
- M → A C (amino)
- S → G C (strong)
- W → A T (weak)
- B → G T C
- D → G A T
- H → A C T
- V → G C A
- N → A G C T (any)
- - (hyphen) → gap of indeterminate length
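The nucleic acid codes listed above can be expressed as a lookup table from each code to the bases it stands for. This is a minimal Python sketch; `IUPAC_NUCLEOTIDES`, `expand` and `is_valid_nucleotide_sequence` are hypothetical names, not part of the ru.biosoft.bsa plugin API. The gap symbol "-" is accepted by the validator but deliberately left out of the table, since it does not stand for any base.

```python
# Hypothetical table built from the code list above: code -> bases it represents.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand(code):
    """Return the set of bases an ambiguity code stands for, e.g. 'R' -> {'A', 'G'}.

    Raises KeyError for unsupported symbols, including the gap '-'.
    """
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def is_valid_nucleotide_sequence(seq):
    """True if every symbol (case-insensitive) is a supported code or the gap '-'."""
    return all(ch in IUPAC_NUCLEOTIDES or ch == "-" for ch in seq.upper())
```

Keeping the table as data rather than branching logic makes it easy to check against the list above code by code.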
The supported amino acid codes (23 amino acid codes and 3 special codes: X, * and -)
- A alanine
- B aspartate or asparagine
- C cysteine
- D aspartate
- E glutamate
- F phenylalanine
- G glycine
- H histidine
- I isoleucine
- K lysine
- L leucine
- M methionine
- N asparagine
- P proline
- Q glutamine
- R arginine
- S serine
- T threonine
- U selenocysteine
- V valine
- W tryptophan
- Y tyrosine
- Z glutamate or glutamine
- X any
- * translation stop
- - (hyphen) gap of indeterminate length
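A protein sequence can be checked against the amino acid codes listed above by collecting any symbols that fall outside the set. This is a minimal Python sketch; `AMINO_ACID_CODES`, `SPECIAL_CODES` and `invalid_protein_symbols` are hypothetical names, not a BioUML API. Input is upper-cased first, matching the rule that lower-case letters are accepted.

```python
# Letters from the list above, excluding X; J and O are not in the list.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZ")
# X (any), * (translation stop) and - (gap of indeterminate length).
SPECIAL_CODES = {"X", "*", "-"}


def invalid_protein_symbols(seq):
    """Return the sorted list of symbols in seq that are not supported codes."""
    allowed = AMINO_ACID_CODES | SPECIAL_CODES
    return sorted({ch for ch in seq.upper() if ch not in allowed})


print(invalid_protein_symbols("MKV*-X"))  # []
print(invalid_protein_symbols("mkJo1"))   # ['1', 'J', 'O']
```

Returning the offending symbols, rather than a plain true/false, makes it easier to report why a sequence was rejected.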
Example
>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
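In the example record, the sequence data is wrapped at 70 characters per line, with a shorter final line. The sketch below writes a record in that layout. It is a minimal Python illustration; the name `format_fasta` is hypothetical, and the 70-character default simply mirrors the example rather than any documented BioUML setting.

```python
def format_fasta(header, sequence, width=70):
    """Render one FASTA record: a '>' header line, then lines of at most `width` chars."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [">" + header]
    # Slice the sequence into consecutive chunks of `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_fasta("demo", "ACGT" * 5, width=8), end="")
# >demo
# ACGTACGT
# ACGTACGT
# ACGT
```

Applied to the cytochrome b example (284 residues), this produces four 70-character lines followed by a final line of 4 characters.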