Difference between revisions of "Fasta format"
From BioUML platform
Revision as of 16:28, 4 April 2013
FASTA format
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data, which usually do not exceed 80 characters. The header line begins with a ">" (greater-than) symbol and gives a name and comments for the sequence. Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed.
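The rules above can be sketched as a small reader. This is only an illustration under stated assumptions, not the parser used by BioUML: the function name `parse_fasta` is hypothetical, blank lines are skipped, and a digit in the sequence data is treated as an error.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text.

    Hypothetical helper: lower-case letters are mapped to upper-case and
    digits in sequence lines are rejected, as described in the text above.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header line")
            if any(ch.isdigit() for ch in line):
                raise ValueError("numerical digits are not allowed in sequence data")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(">seq1 demo\nacgt\nNN-\n")` yields one record whose description is `seq1 demo` and whose sequence is the two data lines joined and upper-cased.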
The supported nucleic acid codes
A → adenosine
C → cytidine
G → guanine
T → thymidine
U → uridine
R → G A (purine)
Y → T C (pyrimidine)
K → G T (keto)
M → A C (amino)
S → G C (strong)
W → A T (weak)
B → G T C
D → G A T
H → A C T
V → G C A
N → A G C T (any)
- → gap of indeterminate length
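The nucleic acid codes listed above can be held in a lookup table for validation or matching. The following is a minimal sketch; the names `NUCLEOTIDE_CODES`, `is_valid_nucleotide_sequence` and `matches` are hypothetical and are not part of BioUML.

```python
# Each code maps to the set of bases it stands for, as listed above.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def is_valid_nucleotide_sequence(seq):
    """True if every character is a listed code or the gap symbol '-'."""
    return all(ch == "-" or ch in NUCLEOTIDE_CODES for ch in seq.upper())


def matches(code, base):
    """True if the (possibly ambiguous) code covers the given base."""
    return base.upper() in NUCLEOTIDE_CODES[code.upper()]
```

With this table, `matches("R", "G")` is true because R stands for G or A, while `matches("Y", "A")` is false.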
The supported amino acid codes (23 letter codes and 3 special codes)
A alanine
B aspartate or asparagine
C cysteine
D aspartate
E glutamate
F phenylalanine
G glycine
H histidine
I isoleucine
K lysine
L leucine
M methionine
N asparagine
P proline
Q glutamine
R arginine
S serine
T threonine
U selenocysteine
V valine
W tryptophan
Y tyrosine
Z glutamate or glutamine
X any
* translation stop
- gap of indeterminate length
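A protein sequence can be checked against the codes listed above with a simple set difference. This is a sketch only; `AMINO_ACID_CODES` and `invalid_residues` are hypothetical names, and the accepted set is taken directly from the list above.

```python
# Letter codes from the list above, plus X (any), * (stop) and - (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(seq):
    """Return the sorted distinct characters of seq that are not listed codes.

    Lower-case input is mapped to upper-case before checking.
    """
    return sorted(set(seq.upper()) - AMINO_ACID_CODES)
```

An empty result means the sequence uses only listed codes; digits and unlisted letters are reported.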
Example
>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
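A record like the one above can be produced by writing the header line and then splitting the sequence into fixed-width lines. The sketch below is illustrative only: `format_fasta` is a hypothetical helper, and the default width of 70 is an assumption chosen to stay under the 80-character guideline.

```python
def format_fasta(description, sequence, width=70):
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    if width < 1:
        raise ValueError("width must be positive")
    lines = [">" + description]
    # Slice the sequence into consecutive chunks of at most `width` characters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

A 150-residue sequence formatted this way gives a header line followed by lines of 70, 70 and 10 characters.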