Difference between revisions of "Fasta format"

Revision as of 16:28, 4 April 2013

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data, which usually do not exceed 80 characters. The description (header) line begins with a ">" (greater-than) symbol and gives a name and comments for the sequence. Sequences are expected to be written in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed.
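The rules above can be sketched as a small reader. This is a minimal illustration in Python, not the BioUML implementation: the function name `parse_fasta` is hypothetical, and skipping blank lines is an assumption that the description above does not state.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            if any(ch.isdigit() for ch in line):
                raise ValueError("numerical digits are not allowed in sequence data")
            # Lower-case letters are accepted and mapped to upper case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)
```

A file can be read with `list(parse_fasta(open(path)))`, where `path` is whatever FASTA file is at hand. The sketch does not check letters against the code tables; it only applies the header, case, and digit rules.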

The supported nucleic acid codes

A → adenosine             M → A C (amino)
C → cytidine              S → G C (strong)
G → guanosine             W → A T (weak)
T → thymidine             B → G T C
U → uridine               D → G A T
R → G A (purine)          H → A C T
Y → T C (pyrimidine)      V → G C A
K → G T (keto)            N → A G C T (any)
- → gap of indeterminate length
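The table above maps naturally onto a lookup dictionary. The following Python sketch transcribes it and adds two helper functions. The names `IUPAC_NUCLEOTIDES`, `bases_for`, and `is_valid_nucleotide_sequence` are hypothetical, chosen only for illustration.

```python
# Each code maps to the bases it can stand for, transcribed from the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}
GAP = "-"  # gap of indeterminate length


def bases_for(code: str) -> set:
    """Return the set of bases that a single nucleic acid code can represent."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def is_valid_nucleotide_sequence(sequence: str) -> bool:
    """Check that every character is a supported code or a gap (case-insensitive)."""
    return all(ch == GAP or ch in IUPAC_NUCLEOTIDES for ch in sequence.upper())
```

For example, `bases_for("r")` gives the set containing A and G, and a sequence containing a digit or a letter outside the table is rejected.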

The supported amino acid codes (24 letter codes and 2 special symbols)

A  alanine                     P  proline
B  aspartate or asparagine     Q  glutamine
C  cysteine                    R  arginine
D  aspartate                   S  serine
E  glutamate                   T  threonine
F  phenylalanine               U  selenocysteine
G  glycine                     V  valine
H  histidine                   W  tryptophan
I  isoleucine                  Y  tyrosine
K  lysine                      Z  glutamate or glutamine
L  leucine                     X  any
M  methionine                  *  translation stop
N  asparagine                  -  gap of indeterminate length
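A protein sequence can be checked against the table above with a simple membership test. This Python sketch uses only the codes listed on this page, and the names `PROTEIN_CODES` and `invalid_residues` are hypothetical. Letters that do not appear in the table, such as J and O, are reported as invalid here, although other tools may accept them.

```python
# The 24 letter codes from the table plus "*" (translation stop) and "-" (gap).
PROTEIN_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence: str) -> list:
    """Return (position, character) pairs for characters outside the table.

    Positions are zero-based. Lower-case letters are accepted, because
    they are mapped to upper case before the lookup.
    """
    return [
        (index, ch)
        for index, ch in enumerate(sequence)
        if ch.upper() not in PROTEIN_CODES
    ]
```

An empty result means that every character is a supported code. A non-empty result points at the offending positions, which is usually more helpful than a bare true/false answer.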

Example

>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
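The header line of the example can be separated into the name and the comments mentioned in the format description. The Python sketch below assumes that the name is everything up to the first whitespace and the rest is the comment. That convention is common but is not stated on this page, and the function name `split_header` is hypothetical.

```python
from typing import Tuple


def split_header(header_line: str) -> Tuple[str, str]:
    """Split a FASTA header line into (name, comment).

    Assumption: the name runs up to the first whitespace character and
    everything after it is a free-text comment.
    """
    if not header_line.startswith(">"):
        raise ValueError("a FASTA header line must begin with '>'")
    parts = header_line[1:].strip().split(maxsplit=1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return name, comment
```

Applied to the example header, this gives the name `gi|5524211|gb|AAD44166.1|` and the comment `cytochrome b`.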

References

http://en.wikipedia.org/wiki/FASTA_format

