Fasta format

From BioUML platform
File format title: Fasta format (*.fasta)
Element type: collection of sequences
Plugin: ru.biosoft.bsa (Bio-sequences analyses plugin)

FASTA format

A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data, which usually do not exceed 80 characters each. The description (header) line begins with a ">" (greater-than) symbol and gives a name and comments for the sequence. Sequences are expected to use the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and mapped to upper case; a single hyphen or dash can represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed in sequence data.
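
The rules above can be illustrated with a small reader. This is only a minimal sketch in Python, not the parser used by the ru.biosoft.bsa plugin; the function name parse_fasta and its error messages are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper: the header is returned without the leading '>',
    and the sequence lines of each record are joined into one string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            if any(ch.isdigit() for ch in line):
                raise ValueError(f"digits are not allowed in sequence data: {line!r}")
            chunks.append(line.upper())  # lower-case letters map to upper case
    if header is not None:
        yield header, "".join(chunks)  # emit the last record
```

For example, list(parse_fasta(">seq1 test\nacgt\nNN-A\n")) yields a single record whose header is "seq1 test" and whose sequence is "ACGTNN-A". The sketch checks only the structural rules (header marker, case mapping, no digits); it does not check letters against the code tables.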

The supported nucleic acid codes

A → adenosine             M → A C (amino)
C → cytidine              S → G C (strong)
G → guanosine             W → A T (weak)
T → thymidine             B → G T C
U → uridine               D → G A T
R → G A (purine)          H → A C T
Y → T C (pyrimidine)      V → G C A
K → G T (keto)            N → A G C T (any)
- → gap of indeterminate length
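
The table above can be written down as a lookup from each code to the bases it stands for. The following is a sketch under stated assumptions: the names IUPAC_NUCLEOTIDES, expand_code and is_valid_nucleotide_sequence are hypothetical and are not part of the BioUML API.

```python
# Each code maps to the bases it may represent, in the order given in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}
GAP = "-"  # gap of indeterminate length


def expand_code(code: str) -> str:
    """Return the bases a single nucleotide code stands for (case-insensitive)."""
    return IUPAC_NUCLEOTIDES[code.upper()]


def is_valid_nucleotide_sequence(seq: str) -> bool:
    """True if every character is a supported code or the gap symbol."""
    return all(ch == GAP or ch in IUPAC_NUCLEOTIDES for ch in seq.upper())
```

With these definitions, expand_code("r") returns "GA", and is_valid_nucleotide_sequence("acgtn-") is true, while a sequence containing a digit or a letter outside the table (such as E) is rejected.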

The supported amino acid codes (24 letter codes, including the ambiguity codes B, Z and X, and 2 special symbols)

A  alanine                     P  proline
B  aspartate or asparagine     Q  glutamine
C  cysteine                    R  arginine
D  aspartate                   S  serine
E  glutamate                   T  threonine
F  phenylalanine               U  selenocysteine
G  glycine                     V  valine
H  histidine                   W  tryptophan
I  isoleucine                  Y  tyrosine
K  lysine                      Z  glutamate or glutamine
L  leucine                     X  any
M  methionine                  *  translation stop
N  asparagine                  -  gap of indeterminate length
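
A protein sequence can be checked against this table with a simple membership test. The sketch below is illustrative only; AMINO_ACID_CODES and invalid_positions are hypothetical names, not BioUML functions.

```python
# Letter codes and special symbols from the table above.
AMINO_ACID_CODES = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def invalid_positions(seq: str) -> list:
    """Return (index, character) pairs for characters not in the table.

    Lower-case letters are accepted because they map to upper case.
    """
    return [(i, ch) for i, ch in enumerate(seq) if ch.upper() not in AMINO_ACID_CODES]
```

An empty result means the sequence uses only supported codes. For instance, invalid_positions("MKV*-x") returns an empty list, whereas the letters J and O, which do not appear in the table, are reported together with their positions.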

Example

>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
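
A record laid out like this example can be produced by writing the header line and then splitting the sequence into fixed-width lines. The sketch below assumes a hypothetical helper named format_fasta; the default width of 70 is an assumption chosen to stay under the usual 80-character limit, not a value prescribed by the format.

```python
def format_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record as text, wrapping the sequence at `width` columns."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    # Plain slicing is used instead of textwrap so that gap symbols ('-')
    # are never treated as preferred break points.
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + body) + "\n"
```

For example, format_fasta("id desc", "ABCDEFGHIJ", width=4) gives the header line ">id desc" followed by the lines "ABCD", "EFGH" and "IJ".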

References

  1. http://en.wikipedia.org/wiki/FASTA_format