Fasta format

From BioUML platform

Latest revision as of 11:20, 13 January 2014

File format title: Fasta format (*.fasta)
Element type: collection of sequences
Plugin: ru.biosoft.bsa (Bio-sequences analyses plugin)

FASTA format

A sequence in FASTA format begins with a single-line description (the header), followed by one or more lines of sequence data, each of which usually does not exceed 80 characters. The header line begins with a ">" (greater-than) symbol and gives a name and comments for the sequence. Sequences are expected to use the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and mapped to upper-case; a single hyphen (dash) can represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Numerical digits are not allowed.
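The rules above can be sketched as a small reader. This is a minimal illustration in Python, not the parser used by the ru.biosoft.bsa plugin; the function name `parse_fasta` is hypothetical, and skipping blank lines is an assumption that the text does not state.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of text lines.

    Hypothetical helper for illustration only.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            if any(ch.isdigit() for ch in line):
                raise ValueError(f"digits are not allowed in sequence data: {line!r}")
            # Lower-case letters are accepted and mapped to upper-case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)
```

Passing an open text file works because a file object iterates over its lines, for example `list(parse_fasta(open("proteins.fasta")))`, where the file name is a placeholder. The sketch only checks for digits; checking each letter against the code tables is a separate step.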

The supported nucleic acid codes

A → adenosine             M → A C (amino)
C → cytidine              S → G C (strong)
G → guanosine             W → A T (weak)
T → thymidine             B → G T C
U → uridine               D → G A T
R → G A (purine)          H → A C T
Y → T C (pyrimidine)      V → G C A
K → G T (keto)            N → A G C T (any)
- → gap of indeterminate length
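
The table above can be held in a lookup structure. The following Python sketch copies the codes from the table; the names `NUCLEIC_CODES`, `is_valid_nucleic`, and `code_matches` are hypothetical and are not part of BioUML.

```python
# Each code maps to the concrete bases it can stand for (taken from the table).
NUCLEIC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}
GAP = "-"  # gap of indeterminate length


def is_valid_nucleic(sequence: str) -> bool:
    """True if every symbol is a supported code or a gap (case-insensitive)."""
    return all(ch == GAP or ch in NUCLEIC_CODES for ch in sequence.upper())


def code_matches(code: str, base: str) -> bool:
    """True if the concrete base is one of those the code can represent."""
    return base.upper() in NUCLEIC_CODES[code.upper()]
```

For instance, `code_matches("R", "A")` is true because R stands for G or A, while `code_matches("R", "C")` is false.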

The supported amino acid codes (24 letter codes and 2 special codes)

A  alanine                     P  proline
B  aspartate or asparagine     Q  glutamine
C  cysteine                    R  arginine
D  aspartate                   S  serine
E  glutamate                   T  threonine
F  phenylalanine               U  selenocysteine
G  glycine                     V  valine
H  histidine                   W  tryptophan
I  isoleucine                  Y  tyrosine
K  lysine                      Z  glutamate or glutamine
L  leucine                     X  any
M  methionine                  *  translation stop
N  asparagine                  -  gap of indeterminate length
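
A protein sequence can be checked against the table above with a set lookup. This Python sketch is illustrative only; the alphabet is copied from the table (24 letters plus `*` and `-`), and the names `PROTEIN_ALPHABET` and `invalid_protein_symbols` are hypothetical.

```python
# Letters listed in the table, including the ambiguity codes B, Z and X.
AMINO_ACID_LETTERS = set("ABCDEFGHIKLMNPQRSTUVWYZX")
# '*' is a translation stop, '-' is a gap of indeterminate length.
SPECIAL_CODES = {"*", "-"}
PROTEIN_ALPHABET = AMINO_ACID_LETTERS | SPECIAL_CODES


def invalid_protein_symbols(sequence: str) -> list:
    """Return the sorted symbols of the sequence that are not in the table.

    The sequence is upper-cased first, since lower-case letters are accepted.
    """
    return sorted(set(sequence.upper()) - PROTEIN_ALPHABET)
```

An empty result means that every symbol is supported; J and O, which do not appear in the table, would be reported.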

Example

>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
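
A record laid out like the one above, with a header line followed by sequence lines of fixed width, could be produced as follows. This is a hedged Python sketch; the function name `format_fasta` is hypothetical, and the default width of 80 is an assumption based on the usual line length limit.

```python
def format_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = [">" + header]
    # Slice the sequence into consecutive chunks of at most `width` symbols.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

Calling `format_fasta("id desc", "ACGTACGTAC", width=4)` yields a header line followed by the lines ACGT, ACGT, and AC.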

References

  1. http://en.wikipedia.org/wiki/FASTA_format