Difference between revisions of "SIF (file format)"

From BioUML platform

Revision as of 16:28, 4 April 2013

SIF format

The Simple Interaction Format (SIF) was originally created for use with Cytoscape [1,2], the open-source bioinformatics software platform for visualizing molecular interaction networks.

SIF is convenient for building a graph from a list of interactions, and it makes it easy to combine different interaction sets into a larger network or to add new interactions to an existing data set. Its main disadvantage is that it carries no layout information, so Cytoscape must recompute the network layout each time the file is loaded.

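Each line in a SIF file specifies a source node, a relationship type (or edge type), and one or more target nodes; a line containing only a node name declares a node with no relationships:

nodeA <relationship type> nodeB
nodeC <relationship type> nodeA
nodeD <relationship type> nodeE nodeF nodeB
nodeG
...
nodeY <relationship type> nodeZ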

A more specific example is:

node1 typeA node2
node2 typeB node3 node4 node5
node0

The first line identifies two nodes, called node1 and node2, and a single relationship between node1 and node2 of type typeA. The second line specifies three new nodes, node3, node4, and node5; here "node2" refers to the same node as in the first line. The second line also specifies three relationships, all of type typeB and with node2 as the source, with node3, node4, and node5 as the targets. This second form is simply shorthand for specifying multiple relationships of the same type with the same source node. The third line indicates how to specify a node that has no relationships with other nodes. This form is not needed for nodes that do have relationships, since the specification of the relationship implicitly identifies the nodes as well.
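The expansion described above can be sketched in a few lines of Python. This is an illustrative reading of the format, not Cytoscape's own parser: the function name parse_sif_line is hypothetical, fields are split on any whitespace (so names containing spaces are not handled), and rejecting a line with exactly two fields is an assumption, since the text does not say how such a line is treated.

```python
def parse_sif_line(line):
    """Expand one SIF line into (nodes, edges).

    nodes: list of node names mentioned on the line.
    edges: list of (source, edge_type, target) tuples.
    Fields are split on any whitespace (illustrative simplification).
    """
    fields = line.split()
    if not fields:
        return [], []                 # blank line declares nothing
    if len(fields) == 1:
        return [fields[0]], []        # node with no relationships
    if len(fields) == 2:
        # Assumption: a source and edge type without a target is an error.
        raise ValueError(f"edge type without a target: {line!r}")
    source, edge_type, targets = fields[0], fields[1], fields[2:]
    # One edge per target: shorthand for several same-type, same-source edges.
    edges = [(source, edge_type, target) for target in targets]
    return [source] + targets, edges


example = """node1 typeA node2
node2 typeB node3 node4 node5
node0"""

for line in example.splitlines():
    nodes, edges = parse_sif_line(line)
    print(nodes, edges)
```

The second line yields three edges that share node2 as the source, and the third yields a single node with no edges.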

Duplicate entries are ignored. Multiple edges between the same nodes must have different edge types. For example, the following specifies two edges between the same pair of nodes, one of type xx and one of type yy; the repeated xx line is a duplicate and is ignored:

node1 xx node2
node1 xx node2
node1 yy node2
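One way to picture the duplicate rule is to treat each edge as a (source, edge type, target) triple and keep only the first occurrence of each. The helper below is a minimal sketch under that assumption; unique_edges is a hypothetical name, not part of Cytoscape.

```python
def unique_edges(edges):
    """Drop repeated (source, edge_type, target) triples, keeping first-seen order."""
    seen = set()
    result = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            result.append(edge)
    return result


edges = [
    ("node1", "xx", "node2"),
    ("node1", "xx", "node2"),  # duplicate entry, ignored
    ("node1", "yy", "node2"),  # same nodes, different type: kept
]
print(unique_edges(edges))
# [('node1', 'xx', 'node2'), ('node1', 'yy', 'node2')]
```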

Edges connecting a node to itself (self-edges) are also allowed:

node1 xx node1

Every node and edge in Cytoscape has an identifying name, most commonly used with the node and edge data attribute structures. Node names must be unique, because identically named nodes are treated as the same node. By default, the name of each node is the name given in this file (unless another string is mapped to display on the node using the visual mapper, as discussed in the visual styles section of the Cytoscape manual). The name of each edge is formed from the names of the source and target nodes plus the interaction type: for example, sourceName (edgeType) targetName.
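The edge naming pattern above amounts to simple string formatting. The function below is only an illustration of that pattern; edge_name is a hypothetical helper, not a Cytoscape API.

```python
def edge_name(source, edge_type, target):
    """Build an edge name in the 'sourceName (edgeType) targetName' pattern."""
    return f"{source} ({edge_type}) {target}"


print(edge_name("node1", "typeA", "node2"))  # node1 (typeA) node2
```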

The relationship type can be any string. Whole words or concatenated words may be used to define types of relationships, e.g. geneFusion, cogInference, pullsDown, activates, degrades, inactivates, inhibits, phosphorylates, upRegulates, etc.

Some common interaction types used in the Systems Biology community are as follows:

pp .................. protein-protein interaction
pd .................. protein -> DNA (e.g. transcription factor binding upstream of a gene it regulates)

Some less common interaction types used are:

pr .................. protein -> reaction
rc .................. reaction -> compound
cr .................. compound -> reaction
gl .................. genetic lethal relationship
pm .................. protein-metabolite interaction
mp .................. metabolite-protein interaction

Delimiters

Whitespace (space or tab) is used to delimit the names in the simple interaction file format. However, in some cases spaces are desired in a node name or edge type. The standard is that, if the file contains any tab characters, then tabs are used to delimit the fields and spaces are considered part of the name. If the file contains no tabs, then any spaces are delimiters that separate names (and names cannot contain spaces).

If your network unexpectedly contains no edges and node names that look like edge names, it probably means your file contains a stray tab that's fooling the parser. On the other hand, if your network has nodes whose names are half of a full name, then you probably meant to use tabs to separate node names with spaces.
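The delimiter rule can be sketched as follows. This is an assumption-laden illustration rather than Cytoscape's implementation: split_sif_text is a hypothetical name, the tab check is applied to the whole file as the rule describes, and blank lines are skipped.

```python
def split_sif_text(text):
    """Split SIF text into one list of fields per non-blank line.

    If the text contains any tab, tabs are the only delimiter and spaces
    stay inside names; otherwise any run of whitespace is a delimiter.
    """
    use_tabs = "\t" in text
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (illustrative choice)
        rows.append(line.split("\t") if use_tabs else line.split())
    return rows


# Tab-delimited: spaces are part of the names.
print(split_sif_text("Protein A\tbinds to\tProtein B\n"))

# A single stray tab switches the whole file to tab mode, so the
# space-delimited second line collapses into one node name.
print(split_sif_text("a b\tpp\tc\nd pp e\n"))
```

In the second call the line "d pp e" comes back as a single field, which is the "no edges, node names that look like edge names" symptom described above.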

Networks in Simple Interaction Format are often stored in files with a .sif extension, and Cytoscape recognizes this extension when browsing a directory for files of this type.

References

  1. http://www.cytoscape.org
  2. http://www.cytoscape.org/manual/Cytoscape2_5Manual.html#SIF%20Format