Matching BioHub

From BioUML platform
Revision as of 12:23, 3 July 2013 by Tagir Valeev

A matching BioHub is a kind of BioHub that lets you match a list of identifiers from one reference type to another, including cross-species matching if necessary. In the BioUML code, a matching BioHub is a Java class that implements the BioHub interface.

Technical details

The matching graph is defined so that each node is a combination of a reference type and a species, and each edge is a matching procedure implemented by a matching hub. Edges usually connect nodes of the same species, but cross-species hubs are also possible; they are used in the Convert table via homology analysis.

A node is defined by a Properties object with the following keys:

  • TYPE_PROPERTY (ReferenceType): stable name of the node's reference type (example: 'EnsemblGeneTableType');
  • SPECIES_PROPERTY (Species): Latin name of the node's species (example: 'Homo sapiens').
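
As a minimal sketch, a node description of this kind can be built with a plain java.util.Properties object. The key strings used below ("ReferenceType" and "Species") are an assumption taken from the names in parentheses above; in real code the TYPE_PROPERTY and SPECIES_PROPERTY constants defined in BioUML should be used instead. The class and helper names are hypothetical.

```java
import java.util.Properties;

public class MatchingNodeExample {
    // Assumed key strings; the actual values of the BioUML constants
    // TYPE_PROPERTY and SPECIES_PROPERTY are not confirmed by this page.
    static final String TYPE_PROPERTY = "ReferenceType";
    static final String SPECIES_PROPERTY = "Species";

    /** Builds a node description from a reference type stable name and a Latin species name. */
    static Properties node(String referenceType, String species) {
        Properties properties = new Properties();
        properties.setProperty(TYPE_PROPERTY, referenceType);
        properties.setProperty(SPECIES_PROPERTY, species);
        return properties;
    }

    public static void main(String[] args) {
        Properties human = node("EnsemblGeneTableType", "Homo sapiens");
        System.out.println(human.getProperty(TYPE_PROPERTY) + " / " + human.getProperty(SPECIES_PROPERTY));
    }
}
```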

Each edge is characterized by matching quality, which is a number between 0 and 1 (inclusive). Quality 1 means the best matching quality possible.

Each matching hub can define several matching edges. When you request a matching between two nodes using BioHubRegistry.getMatchingPath(Properties, Properties), the registry performs a Dijkstra search over the matching graph, looking for the path with the best overall quality, computed as the product of the qualities of its edges.
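
The following self-contained sketch illustrates one way such a search can work; it is not the BioUML implementation. It assumes that the "best" path is the one whose product of edge qualities is highest (consistent with quality 1 being the best), and it turns that into a standard shortest-path problem by using -log(quality) as the edge cost. The class MatchingPathSketch, its methods, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class MatchingPathSketch {
    /** A directed matching edge leading to node 'to' with the given quality. */
    static final class Edge {
        final String to;
        final double quality;
        Edge(String to, double quality) { this.to = to; this.quality = quality; }
    }

    /** Queue entry: a node together with the path cost known when it was queued. */
    static final class Entry {
        final String node;
        final double cost;
        Entry(String node, double cost) { this.node = node; this.cost = cost; }
    }

    private final Map<String, List<Edge>> graph = new HashMap<>();

    void addEdge(String from, String to, double quality) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, quality));
    }

    /** Returns the highest product of edge qualities from source to target, or 0 if unreachable. */
    double bestQuality(String source, String target) {
        Map<String, Double> cost = new HashMap<>();
        PriorityQueue<Entry> queue = new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.cost));
        cost.put(source, 0.0);
        queue.add(new Entry(source, 0.0));
        while (!queue.isEmpty()) {
            Entry current = queue.poll();
            // Skip entries that were superseded by a cheaper path.
            if (current.cost > cost.getOrDefault(current.node, Double.POSITIVE_INFINITY)) continue;
            // Cost is the sum of -log(quality), so exp(-cost) is the product of qualities.
            if (current.node.equals(target)) return Math.exp(-current.cost);
            for (Edge edge : graph.getOrDefault(current.node, Collections.emptyList())) {
                if (edge.quality <= 0) continue; // a zero-quality edge cannot be part of a useful path
                double next = current.cost - Math.log(edge.quality);
                if (next < cost.getOrDefault(edge.to, Double.POSITIVE_INFINITY)) {
                    cost.put(edge.to, next);
                    queue.add(new Entry(edge.to, next));
                }
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        MatchingPathSketch g = new MatchingPathSketch();
        g.addEdge("A", "B", 0.9);
        g.addEdge("B", "C", 0.9);
        g.addEdge("A", "C", 0.5); // direct edge, but of lower quality than the two-step path
        System.out.println(g.bestQuality("A", "C"));
    }
}
```

Because every quality lies between 0 and 1, every cost -log(quality) is non-negative, which is the condition Dijkstra's algorithm needs.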

Debugging matching graph

You can debug the matching graph using the biohub JavaScript host object in the script viewpart. Use the getReachableTypes method to retrieve all reference types reachable from a given node. Use the getMatchingPlan method to retrieve the list of optimal matching steps between given nodes. The matchDebug method produces verbose output for the matching procedure of a given identifier between given nodes.

Implementation of SQL-based matching hub

The easiest way to implement your own matching hub is to prepare a special MySQL database and subclass SQLBasedHub. The database schema is the following:

CREATE TABLE `hub` (
 `input` varchar(20) NOT NULL,
 `input_type` int NOT NULL,
 `output` varchar(20) NOT NULL,
 `output_type` int NOT NULL,
 `specie` int NOT NULL,
 KEY `input` (`input`,`specie`,`output_type`),
 KEY `output` (`output`,`specie`,`input_type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `hub_terms` (
 `id` int PRIMARY KEY,
 `term` varchar(100) NOT NULL,
 KEY `term` (`term`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

So two tables, hub and hub_terms, must be present. You can of course use the same database for other purposes as well. The input_type, output_type and specie fields refer to terms in the hub_terms table via the hub_terms.id field. The input_type and output_type fields must refer to the stable name of a reference type (usually the simple name of the reference type class; see ReferenceTypeSupport.getStableName()). The specie field must refer to the Latin name of the species. The input and output fields contain the identifier of the input type and the converted identifier of the output type, respectively.

After creating the database, subclass SQLBasedHub and implement the SQLBasedHub.getMatchings() method. This method must return an array of Matching objects, each constructed with the following parameters:

  • inputType: ReferenceType class for the input type.
  • outputType: ReferenceType class for the output type.
  • forward: if false, the matching is performed in the backward direction: the input id is looked up in the hub.output field and the output id is returned from the hub.input field.
  • quality: the quality of given matching (between 0 and 1).
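
To make the effect of the forward flag concrete, here is a small in-memory sketch of the lookup semantics described above. It does not use SQLBasedHub, Matching, or a database: the HubLookupSketch class, its Row type and the match method are hypothetical, each row stands for one record of the hub table with the hub_terms ids already resolved to strings, and 'EntrezGeneTableType' and the identifiers are illustrative values only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HubLookupSketch {
    /** One row of the hub table, with type and species ids already resolved via hub_terms. */
    static final class Row {
        final String input, inputType, output, outputType, species;
        Row(String input, String inputType, String output, String outputType, String species) {
            this.input = input; this.inputType = inputType;
            this.output = output; this.outputType = outputType;
            this.species = species;
        }
    }

    /**
     * Matches 'id' of type inputType to identifiers of type outputType.
     * Forward: search hub.input and return hub.output.
     * Backward: search hub.output and return hub.input, so the requested
     * input type is compared with the row's output_type and vice versa.
     */
    static List<String> match(List<Row> hub, String id, String inputType,
                              String outputType, String species, boolean forward) {
        List<String> result = new ArrayList<>();
        for (Row row : hub) {
            if (!row.species.equals(species)) continue;
            if (forward) {
                if (row.input.equals(id) && row.inputType.equals(inputType)
                        && row.outputType.equals(outputType)) {
                    result.add(row.output);
                }
            } else {
                if (row.output.equals(id) && row.outputType.equals(inputType)
                        && row.inputType.equals(outputType)) {
                    result.add(row.input);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data: a single Ensembl-to-Entrez row for one species.
        List<Row> hub = Arrays.asList(new Row("ENSG00000141510", "EnsemblGeneTableType",
                "7157", "EntrezGeneTableType", "Homo sapiens"));
        System.out.println(match(hub, "ENSG00000141510", "EnsemblGeneTableType",
                "EntrezGeneTableType", "Homo sapiens", true));
        System.out.println(match(hub, "7157", "EntrezGeneTableType",
                "EnsemblGeneTableType", "Homo sapiens", false));
    }
}
```

Under this reading, a single set of rows can serve both directions: one Matching declared with forward set to true and a second one with the types swapped and forward set to false.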
