Matching BioHub

From BioUML platform

Matching BioHub is a kind of BioHub that matches a list of identifiers of one reference type to another reference type, performing cross-species matching if necessary. In BioUML code, a matching BioHub is a Java class that implements the BioHub interface.


Technical details

BioUML defines a matching graph in which each node is a combination of a reference type and a species, and each edge is a matching procedure implemented by a matching hub. Usually edges connect nodes of the same species, but cross-species hubs are also possible; they are used in the Convert table via homology analysis.

A node is defined by a Properties object with the following keys:

  • TYPE_PROPERTY (ReferenceType): stable name of the node reference type (example: 'EnsemblGeneTableType');
  • SPECIES_PROPERTY (Species): Latin name of the node species (example: 'Homo sapiens').

Each edge is characterized by a matching quality, a number between 0 and 1 inclusive. Quality 1 means the best possible matching quality.

Each matching hub can define several matching edges. When you request a matching between two nodes using BioHubRegistry.getMatchingPath(Properties, Properties), it performs a Dijkstra search in the matching graph for the path with the best (highest) product of matching qualities.
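The path search can be illustrated with a small self-contained sketch. It assumes the search maximizes the product of edge qualities: because every useful quality lies in (0, 1], maximizing the product is the same as minimizing the sum of -log(quality), and those weights are non-negative, so plain Dijkstra applies. The class MatchingPathSketch and the string node names are hypothetical and not part of the BioUML API; BioHubRegistry's actual implementation may differ.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: best matching path = highest product of edge qualities. */
public class MatchingPathSketch {
    // Directed edges: source node -> (target node -> matching quality)
    private final Map<String, Map<String, Double>> edges = new HashMap<>();

    /** Adds an edge; a node is any string, e.g. "EnsemblGeneTableType|Homo sapiens". */
    public void addEdge(String from, String to, double quality) {
        if (quality < 0 || quality > 1) {
            throw new IllegalArgumentException("quality must be between 0 and 1");
        }
        if (quality == 0) {
            return; // a zero-quality edge can never improve a product
        }
        // If two hubs offer the same edge, keep the better quality
        edges.computeIfAbsent(from, k -> new HashMap<>()).merge(to, quality, Math::max);
    }

    /** Returns the nodes of the best path, or an empty list if target is unreachable. */
    public List<String> bestPath(String source, String target) {
        Map<String, Double> cost = new HashMap<>();     // sum of -log(quality) so far
        Map<String, String> previous = new HashMap<>(); // predecessor on the best path
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        cost.put(source, 0.0);
        queue.add(Map.entry(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> head = queue.poll();
            String node = head.getKey();
            if (head.getValue() > cost.get(node)) {
                continue; // stale queue entry, a cheaper one was already processed
            }
            if (node.equals(target)) {
                break;
            }
            for (Map.Entry<String, Double> e : edges.getOrDefault(node, Map.of()).entrySet()) {
                double candidate = cost.get(node) - Math.log(e.getValue());
                if (candidate < cost.getOrDefault(e.getKey(), Double.POSITIVE_INFINITY)) {
                    cost.put(e.getKey(), candidate);
                    previous.put(e.getKey(), node);
                    queue.add(Map.entry(e.getKey(), candidate));
                }
            }
        }
        if (!cost.containsKey(target)) {
            return List.of();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String n = target; n != null; n = previous.get(n)) {
            path.addFirst(n);
        }
        return path;
    }

    /** Product of the edge qualities along a path returned by bestPath. */
    public double quality(List<String> path) {
        double product = 1.0;
        for (int i = 0; i + 1 < path.size(); i++) {
            product *= edges.get(path.get(i)).get(path.get(i + 1));
        }
        return product;
    }
}
```

With edges A→B and B→C of quality 0.9 and a direct edge A→C of quality 0.7, the two-step path wins because 0.9 × 0.9 = 0.81 exceeds 0.7.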

Debugging matching graph

You can debug the matching graph using the biohub JavaScript host object in the script viewpart. Use the getReachableTypes method to retrieve all reference types reachable from a given node. Use the getMatchingPlan method to retrieve the list of optimal matching steps between two nodes. The matchDebug method provides verbose output of the matching procedure for a given identifier between two nodes.
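The host object's implementation is not shown on this page. As an illustration of what "reachable from a node" means, the following sketch runs a breadth-first search over a directed matching graph. ReachabilitySketch and the string node names are hypothetical, and getReachableTypes may compute its result differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: nodes reachable from a start node via matching edges. */
public class ReachabilitySketch {
    /** Breadth-first search; the result excludes the start node itself. */
    public static Set<String> reachable(Map<String, List<String>> edges, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : edges.getOrDefault(node, List.of())) {
                if (seen.add(next)) { // add() returns false for already visited nodes
                    queue.add(next);
                }
            }
        }
        seen.remove(start);
        return seen;
    }
}
```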

Implementation of SQL-based matching hub

The easiest way to implement your own matching hub is to prepare a special MySQL database and subclass SQLBasedHub.


The database schema is the following:

CREATE TABLE `hub` (
 `input` varchar(20) NOT NULL,
 `input_type` int NOT NULL,
 `output` varchar(20) NOT NULL,
 `output_type` int NOT NULL,
 `specie` int NOT NULL,
 KEY `input` (`input`,`specie`,`output_type`),
 KEY `output` (`output`,`specie`,`input_type`)
);
CREATE TABLE `hub_terms` (
 `id` int primary key,
 `term` varchar(100) not null,
 key `term` (`term`)
);

So two tables, hub and hub_terms, must be present. Of course, you can use the same database for other purposes as well. The input_type, output_type and specie fields refer to terms in the hub_terms table via its id field. The input_type and output_type fields must refer to a reference type stable name (usually the simple name of the reference type class; see ReferenceTypeSupport.getStableName()). The specie field must refer to the Latin name of the species. The input and output fields contain the identifier of the input type and the converted identifier of the output type, respectively.
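To make the term-id indirection concrete, here is an in-memory model of the two tables. It is only an illustration of the schema, not how SQLBasedHub reads the database. HubTableSketch is hypothetical, and so is the stable name "EntrezGeneTableType" used in the example; the sample identifiers ENSG00000141510 and 7157 both denote human TP53.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the hub and hub_terms tables. */
public class HubTableSketch {
    // One row of `hub`: type and species columns hold hub_terms.id values, not strings
    public record HubRow(String input, int inputType, String output, int outputType, int specie) {}

    private final Map<String, Integer> termIds = new HashMap<>(); // hub_terms: term -> id
    private final List<HubRow> rows = new ArrayList<>();          // hub

    /** Returns the id of a term, adding it to hub_terms if it is new. */
    public int termId(String term) {
        Integer id = termIds.get(term);
        if (id == null) {
            id = termIds.size() + 1;
            termIds.put(term, id);
        }
        return id;
    }

    /** Stores a mapping; type names are stable names, species is the Latin name. */
    public void addRow(String input, String inputType, String output, String outputType, String species) {
        rows.add(new HubRow(input, termId(inputType), output, termId(outputType), termId(species)));
    }

    /** Forward lookup: finds rows by hub.input and returns their hub.output values. */
    public List<String> convert(String id, String inputType, String outputType, String species) {
        List<String> result = new ArrayList<>();
        Integer in = termIds.get(inputType), out = termIds.get(outputType), sp = termIds.get(species);
        if (in == null || out == null || sp == null) {
            return result; // an unknown term cannot match any row
        }
        for (HubRow r : rows) {
            if (r.input().equals(id) && r.inputType() == in && r.outputType() == out && r.specie() == sp) {
                result.add(r.output());
            }
        }
        return result;
    }
}
```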


After creating the database, subclass SQLBasedHub and implement its SQLBasedHub.getMatchings() method. This method must return an array of Matching objects, each constructed from the following parameters:

  • inputType: ReferenceType class for the input type.
  • outputType: ReferenceType class for the output type.
  • forward: if false, the matching is performed in the backward direction: the input id is looked up in the hub.output field and the output id is returned from the hub.input field.
  • quality: the quality of given matching (between 0 and 1).
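The effect of the forward flag on the lookup can be sketched as query generation. MatchingSpec is a hypothetical stand-in for the real Matching class (which takes ReferenceType classes rather than strings), and buildQuery is an assumption about the kind of SQL such a hub could run; the actual SQLBasedHub query may differ.

```java
/** Hypothetical sketch: how the forward flag swaps the hub.input and hub.output columns. */
public class MatchingQuerySketch {
    /** Stand-in for the Matching parameters; type names are stable names. */
    public record MatchingSpec(String inputType, String outputType, boolean forward, double quality) {
        public MatchingSpec {
            if (quality < 0 || quality > 1) {
                throw new IllegalArgumentException("quality must be between 0 and 1");
            }
        }
    }

    /**
     * Builds a parameterized query. Parameters: 1 = identifier, 2 = species Latin name,
     * 3 = input type stable name, 4 = output type stable name.
     */
    public static String buildQuery(MatchingSpec m) {
        String from = m.forward() ? "input" : "output"; // column holding the given id
        String to = m.forward() ? "output" : "input";   // column holding the converted id
        return "SELECT h." + to + " FROM hub h"
                + " JOIN hub_terms sp ON sp.id = h.specie"
                + " JOIN hub_terms src ON src.id = h." + from + "_type"
                + " JOIN hub_terms dst ON dst.id = h." + to + "_type"
                + " WHERE h." + from + " = ? AND sp.term = ? AND src.term = ? AND dst.term = ?";
    }
}
```

Because backward matchings reuse the same rows, one table can serve both conversion directions; the indexes on (`input`, `specie`, `output_type`) and (`output`, `specie`, `input_type`) in the schema support lookups from either side.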


The SQLBasedHub.getConnection() method must return a Connection to the MySQL database. The default implementation works as follows:

  • If the BioHub is registered within an SQL module, the module's default SQL connection is used.
  • If the BioHub has the properties jdbcURL, jdbcUser and jdbcPassword, these properties are used to create a connection. See the biohub extension point for details on how to specify BioHub properties.
  • Otherwise the hub is disabled.
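The selection order can be sketched as a plain static method. The name chooseConnection and its parameters are hypothetical: the real getConnection() obtains the module connection and the hub properties from its own context. Wrapping SQLException in IllegalStateException is a choice made for this sketch only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical sketch of the default connection-selection order. */
public class HubConnectionSketch {
    /** Returns a connection, or null when the hub should be disabled. */
    public static Connection chooseConnection(Connection moduleConnection, Properties hubProperties) {
        // 1. Hub registered within an SQL module: reuse the module's default connection
        if (moduleConnection != null) {
            return moduleConnection;
        }
        // 2. jdbcURL, jdbcUser and jdbcPassword properties: open a new connection
        String url = hubProperties.getProperty("jdbcURL");
        String user = hubProperties.getProperty("jdbcUser");
        String password = hubProperties.getProperty("jdbcPassword");
        if (url != null && user != null && password != null) {
            try {
                return DriverManager.getConnection(url, user, password);
            } catch (SQLException e) {
                throw new IllegalStateException("cannot connect to " + url, e);
            }
        }
        // 3. Neither source is available: no connection, so the hub is disabled
        return null;
    }
}
```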

If this default behavior does not suit your needs, override getConnection() to create the Connection in a custom way.

