A matching BioHub is a kind of BioHub that lets you match a list of identifiers from one reference type to another (including cross-species matching if necessary). In BioUML code, a matching BioHub is a Java class which implements
The matching graph is defined so that each node is a combination of a reference type and a species, and each edge is a matching procedure implemented by a matching hub. Usually edges connect nodes of a single species, but cross-species hubs are also possible and are used in the Convert table via homology analysis.
A node is defined by a Properties object which has the following keys:
- TYPE_PROPERTY (ReferenceType): stable name of the node's reference type (example: 'EnsemblGeneTableType');
- SPECIES_PROPERTY (Species): Latin name of the node's species (example: 'Homo sapiens').
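A node description can be assembled with plain java.util.Properties. The sketch below is only an illustration: it assumes that the text in parentheses above ("ReferenceType", "Species") is the literal key string behind each constant, which the source does not state explicitly, and the class and helper names are hypothetical. In real code the constants defined in BioUML should be used instead.

```java
import java.util.Properties;

class MatchingNodeSketch {
    // Assumed key strings; the real TYPE_PROPERTY and SPECIES_PROPERTY
    // constants live in BioUML and may hold different values.
    static final String TYPE_PROPERTY = "ReferenceType";
    static final String SPECIES_PROPERTY = "Species";

    /** Builds a node description from a reference type stable name and a Latin species name. */
    static Properties node(String referenceTypeStableName, String latinSpeciesName) {
        Properties properties = new Properties();
        properties.setProperty(TYPE_PROPERTY, referenceTypeStableName);
        properties.setProperty(SPECIES_PROPERTY, latinSpeciesName);
        return properties;
    }

    public static void main(String[] args) {
        Properties input = node("EnsemblGeneTableType", "Homo sapiens");
        System.out.println(input.getProperty(TYPE_PROPERTY) + " / " + input.getProperty(SPECIES_PROPERTY));
    }
}
```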
Each edge is characterized by a matching quality, which is a number between 0 and 1 (inclusive). A quality of 1 means the best possible match.
Each matching hub can define several matching edges. When you request a matching between given nodes using BioHubRegistry.getMatchingPath(Properties, Properties), it performs a Dijkstra search within the matching graph, looking for the path with the best (highest) product of matching qualities.
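To illustrate the search, here is a minimal, self-contained sketch of a Dijkstra-style traversal over a toy graph. It is not the BioHubRegistry implementation. It assumes that the preferred path is the one with the highest quality product (consistent with 1 being the best quality), and all class, method and node names are hypothetical. Because every quality lies in [0, 1], extending a path can never increase its product, so always expanding the best node found so far is safe.

```java
import java.util.*;

class MatchingPathSketch {
    /** One directed edge of the toy graph: target node and matching quality. */
    static final class Edge {
        final String to;
        final double quality;
        Edge(String to, double quality) { this.to = to; this.quality = quality; }
    }

    /** Queue entry: a node together with the quality product of the path reaching it. */
    static final class Item {
        final String node;
        final double quality;
        Item(String node, double quality) { this.node = node; this.quality = quality; }
    }

    /** Toy graph; "A", "B", "C" stand in for (reference type, species) nodes. */
    static Map<String, List<Edge>> exampleGraph() {
        Map<String, List<Edge>> graph = new HashMap<>();
        graph.put("A", Arrays.asList(new Edge("B", 0.9), new Edge("C", 0.5)));
        graph.put("B", Arrays.asList(new Edge("C", 0.8)));
        return graph;
    }

    /** Returns the highest quality product from source to target, or 0.0 if unreachable. */
    static double bestQuality(Map<String, List<Edge>> graph, String source, String target) {
        Map<String, Double> best = new HashMap<>();
        // Max-heap: the entry with the highest accumulated quality is polled first.
        PriorityQueue<Item> queue =
            new PriorityQueue<>((a, b) -> Double.compare(b.quality, a.quality));
        best.put(source, 1.0);
        queue.add(new Item(source, 1.0));
        while (!queue.isEmpty()) {
            Item current = queue.poll();
            if (current.quality < best.getOrDefault(current.node, 0.0)) {
                continue; // stale entry, a better path to this node was already found
            }
            if (current.node.equals(target)) {
                return current.quality;
            }
            for (Edge edge : graph.getOrDefault(current.node, Collections.<Edge>emptyList())) {
                double candidate = current.quality * edge.quality;
                if (candidate > best.getOrDefault(edge.to, 0.0)) {
                    best.put(edge.to, candidate);
                    queue.add(new Item(edge.to, candidate));
                }
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(bestQuality(exampleGraph(), "A", "C"));
    }
}
```

In the toy graph the two-step route A, B, C (0.9 × 0.8) beats the direct edge from A to C (0.5), so a longer path can win when its steps are of higher quality.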
Debugging matching graph
Use the getReachableTypes method to retrieve all reference types reachable from a given node. Use the getMatchingPlan method to retrieve the list of optimal matching steps between given nodes. The matchDebug method provides verbose output for the matching procedure of a given identifier between given nodes.
Implementation of SQL-based matching hub
The easiest way to implement your own matching hub is to prepare a special MySQL database and subclass SQLBasedHub.
The database schema is the following:
CREATE TABLE `hub` (
  `input` varchar(20) NOT NULL,
  `input_type` int NOT NULL,
  `output` varchar(20) NOT NULL,
  `output_type` int NOT NULL,
  `specie` int NOT NULL,
  KEY `input` (`input`,`specie`,`output_type`),
  KEY `output` (`output`,`specie`,`input_type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `hub_terms` (
  `id` int primary key,
  `term` varchar(100) not null,
  key `term` (`term`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
So two tables called hub and hub_terms must be present. Of course, you can use the same database for other purposes as well. The input_type, output_type and specie fields refer to the terms in the hub_terms table via the hub_terms.id field. The input_type and output_type fields must refer to the reference type stable name (usually the reference type class simple name; see ReferenceTypeSupport.getStableName()). The specie field must refer to the Latin name of the species. The input and output fields contain the identifier of the input type and the converted identifier of the output type respectively.
After creating the database you must subclass the SQLBasedHub class, providing an implementation of the SQLBasedHub.getMatchings() method. This method must return an array of Matching objects, which are constructed from the following parameters:
- inputType: ReferenceType class for the input type.
- outputType: ReferenceType class for the output type.
- forward: if false, then matching is performed in the backward direction: the input id is looked up in the hub.output field and the output id is returned from the hub.input field.
- quality: the quality of given matching (between 0 and 1).
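The effect of the forward flag on the hub table lookup can be sketched as follows. The query text is an assumption made for illustration only: the source does not show the SQL that SQLBasedHub actually issues, and the class and method names here are hypothetical. The placeholders are bound, in order, to the identifier, the hub_terms id of the matching's input type, the hub_terms id of its output type, and the hub_terms id of the species.

```java
class HubQuerySketch {
    /**
     * Builds an illustrative lookup query against the hub table.
     * forward = true reads hub.input and returns hub.output;
     * forward = false swaps the roles of the two sides.
     */
    static String lookupSql(boolean forward) {
        String searchColumn = forward ? "input" : "output";
        String resultColumn = forward ? "output" : "input";
        String searchTypeColumn = forward ? "input_type" : "output_type";
        String resultTypeColumn = forward ? "output_type" : "input_type";
        return "SELECT " + resultColumn + " FROM hub WHERE " + searchColumn + " = ?"
            + " AND " + searchTypeColumn + " = ?"
            + " AND " + resultTypeColumn + " = ?"
            + " AND specie = ?";
    }

    public static void main(String[] args) {
        System.out.println(lookupSql(true));
        System.out.println(lookupSql(false));
    }
}
```

Storing each pair once and reading it in either direction is what lets a single set of hub rows serve two edges of the matching graph, possibly with different qualities.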
The SQLBasedHub.getConnection() method must return a Connection to the MySQL database. The default implementation works as follows:
- If the BioHub is registered within an SQL module, then the module's default SQL connection is used.
- If the BioHub has the properties jdbcURL, jdbcUser and jdbcPassword, then these properties are used to create a connection. See the biohub extension point for details on how to specify BioHub properties.
- Otherwise the hub is disabled.
If this algorithm doesn't suit your needs, you may override this method to create the Connection in a custom way.
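The fallback order of the default getConnection() behaviour described above can be summarized as a small decision function. This is a sketch of the documented order only, not the SQLBasedHub code: the class, enum and method names are hypothetical, and requiring all three JDBC properties to be present is an assumption.

```java
import java.util.Properties;

class HubConnectionSketch {
    /** Where the connection would come from under the documented fallback order. */
    enum Source { MODULE_DEFAULT, HUB_PROPERTIES, DISABLED }

    static Source resolve(boolean registeredInSqlModule, Properties hubProperties) {
        if (registeredInSqlModule) {
            return Source.MODULE_DEFAULT; // the module's default SQL connection wins
        }
        // Assumption: all three properties must be set for the hub to use them.
        boolean hasJdbcSettings = hubProperties.getProperty("jdbcURL") != null
            && hubProperties.getProperty("jdbcUser") != null
            && hubProperties.getProperty("jdbcPassword") != null;
        return hasJdbcSettings ? Source.HUB_PROPERTIES : Source.DISABLED;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("jdbcURL", "jdbc:mysql://localhost/hub_db"); // placeholder URL
        properties.setProperty("jdbcUser", "user");
        properties.setProperty("jdbcPassword", "secret");
        System.out.println(resolve(false, properties));
    }
}
```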