Matching BioHub
Edits by Tagir Valeev: page created; technical details and implementation via SqlBasedHub added.
'''Matching BioHub''' is a kind of [[BioHub]] which allows you to match a list of identifiers from one [[Reference type (extension point)|reference type]] to another (including, if necessary, cross-species matching). In [[BioUML]] code, a matching BioHub is a Java class which implements the {{Class|ru.biosoft.access.biohub.BioHub}} interface.

=== Technical details ===

A '''matching graph''' is defined in which each node is a combination of a [[reference type]] and a species, and each edge is a matching procedure implemented by a matching hub. Usually edges connect nodes of the same species, but cross-species hubs are also possible; they are used in the [[File:Data-Convert-table-via-homology-icon.png]] [[Convert table via homology (analysis)|Convert table via homology]] analysis.

A node is defined by a {{Class|java.util.Properties}} object with the following keys:
* {{Constant|ru.biosoft.access.biohub.BioHub.TYPE_PROPERTY}} ('''ReferenceType'''): stable name of the node's reference type (example: 'EnsemblGeneTableType');
* {{Constant|ru.biosoft.access.biohub.BioHub.SPECIES_PROPERTY}} ('''Species'''): Latin name of the node's species (example: 'Homo sapiens').
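The following Java sketch shows how such node descriptors might be built. It assumes that the constants hold the literal strings '''ReferenceType''' and '''Species''' shown above; in real BioUML code you would use {{Constant|ru.biosoft.access.biohub.BioHub.TYPE_PROPERTY}} and {{Constant|ru.biosoft.access.biohub.BioHub.SPECIES_PROPERTY}} instead of the literals. The helper names <code>node</code> and <code>sameNode</code> and the type name <code>HypotheticalTargetTableType</code> are made up for illustration.

```java
import java.util.Objects;
import java.util.Properties;

// Sketch: building matching-graph node descriptors.
// ASSUMPTION: the key strings below equal the values of BioHub.TYPE_PROPERTY
// and BioHub.SPECIES_PROPERTY; real code should use those constants.
class MatchingNodes {
    static final String TYPE_KEY = "ReferenceType"; // stands in for BioHub.TYPE_PROPERTY
    static final String SPECIES_KEY = "Species";    // stands in for BioHub.SPECIES_PROPERTY

    /** Node = reference type stable name + Latin species name. */
    static Properties node(String typeStableName, String latinSpecies) {
        Properties p = new Properties();
        p.setProperty(TYPE_KEY, typeStableName);
        p.setProperty(SPECIES_KEY, latinSpecies);
        return p;
    }

    /** Two descriptors denote the same graph node only if both keys match. */
    static boolean sameNode(Properties a, Properties b) {
        return Objects.equals(a.getProperty(TYPE_KEY), b.getProperty(TYPE_KEY))
            && Objects.equals(a.getProperty(SPECIES_KEY), b.getProperty(SPECIES_KEY));
    }

    public static void main(String[] args) {
        Properties input = node("EnsemblGeneTableType", "Homo sapiens");
        // Hypothetical target node; the type name is invented for this example.
        Properties output = node("HypotheticalTargetTableType", "Homo sapiens");
        // In BioUML, descriptors like these are what
        // BioHubRegistry.getMatchingPath(Properties, Properties) receives.
        System.out.println(input.getProperty(TYPE_KEY) + " -> " + output.getProperty(TYPE_KEY));
    }
}
```

Comparing both keys in <code>sameNode</code> reflects the definition above: the same reference type in two different species gives two different graph nodes.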

Each edge is characterized by a '''matching quality''', a number between 0 and 1 (inclusive). Quality 1 means the best possible matching quality.

Each matching hub can define several matching edges. When you request a matching between given nodes using {{Method|ru.biosoft.access.biohub.BioHubRegistry.getMatchingPath(Properties, Properties)}}, it performs a Dijkstra search within the matching graph looking for the path with the maximal product of matching qualities, i.e. the path with the best overall quality.
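Because qualities lie in [0, 1], maximizing their product along a path is equivalent to minimizing the sum of the edge weights -ln(quality), which are all non-negative, so ordinary Dijkstra applies. The sketch below illustrates this idea on a toy graph with string node names; it is not the code of {{Class|ru.biosoft.access.biohub.BioHubRegistry}}, and the node names and qualities are hypothetical.

```java
import java.util.*;

// Sketch: best matching path = maximal product of edge qualities.
// Dijkstra minimizes a sum, so each edge gets weight -ln(quality) >= 0.
// Edges with quality 0 are skipped (they would have infinite weight).
class MatchingPathSketch {
    static final class Edge {
        final String to;
        final double quality;
        Edge(String to, double quality) { this.to = to; this.quality = quality; }
    }

    private final Map<String, List<Edge>> adj = new HashMap<>();

    void addEdge(String from, String to, double quality) {
        if (quality < 0 || quality > 1) throw new IllegalArgumentException("quality must be in [0, 1]");
        adj.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, quality));
    }

    /** Best path from start to goal, or an empty list if goal is unreachable. */
    List<String> bestPath(String start, String goal) {
        Map<String, Double> dist = new HashMap<>();   // accumulated -ln(quality)
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        dist.put(start, 0.0);
        queue.add(Map.entry(start, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> cur = queue.poll();
            String u = cur.getKey();
            if (cur.getValue() > dist.get(u)) continue; // stale queue entry
            if (u.equals(goal)) break;
            for (Edge e : adj.getOrDefault(u, List.of())) {
                if (e.quality <= 0) continue;          // unusable edge
                double nd = dist.get(u) - Math.log(e.quality);
                if (nd < dist.getOrDefault(e.to, Double.POSITIVE_INFINITY)) {
                    dist.put(e.to, nd);
                    prev.put(e.to, u);
                    queue.add(Map.entry(e.to, nd));
                }
            }
        }
        if (!dist.containsKey(goal)) return List.of();
        LinkedList<String> path = new LinkedList<>();
        for (String n = goal; n != null; n = prev.get(n)) path.addFirst(n);
        return path;
    }

    /** Product of the best edge qualities between consecutive path nodes. */
    double pathQuality(List<String> path) {
        double q = 1.0;
        for (int i = 0; i + 1 < path.size(); i++) {
            double best = 0.0;
            for (Edge e : adj.getOrDefault(path.get(i), List.of()))
                if (e.to.equals(path.get(i + 1))) best = Math.max(best, e.quality);
            q *= best;
        }
        return q;
    }

    public static void main(String[] args) {
        MatchingPathSketch g = new MatchingPathSketch();
        g.addEdge("A", "B", 0.9);
        g.addEdge("B", "C", 0.9);
        g.addEdge("A", "C", 0.7);
        List<String> p = g.bestPath("A", "C");
        System.out.println(p + " quality=" + g.pathQuality(p));
    }
}
```

Here the two-step path A, B, C (quality 0.9 × 0.9 = 0.81) is preferred over the direct edge A to C (quality 0.7), which is what a product-of-qualities criterion should do.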

=== Debugging matching graph ===
You can debug the matching graph using the [[Biohub (host object)|biohub]] JavaScript host object in the [[script viewpart]]. Use the <code>getReachableTypes</code> method to retrieve all [[reference type]]s reachable from a given node. Use the <code>getMatchingPlan</code> method to retrieve the list of optimal matching steps between given nodes. The <code>matchDebug</code> method provides verbose output of the matching procedure for a given identifier between given nodes.
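As a rough illustration of what ''reachable'' means here, the following Java sketch collects all nodes reachable from a start node by breadth-first search over matching edges. It does not reproduce the biohub host object or its method signatures; the adjacency map and node names are hypothetical.

```java
import java.util.*;

// Sketch: nodes reachable from a start node via matching edges (BFS).
// Illustration only; not the biohub host object's implementation.
class ReachableSketch {
    static Set<String> reachable(Map<String, List<String>> edges, String start) {
        Set<String> seen = new LinkedHashSet<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : edges.getOrDefault(u, List.of())) {
                if (seen.add(v)) queue.add(v); // enqueue each node once
            }
        }
        seen.remove(start); // report only the other nodes
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "D", List.of("A"));
        System.out.println(reachable(edges, "A")); // [B, C]
    }
}
```

Note that edges are directed: D can reach A, but A cannot reach D, so a node can be a valid matching target from one side only.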

=== Implementation of SQL-based matching hub ===
The easiest way to implement your own matching hub is to prepare a special MySQL database and subclass {{Class|ru.biosoft.access.biohub.SQLBasedHub}}. The database schema is as follows:

 CREATE TABLE `hub` (
   `input` varchar(20) NOT NULL,
   `input_type` int NOT NULL,
   `output` varchar(20) NOT NULL,
   `output_type` int NOT NULL,
   `specie` int NOT NULL,
   KEY `input` (`input`,`specie`,`output_type`),
   KEY `output` (`output`,`specie`,`input_type`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
 
 CREATE TABLE `hub_terms` (
   `id` int primary key,
   `term` varchar(100) not null,
   key `term` (`term`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

So two tables called '''hub''' and '''hub_terms''' must be present. Of course, you can use the same database for other purposes as well. The '''input_type''', '''output_type''' and '''specie''' fields refer to terms in the '''hub_terms''' table via the '''hub_terms.id''' field. The '''input_type''' and '''output_type''' fields must refer to a [[reference type]] stable name (usually the simple name of the reference type class; see {{Method|ru.biosoft.access.biohub.ReferenceTypeSupport.getStableName()}}). The '''specie''' field must refer to the Latin name of the species. The '''input''' and '''output''' fields contain the identifier of the input type and the converted identifier of the output type, respectively.
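To make the role of '''hub_terms''' concrete, the sketch below models both tables in memory: the type and species columns of '''hub''' hold integer ids, and a lookup first resolves the stable type names and the species name to those ids. The SQL in the comment is an assumed query shape (it is covered by the <code>input</code> index), not necessarily what {{Class|ru.biosoft.access.biohub.SQLBasedHub}} actually issues; the sample rows and the type name <code>HypotheticalTargetTableType</code> are made up.

```java
import java.util.*;

// In-memory sketch of the hub / hub_terms layout. A forward lookup
// corresponds to an (assumed) query such as:
//   SELECT output FROM hub
//   WHERE input = ? AND specie = ? AND output_type = ? AND input_type = ?
// where the three int parameters are hub_terms.id values.
class HubTablesSketch {
    /** One row of `hub`, fields in column order. */
    static final class Row {
        final String input, output;
        final int inputType, outputType, specie;
        Row(String input, int inputType, String output, int outputType, int specie) {
            this.input = input; this.inputType = inputType;
            this.output = output; this.outputType = outputType; this.specie = specie;
        }
    }

    final Map<Integer, String> terms = new HashMap<>(); // hub_terms: id -> term
    final List<Row> hub = new ArrayList<>();

    /** Reverse lookup in hub_terms; -1 if the term is absent. */
    int termId(String term) {
        for (Map.Entry<Integer, String> t : terms.entrySet())
            if (t.getValue().equals(term)) return t.getKey();
        return -1;
    }

    /** Output ids for one input id; types are stable names, species is a Latin name. */
    List<String> match(String id, String inputType, String outputType, String species) {
        int in = termId(inputType), out = termId(outputType), sp = termId(species);
        List<String> result = new ArrayList<>();
        for (Row r : hub)
            if (r.input.equals(id) && r.specie == sp && r.inputType == in && r.outputType == out)
                result.add(r.output);
        return result;
    }

    public static void main(String[] args) {
        HubTablesSketch db = new HubTablesSketch();
        db.terms.put(1, "EnsemblGeneTableType");
        db.terms.put(2, "HypotheticalTargetTableType"); // invented type name
        db.terms.put(3, "Homo sapiens");
        db.hub.add(new Row("ID1", 1, "X1", 2, 3));      // made-up sample rows
        db.hub.add(new Row("ID1", 1, "X2", 2, 3));
        System.out.println(db.match("ID1", "EnsemblGeneTableType",
            "HypotheticalTargetTableType", "Homo sapiens")); // [X1, X2]
    }
}
```

One input identifier may map to several output identifiers, as the two sample rows show; the schema places no uniqueness constraint on '''hub'''.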

After creating the database, subclass {{Class|ru.biosoft.access.biohub.SQLBasedHub}} and implement its {{Method|ru.biosoft.access.biohub.SQLBasedHub.getMatchings()}} method. This method must return an array of {{Class|ru.biosoft.access.biohub.SQLBasedHub.Matching}} objects, which are constructed with the following parameters:
* '''inputType''': ReferenceType class for the input type;
* '''outputType''': ReferenceType class for the output type;
* '''forward''': if false, matching is performed in the backward direction: the input ID is looked up in the '''hub.output''' field and the output ID is returned from the '''hub.input''' field;
* '''quality''': the quality of the given matching (between 0 and 1).
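The effect of the '''forward''' flag can be sketched as follows. <code>Matching</code> here is a hypothetical stand-in that only mirrors the four parameters listed above, using type names as strings instead of ReferenceType classes so that the example compiles without BioUML; the type names <code>TypeA</code>/<code>TypeB</code> and the quality values in <code>getMatchings</code> are illustrative. Declaring the reverse direction over the same table is one plausible use of <code>forward = false</code>, not a documented requirement.

```java
import java.util.*;

// Sketch of what `forward` changes. `Matching` is a hypothetical stand-in
// for SQLBasedHub.Matching with the same parameter order as listed above.
class MatchingDirectionSketch {
    static final class Matching {
        final String inputType, outputType;
        final boolean forward;
        final double quality;
        Matching(String inputType, String outputType, boolean forward, double quality) {
            if (quality < 0 || quality > 1)
                throw new IllegalArgumentException("quality must be between 0 and 1");
            this.inputType = inputType;
            this.outputType = outputType;
            this.forward = forward;
            this.quality = quality;
        }
    }

    /** One `hub` row; type and species columns omitted for brevity. */
    static final class Row {
        final String input, output;
        Row(String input, String output) { this.input = input; this.output = output; }
    }

    // What getMatchings() might declare for a table whose hub.input holds
    // TypeA ids and hub.output holds TypeB ids (hypothetical types/qualities).
    static Matching[] getMatchings() {
        return new Matching[] {
            new Matching("TypeA", "TypeB", true, 1.0),  // read hub.input, return hub.output
            new Matching("TypeB", "TypeA", false, 0.9), // read hub.output, return hub.input
        };
    }

    static List<String> apply(Matching m, List<Row> rows, String id) {
        List<String> result = new ArrayList<>();
        for (Row r : rows) {
            if (m.forward && r.input.equals(id)) result.add(r.output);
            else if (!m.forward && r.output.equals(id)) result.add(r.input);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a1", "b1"), new Row("a2", "b1"));
        Matching[] ms = getMatchings();
        System.out.println(apply(ms[0], rows, "a1")); // [b1]
        System.out.println(apply(ms[1], rows, "b1")); // [a1, a2]
    }
}
```

A backward matching over a many-to-one table can fan out (one TypeB id yields two TypeA ids above), which is one reason a hub might assign it a lower quality than the forward direction.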

=== See also ===
* [[Biohub (host object)]]

[[Category:Development]]
[[Category:BioHub]]
Revision as of 12:23, 3 July 2013