Convert table (analysis)

From BioUML platform
Analysis title: Convert table
Institute of Systems Biology
ru.biosoft.analysis (Common methods of data analysis plug-in)

Convert table identifiers using BioHub(s)

This analysis changes the type of identifiers in the table and converts rows accordingly using a chain of BioHubs. A BioHub is a converter between two or more identifier types (for example, it can convert "Genes: Ensembl" into "Proteins: Ensembl"). If direct conversion between the two given types is impossible, the analysis builds the optimal chain of several BioHubs and applies them in sequence.
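The chain construction can be pictured as a path search over a graph whose nodes are identifier types and whose edges are BioHubs. The sketch below is only an illustration in Python, not BioUML code: it assumes that "optimal" means the fewest conversion steps, and the `HUBS` registry, the `find_chain` function and every type name other than "Genes: Ensembl" and "Proteins: Ensembl" are hypothetical.

```python
from collections import deque

# Hypothetical registry: each pair says one BioHub converts the first type
# into the second. Real BioUML hubs and type names may differ.
HUBS = [
    ("Genes: Ensembl", "Transcripts: Ensembl"),
    ("Transcripts: Ensembl", "Proteins: Ensembl"),
    ("Proteins: Ensembl", "Proteins: UniProt"),
    ("Genes: Ensembl", "Genes: Entrez"),
]


def find_chain(source, target, hubs=HUBS):
    """Return the shortest list of (from, to) steps, or None if unreachable."""
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        for src, dst in hubs:
            if src != current or dst in seen:
                continue
            new_path = path + [(src, dst)]
            if dst == target:
                return new_path
            seen.add(dst)
            queue.append((dst, new_path))
    return None


# No single hub links these two types, so a three-step chain is built.
chain = find_chain("Genes: Ensembl", "Proteins: UniProt")
for step_from, step_to in chain:
    print(step_from, "->", step_to)
```

Breadth-first search is used here because it returns a chain with the minimum number of hubs; a real implementation might weight hubs by quality or coverage instead.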

Note that several non-trivial situations might occur during conversion:

  • A single source ID matches several target IDs. In this case the source row is copied several times, one copy per target ID.
  • A source ID does not match any target ID. In this case the source row is removed from the result.
  • Several source IDs match a single target ID. In this case two options are available:
    • You have specified a main column. Of all the suitable source rows, only one is selected for the result, based on the specified aggregator. For example, if you specified 'maximum' as the aggregator, the source row with the maximal value in the main column is selected.
    • You have not specified a main column. All the corresponding source rows are merged together using merging rules. Non-trivial columns like 'Graph' are removed from the result. Text columns have all their values joined into a sorted comma-separated list with duplicates removed. Numerical columns are merged according to the selected aggregator. For example, if you select 'average' as the aggregator, the mean value appears in the result. If a source column has an integer type, some aggregators may change it to float.
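The three situations above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the BioUML implementation: tables are plain dictionaries, `convert_rows` and `merge_rows` are hypothetical names, only four aggregators are modelled, and the exact separator used for joined text values is assumed to be a bare comma.

```python
from statistics import mean

# Aggregators that combine all values of a numerical column.
AGGREGATORS = {"average": mean, "sum": sum, "minimum": min, "maximum": max}
# Aggregators that can pick a single row by its main-column value.
SELECTORS = {"minimum": min, "maximum": max}


def merge_rows(candidates, aggregate):
    """Merge rows column by column; columns of other types are dropped."""
    merged = {}
    for column in candidates[0]:
        values = [row[column] for row in candidates]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            merged[column] = aggregate(values)
        elif all(isinstance(v, str) for v in values):
            # Sorted, de-duplicated, comma-separated list.
            merged[column] = ",".join(sorted(set(values)))
    return merged


def convert_rows(rows, mapping, aggregator="average", main_column=None):
    """rows: {source_id: {column: value}}; mapping: {source_id: [target_id, ...]}."""
    # One copy of the row per target ID; unmatched source IDs are skipped.
    grouped = {}
    for source_id, row in rows.items():
        for target_id in mapping.get(source_id, []):
            grouped.setdefault(target_id, []).append(row)

    result = {}
    for target_id, candidates in grouped.items():
        if len(candidates) == 1:
            result[target_id] = dict(candidates[0])
        elif main_column is not None:
            # Keep only the row selected by the aggregator on the main column.
            pick = SELECTORS[aggregator]
            result[target_id] = dict(pick(candidates, key=lambda r: r[main_column]))
        else:
            result[target_id] = merge_rows(candidates, AGGREGATORS[aggregator])
    return result


rows = {
    "G1": {"symbol": "TP53", "score": 2},
    "G2": {"symbol": "P53", "score": 5},
    "G3": {"symbol": "MYC", "score": 1},
    "G4": {"symbol": "EGFR", "score": 7},
}
# G1 and G2 share a target, G3 has two targets, G4 has none.
mapping = {"G1": ["P1"], "G2": ["P1"], "G3": ["P2", "P3"]}

merged = convert_rows(rows, mapping, aggregator="average")
selected = convert_rows(rows, mapping, aggregator="maximum", main_column="score")
```

In `merged`, the integer scores 2 and 5 average to the float 3.5, which shows how an aggregator can change an integer column to float. In `selected`, the row for G2 is kept whole because it has the maximal score.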


Parameters:

  • Input table – Data set to be converted
  • Column with IDs (expert) – Column to be used as source ID. Select (none) to use row IDs
  • Species – Select human, mouse or rat species
  • Input type – Type of identifiers in the input table
  • Output type – Select type of identifiers for the resulting table
  • Numerical value treatment rule – Select the rule used to treat values in the numerical columns of the table when several rows are merged into a single one.
    For "average", "average w/o 20% outliers" and "sum", the selected rule is applied to all numerical columns of the table. For "minimum", "maximum" and "extreme", a new option appears below that asks the user to select a "Leading column". The chosen rule is then applied to the values in the selected Leading column (e.g. the maximum value in the Leading column is computed among all the merged rows). All other numerical values are taken from the row that holds the selected value in the Leading column.
  • Leading column – Select the column with numerical values to apply one of the rules described above
  • Restrict multiple matching to max (expert) – An input accession is excluded from the result if it matches more accessions than specified in this parameter (set 0 to turn this feature off)
  • Unmatched rows (expert) – Path to store unmatched rows of the table
  • Output table – Path to store the resulting table in the tree
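The effect of the "Restrict multiple matching to max" and "Unmatched rows" parameters can be illustrated with a small Python sketch. The function name `split_by_matches` and the data are hypothetical. The sketch also keeps over-matched accessions in a list of their own, because the description above does not say whether BioUML writes them to the unmatched table.

```python
def split_by_matches(mapping, source_ids, max_matches=0):
    """Partition source IDs into kept, unmatched and over-matched groups.

    max_matches=0 disables the restriction, as in the parameter description.
    """
    kept, unmatched, over_matched = {}, [], []
    for source_id in source_ids:
        targets = mapping.get(source_id, [])
        if not targets:
            # No target ID at all: candidate for the "Unmatched rows" table.
            unmatched.append(source_id)
        elif max_matches > 0 and len(targets) > max_matches:
            # Matches more accessions than allowed: excluded from the result.
            over_matched.append(source_id)
        else:
            kept[source_id] = targets
    return kept, unmatched, over_matched


mapping = {"A": ["X1"], "B": ["X2", "X3", "X4"], "C": []}
source_ids = ["A", "B", "C", "D"]

kept, unmatched, over_matched = split_by_matches(mapping, source_ids, max_matches=2)
kept_all, _, over_all = split_by_matches(mapping, source_ids, max_matches=0)
```

With `max_matches=2`, accession B (three matches) is excluded while A is kept. With `max_matches=0`, the restriction is off and both A and B are kept.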