Regulator search (analysis)

From BioUML platform

Revision as of 15:24, 4 April 2013

Drug target search analysis

Goal: to find important molecules in a signal transduction cascade.

Input: a set of genes / molecules to start the analysis with. For instance, this can be a set of transcription factors resulting from a promoter analysis, or a set of ligands / receptors that trigger a certain pathway or set of pathways.

Two separate analyses are available:

  • Effector search for molecules downstream of the molecules in the input list.
  • Regulator search for molecules upstream of the molecules in the input list.

Output:

A set of proteins or their encoding genes, which may play a key role in regulating (or being regulated by) a maximal number of start molecules.

Parameters:

  • Molecules collection – Input the collection of molecules/genes
  • Weighting column (expert) – Column to replace weights in search graph
  • Limit input size (expert) – Limit size of input list
  • Input size (expert) – Size of input list
  • Max radius – Maximal search radius
  • Score cutoff – Molecules with a score lower than the specified value are excluded from the result
  • Search collection – Collection containing reactions
  • Species – Species to which analysis should be confined
  • Calculate FDR – If true, the analysis calculates the False Discovery Rate
  • FDR cutoff – Molecules with an FDR higher than the specified value are excluded from the result
  • Z-score cutoff – Molecules with a Z-score lower than the specified value are excluded from the result
  • Penalty (expert) – Penalty value for false positives
  • Context genes (expert) – Drug target search will be attracted towards context genes by decreasing the cost for close edges
  • Context weighting column (expert) – Attraction strength
  • Decay factor (expert) – Factor by which the attraction decreases as the distance increases
  • Normalize multi-forms (expert) – Normalize weights of multiple forms
  • Output name – Output name.

Algorithm description

In drug target search analysis, one searches for signaling molecules and corresponding networks that can transmit a signal to, or receive a signal from, several of the input molecules within a certain number of reaction steps. A search starts from each molecule of an input set Vx and constructs the shortest paths to all nodes V of the complete network within a given maximal path cost R (i.e., the sum of the costs of all edges in the shortest path from a vertex in Vx to a vertex in V must be smaller than or equal to R). The search can be conducted against the direction of the edges leading to the input molecules (upstream) or along the direction of the edges (downstream).
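
The radius-limited search described above can be sketched as a cost-bounded Dijkstra traversal. This is only an illustration under stated assumptions, not BioUML's implementation: the graph representation (a dict of adjacency lists with edge costs), the toy network, and the function names `bounded_costs` and `reverse_graph` are all hypothetical.

```python
import heapq

def reverse_graph(graph):
    """Flip every edge, so that a search on the result walks upstream."""
    reversed_graph = {node: [] for node in graph}
    for source, edges in graph.items():
        for target, cost in edges:
            reversed_graph.setdefault(target, []).append((source, cost))
    return reversed_graph

def bounded_costs(graph, start, max_radius):
    """Return {node: cheapest path cost} for nodes within max_radius of start."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            # Paths are kept only while their total cost stays within the radius.
            if new_cost <= max_radius and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best

# Hypothetical toy network: ligand -> receptor -> kinase -> tf
network = {
    "ligand": [("receptor", 1.0)],
    "receptor": [("kinase", 1.0)],
    "kinase": [("tf", 1.0)],
    "tf": [],
}
downstream = bounded_costs(network, "ligand", max_radius=2)
upstream = bounded_costs(reverse_graph(network), "tf", max_radius=2)
```

With a radius of 2, the downstream search from `ligand` stops at `kinase`, and the upstream search from `tf` stops at `receptor`.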

The Specificity score is calculated for every molecule found according to:

[Equation image: Molecular-networks-Regulator-search-score-equation.gif]

Where:

  • R – Max radius (input parameter)
  • p – Penalty (input parameter)
  • N(X,r) – total number of molecules reachable from key molecule X within radius r.
  • Nmax(r) – maximal value of N(X,r) over all key molecules X found for this radius.
  • M(X,r) – sum of w(H) over all hits H reachable from key molecule X within radius r, where w(H) is the weight of hit H. It equals wb(H) if “Normalize multi-forms” is unchecked; otherwise it is wb(H)/I(H), where I(H) is the number of multi-forms of H in the input set (not the total number of multi-forms in the database). In both cases wb(H) is the base weight of hit H: the corresponding value in “Weighting column”, or 1 if “Weighting column” is not specified.
  • Mmax(r) – maximal value of M(X,r) over all key molecules X found for this radius.
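
The hit weights and the quantities M and N defined in the list above can be illustrated as follows. This is a sketch, not the platform's code: the mapping `form_of` (from each input hit to the molecule it is a form of), the function names, and the example identifiers are hypothetical, and whether the key molecule itself counts as reachable is left to the caller.

```python
from collections import Counter

def hit_weights(input_set, form_of, weighting_column=None, normalize_multiforms=False):
    """Weight w of every hit: base weight wb, optionally divided by I."""
    # I = number of forms of the same molecule that are present in the input set
    forms_in_input = Counter(form_of.get(hit, hit) for hit in input_set)
    weights = {}
    for hit in input_set:
        # wb = value from the weighting column, or 1 if no column is given
        weight = 1.0 if weighting_column is None else weighting_column[hit]
        if normalize_multiforms:
            weight /= forms_in_input[form_of.get(hit, hit)]
        weights[hit] = weight
    return weights

def m_value(reachable, weights):
    """M(X, r): summed weight of the input hits among the reachable molecules."""
    return sum(weights[m] for m in reachable if m in weights)

def n_value(reachable):
    """N(X, r): total number of reachable molecules."""
    return len(reachable)

# Hypothetical input: two forms of one molecule plus one unrelated molecule
input_set = ["p53", "p53-phospho", "NFkB"]
form_of = {"p53": "TP53", "p53-phospho": "TP53", "NFkB": "NFKB1"}
reachable = {"p53", "p53-phospho", "MDM2"}

plain = hit_weights(input_set, form_of)
normalized = hit_weights(input_set, form_of, normalize_multiforms=True)
```

In this example the two p53 forms contribute a total weight of 2 without normalization and 1 with it, so a key molecule is not rewarded twice for reaching the same molecule in two forms.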

FDR: Each individual drug target molecule is assigned a p-value (FDR), which represents the probability of occupying the observed rank or a higher rank by random chance. It is estimated on the fly by random sampling. The ranking of the key nodes is defined by sorting them by the Score above in descending order. Ranks are assigned to the distinct score values that occur, so key nodes with the same score share the same rank. Molecules without any hits are assigned the last rank, since their score is zero.
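
The ranking rule and the sampling-based estimate can be sketched as below. The source does not say how the random input sets are drawn, so the example simply takes a list of ranks obtained in random runs as given. It also assumes that "higher rank" means a smaller rank number. The function names and all values are hypothetical.

```python
def dense_ranks(scores):
    """Rank distinct score values in descending order; equal scores share a rank."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of_score = {score: position + 1 for position, score in enumerate(distinct)}
    return {node: rank_of_score[score] for node, score in scores.items()}

def empirical_p_value(observed_rank, random_ranks):
    """Fraction of random runs in which the node ranked as well as, or better than, observed."""
    as_good = sum(1 for rank in random_ranks if rank <= observed_rank)
    return as_good / len(random_ranks)

# Hypothetical scores: A and B tie, D has no hits and therefore a score of zero
scores = {"A": 0.9, "B": 0.9, "C": 0.4, "D": 0.0}
ranks = dense_ranks(scores)

# Hypothetical ranks of node A in ten random runs
p_value = empirical_p_value(ranks["A"], [1, 3, 2, 4, 3, 5, 2, 4, 3, 5])
```

Here A and B share rank 1, and D, with a score of zero, takes the last rank.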

Z-Score: In addition to the FDR, each drug target molecule is assigned a Z-score

[Equation image: Molecular-networks-Regulator-search-z-score.gif]

which measures the deviation of the observed rank X of the key node from the expected rank μ in the random case, divided by the standard deviation. The formula assumes that the ranks follow a normal distribution. Key nodes with Z greater than 1.0 are considered significant.
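
A small sketch of this calculation follows. The equation itself is only available as an image, so the sign convention is an assumption: the deviation is taken as μ minus X, so that a better-than-expected rank (a smaller rank number) gives a positive Z, which fits the "Z greater than 1.0" criterion. The function name and the sample ranks are hypothetical.

```python
from statistics import mean, pstdev

def rank_z_score(observed_rank, random_ranks):
    """Deviation of the observed rank from the mean random rank, in standard deviations."""
    mu = mean(random_ranks)
    sigma = pstdev(random_ranks)
    if sigma == 0:
        return 0.0  # all random ranks identical, no spread to measure against
    # Assumed sign: positive when the observed rank is better (smaller) than expected.
    return (mu - observed_rank) / sigma

# Hypothetical ranks from random runs: mean 5, standard deviation 2
random_ranks = [2, 4, 4, 4, 5, 5, 7, 9]
z = rank_z_score(1, random_ranks)
```

With these sample ranks, an observed rank of 1 lies two standard deviations better than the random expectation.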

Context algorithm

To incorporate additional contextual knowledge, e.g. a certain disease known to be relevant to the analysis, we implemented a method that encodes this context information as modified edge costs in the signaling network. The context information has to be provided as a second gene set (context genes). The idea is to attract the drug target molecule search (i.e. the underlying Dijkstra shortest-path algorithm) towards context genes by decreasing the costs of edges that are close to the context genes. It features two major aspects:

  1. Attraction ("gravity") of the shortest paths towards context genes C.
  2. Distribution of the attraction power over an extended area around C ("gravity range"), so that shortest paths close to context genes are preferred when no path passes through a context gene directly.
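
One way to picture the two aspects is sketched below. The source does not give the actual cost formula, so everything here is an assumption made for illustration: attraction is modelled as `strength * decay ** d`, where d is the number of steps to the nearest context gene, and an edge cost is divided by `1 + attraction`. The function names and the toy network are hypothetical.

```python
from collections import deque

def context_distances(graph, context_genes):
    """Steps from every node to its nearest context gene, ignoring edge direction."""
    neighbours = {}
    for source, edges in graph.items():
        neighbours.setdefault(source, set())
        for target, _cost in edges:
            neighbours[source].add(target)
            neighbours.setdefault(target, set()).add(source)
    distance = {gene: 0 for gene in context_genes if gene in neighbours}
    queue = deque(distance)
    while queue:  # breadth-first search from all context genes at once
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance

def attract_costs(graph, context_genes, strength=1.0, decay=0.5):
    """Return a copy of the graph with cheaper edges near context genes (assumed formula)."""
    distance = context_distances(graph, context_genes)
    adjusted = {}
    for source, edges in graph.items():
        adjusted[source] = []
        for target, cost in edges:
            if target in distance:
                attraction = strength * decay ** distance[target]
                cost = cost / (1.0 + attraction)
            adjusted[source].append((target, cost))
    return adjusted

# Hypothetical chain A -> B -> C -> D with context gene C
network = {"A": [("B", 1.0)], "B": [("C", 1.0)], "C": [("D", 1.0)], "D": []}
adjusted = attract_costs(network, {"C"})
```

The edge leading into the context gene becomes the cheapest, and edges one step away are discounted less. That is the "gravity range" idea: paths near a context gene are still preferred even when they do not pass through it.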

Result columns

  • ID – key molecule identifier in the respective database
  • Key molecule name – molecule title
  • Reached from set – number of molecules from the input set that were reached from the key molecule within the given distance
  • Reachable total – total number of molecules that can be reached from the key molecule within the given distance
  • Score – specificity score value calculated as described above
  • FDR – p-value, which represents the probability of occupying the observed rank or a higher rank by random chance
  • Z-Score – Z-score, calculated according to the equation above
  • Hits – identifiers of molecules from the input set that were reached from the key molecule within the given distance
  • Hits names – titles of molecules from the input set that were reached from the key molecule within the given distance
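
The three cutoff parameters act on these columns: rows with a Score or Z-Score below the cutoff, or an FDR above it, are excluded. A minimal sketch of that filtering, assuming the result rows are available as plain dictionaries keyed by column name (the row values and the `passes_cutoffs` helper are hypothetical):

```python
def passes_cutoffs(row, score_cutoff=None, fdr_cutoff=None, z_cutoff=None):
    """Apply the Score, FDR and Z-score cutoffs to a single result row."""
    if score_cutoff is not None and row["Score"] < score_cutoff:
        return False  # score lower than specified
    if fdr_cutoff is not None and row["FDR"] > fdr_cutoff:
        return False  # FDR higher than specified
    if z_cutoff is not None and row["Z-Score"] < z_cutoff:
        return False  # Z-score lower than specified
    return True

# Hypothetical result rows, values invented for the example
rows = [
    {"ID": "MO1", "Score": 0.8, "FDR": 0.01, "Z-Score": 2.3},
    {"ID": "MO2", "Score": 0.6, "FDR": 0.20, "Z-Score": 1.4},
    {"ID": "MO3", "Score": 0.1, "FDR": 0.03, "Z-Score": 0.5},
]
kept = [row["ID"] for row in rows
        if passes_cutoffs(row, score_cutoff=0.2, fdr_cutoff=0.05, z_cutoff=1.0)]
```

Only the first row passes all three cutoffs: the second fails on FDR, the third on Score and Z-Score.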

References:

  1. Kel, A., Voss, N., Jauregui, R., Kel-Margoulis, O. and Wingender, E.: Beyond microarrays: Find key transcription factors controlling signal transduction pathways. BMC Bioinformatics 7(Suppl. 2), S13 (2006).