Difference between revisions of "Regulator search (analysis)"

Revision as of 15:24, 4 April 2013

Parameters:

Molecules collection – Input collection of molecules/genes
Weighting column (expert) – Column to replace weights in the search graph
Limit input size (expert) – Limit the size of the input list
Input size (expert) – Size of the input list
Max radius – Maximal search radius
Score cutoff – Molecules with a Score lower than the specified value are excluded from the result
Search collection – Collection containing reactions
Species – Species to which the analysis should be confined
Calculate FDR – If true, the analysis calculates the False Discovery Rate
FDR cutoff – Molecules with an FDR higher than the specified value are excluded from the result
Z-score cutoff – Molecules with a Z-score lower than the specified value are excluded from the result
Penalty (expert) – Penalty value for false positives
Context genes (expert) – The drug target search is attracted towards context genes by decreasing the cost of nearby edges
Context weighting column (expert) – Attraction strength
Decay factor (expert) – Factor by which the attraction decreases as the distance increases
Normalize multi-forms (expert) – Normalize weights of multiple forms
Output name – Name of the output

Algorithm description

In drug target search analysis, one searches for signaling molecules and corresponding networks that can transmit a signal to, or receive a signal from, several of the input molecules within a certain number of reaction steps. A search starts from each molecule of an input set V_x and constructs the shortest paths to all nodes V of the complete network within a given maximal path cost R (i.e., the sum of the costs of all edges in the shortest path from a vertex in V_x to a vertex in V must be smaller than or equal to R). The search can be conducted against the direction of the edges leading to input molecules (upstream) or along the edge direction (downstream).
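The radius-limited search described above can be sketched with a standard Dijkstra traversal that discards any path whose cost exceeds R. This is a minimal illustration, not the BioUML implementation: the function name `bounded_shortest_paths`, the adjacency-list layout and the toy network are assumptions made for the example.

```python
import heapq


def bounded_shortest_paths(graph, source, max_radius, upstream=False):
    """Return {node: cost} for all nodes whose shortest-path cost from
    `source` is <= max_radius.

    graph: {node: [(neighbour, edge_cost), ...]} (hypothetical layout).
    upstream=True walks the edges against their direction.
    """
    if upstream:
        # Build the reversed graph so the same traversal goes upstream.
        reversed_graph = {}
        for u, edges in graph.items():
            for v, cost in edges:
                reversed_graph.setdefault(v, []).append((u, cost))
        graph = reversed_graph

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            # Keep only paths that stay within the maximal path cost R.
            if nd <= max_radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy network (assumed for illustration): A -> B -> C -> D, unit costs.
network = {"A": [("B", 1.0)], "B": [("C", 1.0)], "C": [("D", 1.0)]}
downstream = bounded_shortest_paths(network, "A", max_radius=2.0)
upstream = bounded_shortest_paths(network, "D", max_radius=2.0, upstream=True)
```

With R = 2, the downstream search from A reaches B and C but not D, and the upstream search from D reaches C and B but not A.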

The Specificity score is calculated for every molecule found according to:

Where:

R — Max radius (input parameter)
p — Penalty (input parameter)
N(X,r) — total number of molecules reachable from key molecule X within the radius r.
N_max(r) — maximal value of N(X,r) over all key molecules X found for this radius.
M(X,r) — sum of w(X) over all hits reachable from key molecule X within the radius r, where w(X) is the weight of hit X. It equals w_b(X) if “Normalize multi-forms” is unchecked; otherwise it is w_b(X)/I(X), where I(X) is the number of multi-forms of X in the input set (not the total number of multi-forms in the database). In both cases w_b(X) is the base weight of hit X: the corresponding value in “Weighting column”, or 1 if “Weighting column” is not specified.
M_max(r) — maximal value of M(X,r) over all key molecules X found for this radius.
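The quantities N(X,r) and M(X,r) defined above can be illustrated as follows. The sketch only covers the counting and weighting rules; it does not compute the Score itself. The function names `hit_weight` and `reach_counts`, the dictionary layouts and the toy data are assumptions made for the example. Whether the key molecule itself counts towards N(X,r) is not stated in the text; here every entry of `distances` is counted.

```python
def hit_weight(hit, weighting_column=None, multiform_counts=None,
               normalize_multiforms=False):
    """w(X) for one hit.

    Base weight w_b is the weighting-column value, or 1 if no weighting
    column is given. With normalisation it is divided by I(X), the number
    of multi-forms of the hit in the input set.
    """
    base = 1.0 if weighting_column is None else weighting_column[hit]
    if normalize_multiforms:
        return base / multiform_counts[hit]
    return base


def reach_counts(distances, input_set, r, **weight_options):
    """Return (N, M) for one key molecule at radius r.

    distances: {molecule: path cost from the key molecule} (assumed layout).
    N counts every molecule within r; M sums the weights of those that
    belong to the input set (the hits).
    """
    within = [node for node, cost in distances.items() if cost <= r]
    n = len(within)
    m = sum(hit_weight(node, **weight_options)
            for node in within if node in input_set)
    return n, m


# Toy data (assumed): A1 and A2 are two forms of the same molecule.
distances = {"K": 0.0, "A1": 1.0, "A2": 1.0, "B": 2.0, "C": 2.0}
input_set = {"A1", "A2", "B"}
weights = {"A1": 2.0, "A2": 2.0, "B": 3.0}
forms = {"A1": 2, "A2": 2, "B": 1}

plain = reach_counts(distances, input_set, 2.0)
weighted = reach_counts(distances, input_set, 2.0, weighting_column=weights)
normalized = reach_counts(distances, input_set, 2.0, weighting_column=weights,
                          multiform_counts=forms, normalize_multiforms=True)
```

In this toy case normalisation lowers M from 7 to 5, because the two forms A1 and A2 together contribute the weight of a single molecule.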

FDR

Each drug target molecule is assigned a p-value (FDR), which represents the probability of occupying the observed rank or a higher rank by random chance. It is estimated on the fly by random sampling. Key nodes are ranked by sorting them according to the Score above in descending order. Ranks are assigned to the score values that occur, so key nodes with the same score value share the same rank. Molecules without any hits are assigned the last rank, since their score is zero.
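The ranking and the sampling-based estimate can be sketched as below. The sampling scheme itself is not specified in the text, so the sketch assumes that the rank of the key node has already been recorded for a number of random input sets. The function names `dense_ranks` and `empirical_rank_p_value`, and the reading of "ranks of the occurring scores" as dense ranking, are assumptions.

```python
def dense_ranks(scores):
    """Map {key_node: score} to {key_node: rank}.

    Rank 1 is the highest score. Equal scores share a rank, and ranks are
    numbered over the distinct score values (dense ranking). Nodes with a
    score of zero end up on the last rank when scores are non-negative.
    """
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of_score = {s: i + 1 for i, s in enumerate(distinct)}
    return {node: rank_of_score[s] for node, s in scores.items()}


def empirical_rank_p_value(observed_rank, random_ranks):
    """Fraction of random runs in which the node reached the observed
    rank or a better (numerically smaller) one."""
    as_good = sum(1 for r in random_ranks if r <= observed_rank)
    return as_good / len(random_ranks)


# Toy scores (assumed): K2 and K3 tie, K4 has no hits.
scores = {"K1": 0.9, "K2": 0.5, "K3": 0.5, "K4": 0.0}
ranks = dense_ranks(scores)

# Ranks of K2 in ten hypothetical random runs.
random_ranks_k2 = [1, 3, 2, 3, 3, 3, 3, 3, 3, 3]
p_k2 = empirical_rank_p_value(ranks["K2"], random_ranks_k2)
```

Here K2 and K3 share rank 2, K4 gets the last rank 3, and K2 reaches rank 2 or better in 2 of the 10 random runs.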

Z-Score

In addition to the FDR, each drug target molecule gets a Z-Score, which measures the deviation of the observed rank X of the key node from the expected rank μ in the random case, divided by the standard deviation. The rank distribution is assumed to follow a normal distribution. Key nodes with Z greater than 1.0 are considered significant.
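A small sketch of such a rank-based Z-score is given below. Several details are assumptions rather than statements from the text: μ and the standard deviation are taken from ranks observed in random runs, the population standard deviation is used, and the sign is chosen as (μ − X)/σ so that a better-than-expected (numerically smaller) rank gives a positive Z, in line with the "Z greater than 1.0" criterion. The function name `rank_z_score` is hypothetical.

```python
import statistics


def rank_z_score(observed_rank, random_ranks):
    """Z-score of an observed rank against ranks from random runs.

    Assumed sign convention: positive when the observed rank is better
    (smaller) than the mean random rank.
    """
    mu = statistics.mean(random_ranks)
    sigma = statistics.pstdev(random_ranks)  # population std (assumption)
    if sigma == 0:
        raise ValueError("random ranks have zero spread; Z is undefined")
    return (mu - observed_rank) / sigma


# Hypothetical ranks from four random runs: mean 6, standard deviation 2.
z = rank_z_score(2, [4, 8, 4, 8])
```

In this toy case the observed rank 2 lies two standard deviations above the random expectation, so the key node would pass a Z-score cutoff of 1.0.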

Context algorithm

For the purpose of incorporating additional contextual knowledge, e.g. a certain disease known to be related to the analysis at hand, we implemented a method which encodes this context information as modified edge costs in the signaling network. The context information has to be provided as a second gene set (context genes). The idea is to attract the drug target molecule search (i.e. the underlying Dijkstra algorithm for shortest paths) towards context genes by decreasing the costs of those edges that are close to the context genes. The method has two major aspects:

Attraction ("gravity") of the shortest paths towards context genes C
Distribution of the attraction power over an extended area surrounding C ("gravity range"), so that shortest paths close to context genes are preferred when no path through a context gene itself is possible
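The two aspects above can be illustrated with a simple cost adjustment. The text does not give the actual formula, so everything in this sketch is an assumption made for illustration: the attraction at a node is taken as strength · decay^h, where h is the number of hops to the nearest context gene, and an edge cost is divided by (1 + attraction at its target node). The function names `hops_to_context` and `attract_edge_costs` are hypothetical.

```python
from collections import deque


def hops_to_context(graph, context_genes):
    """Multi-source BFS: {node: hops to the nearest context gene},
    ignoring edge direction. graph: {node: [(neighbour, cost), ...]}."""
    neighbours = {}
    for u, edges in graph.items():
        for v, _cost in edges:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    hops = {gene: 0 for gene in context_genes}
    queue = deque(context_genes)
    while queue:
        u = queue.popleft()
        for v in neighbours.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def attract_edge_costs(graph, context_genes, strength=1.0, decay=0.5):
    """Return a copy of the graph with edge costs lowered near context genes.

    Assumed rule: cost / (1 + strength * decay ** hops(target)). Edges whose
    target is not connected to any context gene keep their cost.
    """
    hops = hops_to_context(graph, context_genes)
    adjusted = {}
    for u, edges in graph.items():
        adjusted[u] = []
        for v, cost in edges:
            attraction = strength * decay ** hops[v] if v in hops else 0.0
            adjusted[u].append((v, cost / (1.0 + attraction)))
    return adjusted


# Toy network (assumed): A -> B -> C -> D with D as the only context gene.
network = {"A": [("B", 1.0)], "B": [("C", 1.0)], "C": [("D", 1.0)]}
adjusted = attract_edge_costs(network, {"D"}, strength=1.0, decay=0.5)
```

In the toy network the edge into the context gene D becomes cheapest (cost 0.5), and the reduction fades with distance (about 0.67 for B → C, 0.8 for A → B), which is the "gravity range" effect. Running a shortest-path search on the adjusted graph would then prefer routes near D.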

Result columns

ID — key molecule identifier in respective database
Key molecule name — molecule title
Reached from set — number of molecules from the input set that were reached from the key molecule within the given distance
Reachable total — total number of molecules that can be reached from the key molecule within the given distance
Score — specificity score value calculated as described above
FDR — p-value, which represents the probability of occupying the observed rank or a higher rank by random chance
Z-Score — Z-score, calculated as described above
Hits — identifiers of molecules from the input set that were reached from the key molecule within the given distance
Hits names — titles of molecules from the input set that were reached from the key molecule within the given distance

References:

Kel, A., Voss, N., Jauregui, R., Kel-Margoulis, O. and Wingender, E.: Beyond microarrays: Find key transcription factors controlling signal transduction pathways. BMC Bioinformatics 7(Suppl. 2), S13 (2006).
