(8 intermediate revisions by one user not shown) |
Line 2: |
Line 2: |
| :[[File:Molecular-networks-Effector-search-icon.png]] Effector search | | :[[File:Molecular-networks-Effector-search-icon.png]] Effector search |
| ;Provider | | ;Provider |
− | :[[Institute of Systems Biology]] | + | :[[geneXplain GmbH]] |
| + | ;Class |
| + | :{{Class|biouml.plugins.keynodes.EffectorKeyNodes}} |
| + | ;Plugin |
| + | :[[Biouml.plugins.keynodes (plugin)|biouml.plugins.keynodes (Master regulator node analysis plugin)]] |
| | | |
− | === Drug target search analysis === | + | ==== Description ==== |
− | | + | Find genes or proteins regulated by the input set of genes. |
− | '''Goal:''' to search for important molecules in signal transduction cascade.
| + | |
− | | + | |
− | '''Input:''' a set of genes / molecules to start analysis with. For instance, this can be a set of transcription factors, which may result from a promoter analysis, or a set of ligands / receptors that trigger a certain (set of) pathway(s).
| + | |
− | | + | |
− | Two separate analyses are available:
| + | |
− | | + | |
− | * '''Effector search''' for molecules downstream of the molecules in the input list.
| + | |
− | * '''Regulator search''' for molecules upstream of the molecules in the input list.
| + | |
− | | + | |
− | ==== Output: ====
| + | |
− | | + | |
− | A set of proteins or their encoding genes, which may play a key role in regulating (or being regulated by) a maximal number of start molecules.
| + | |
| | | |
| ==== Parameters: ==== | | ==== Parameters: ==== |
Line 28: |
Line 20: |
| * '''Score cutoff''' – Molecules with Score lower than specified will be excluded from the result | | * '''Score cutoff''' – Molecules with Score lower than specified will be excluded from the result |
| * '''Search collection''' – Collection containing reactions | | * '''Search collection''' – Collection containing reactions |
| + | * '''Custom search collection''' – Path to the custom search collection |
| + | * '''Relation sign''' – Consider only the specified type of relation chain between molecules. |
| * '''Species''' – Species to which analysis should be confined | | * '''Species''' – Species to which analysis should be confined |
| * '''Calculate FDR''' – If true, analysis will calculate False Discovery Rate | | * '''Calculate FDR''' – If true, analysis will calculate False Discovery Rate |
Line 33: |
Line 27: |
| * '''Z-score cutoff''' – Molecules with Z-score lower than specified will be excluded from the result | | * '''Z-score cutoff''' – Molecules with Z-score lower than specified will be excluded from the result |
| * '''Penalty''' (expert) – Penalty value for false positives | | * '''Penalty''' (expert) – Penalty value for false positives |
− | * '''Context genes''' (expert) – Drug target search will be attracted towards context genes by decreasing the cost for close edges | + | * '''Decorators''' – Decorators |
− | * '''Context weighting column''' (expert) – Attraction strength
| + | |
− | * '''Decay factor''' (expert) – Decay factor to decrease attraction with distance increase
| + | |
| * '''Normalize multi-forms''' (expert) – Normalize weights of multiple forms | | * '''Normalize multi-forms''' (expert) – Normalize weights of multiple forms |
| * '''Output name''' – Output name. | | * '''Output name''' – Output name. |
− |
| |
− | ==== Algorithm description ====
| |
− |
| |
− | In drug target search analysis one searches for signaling molecules and corresponding networks that can transmit a signal to or receive a signal from several of the input molecules within a certain limit of reaction steps. A search starts from each molecule of an input set ''V<sub>x</sub>'' and constructs the shortest paths to all nodes ''V'' of the complete network within a given maximal path cost ''R'' (i.e., the sum of the costs of all edges in the shortest path from a vertex in ''V<sub>x</sub>'' to a vertex in ''V'' must be smaller than or equal to ''R''). The search can be conducted in the reverse direction of the edges leading to input molecules (upstream) or in the same direction (downstream).
| |
− |
| |
− | The '''Specificity score''' is calculated for every molecule found according to:
| |
− |
| |
− | :: [[File:Molecular-networks-Effector-search-score-equation.gif]]
| |
− |
| |
− | Where:
| |
− |
| |
− | * '''''R''''' — Max radius (input parameter)
| |
− | * '''''p''''' — Penalty (input parameter)
| |
− | * '''''N(X,r)''''' — total number of molecules reachable from key molecule X within the radius r.
| |
− | * '''''N<sub>max</sub>(r)''''' — maximal value of ''N(X,r)'' over all key molecules X found for this radius.
| |
− | * '''''M(X,r)''''' — sum of ''w(X)'' for all hits reachable from key molecule X within the radius r, where ''w(X)'' is the weight of hit X. It equals ''w<sub>b</sub>(X)'' if “Normalize multi-forms” is unchecked. Otherwise it is ''w<sub>b</sub>(X)/I(X)'', where ''I(X)'' is the number of multiforms of X in the input set (not the total number of multiforms in the database). In both cases ''w<sub>b</sub>(X)'' is the base weight of hit X. It equals the corresponding value in “Weighting column”, or 1 if “Weighting column” is not specified.
| |
− | * '''''M<sub>max</sub>(r)''''' — maximal value of ''M(X,r)'' over all key molecules X found for this radius.
| |
− |
| |
− | '''FDR''': Each individual drug target molecule is assigned a ''p''-value (FDR), which represents the probability of occupying the observed rank or a higher rank by random chance. It is estimated on the fly by random sampling. The ranking of the key nodes is defined by sorting them according to the Score above in descending order. Note that the rank is defined by the ranks of the occurring scores, so more than one key node can share the same rank when their scores are equal. Molecules that do not have any hits are assigned the last rank, since their score is zero.
| |
− |
| |
− | '''''Z''-Score'''
| |
− |
| |
− | In addition to the FDR, each drug target molecule gets a ''Z''-Score
| |
− |
| |
− | :: [[File:Molecular-networks-Effector-search-z-score.gif]]
| |
− |
| |
− | which measures the deviation of the observed rank ''X'' of the key node from the expected rank ''μ'' in the random case, divided by the standard deviation. In this formula, the rank distribution is assumed to follow a normal distribution. Key nodes with ''Z'' greater than 1.0 are considered significant.
| |
− |
| |
− | Context algorithm
| |
− |
| |
− | For the purpose of incorporating additional contextual knowledge, e.g. a certain disease which we know to be related to the anticipated analysis, we implemented a method which encodes this additional context information as modified edge costs in the signaling network. The context information has to be provided as a second gene set (context genes). The idea is based on attracting the drug target molecule search (e.g. the underlying Dijkstra algorithm for shortest paths) towards context genes by decreasing the costs of those edges that are close to the context genes. It features two major aspects:
| |
− |
| |
− | # Attraction ("gravity") of the shortest-paths towards context genes C
| |
− | # Distribution of the attraction power to an extended surrounding area around C, so that shortest paths close to context genes are preferred when no path goes through a context gene directly ("gravity range").
| |
− |
| |
− | ==== Result columns ====
| |
− |
| |
− | * '''ID''' — key molecule identifier in respective database
| |
− | * '''Key molecule name''' — molecule title
| |
− | * '''Reached from set''' — number of molecules from the input set that were reached from the key molecule within the given distance
| |
− | * '''Reachable total''' — total number of molecules that can be reached from the key molecule within the given distance
| |
− | * '''Score''' — specificity score value calculated as described above
| |
− | * '''FDR''' — ''p''-value, which represents the probability of occupying the observed rank or a higher rank by random chance
| |
− | * '''Z-Score''' — z-score, according to the equation above
| |
− | * '''Hits''' — identifiers of molecules from the input set that were reached from the key molecule within the given distance
| |
− | * '''Hits names''' — titles of molecules from the input set that were reached from the key molecule within the given distance
| |
− |
| |
− | ==== References: ====
| |
− |
| |
− | # Kel, A., Voss, N., Jauregui, R., Kel-Margoulis, O. and Wingender, E.: Beyond microarrays: Find key transcription factors controlling signal transduction pathways. BMC Bioinformatics 7(Suppl. 2), S13 (2006).
| |
| | | |
| [[Category:Analyses]] | | [[Category:Analyses]] |
| [[Category:Molecular networks (analyses group)]] | | [[Category:Molecular networks (analyses group)]] |
− | [[Category:ISB analyses]] | + | [[Category:GeneXplain analyses]] |
| [[Category:Autogenerated pages]] | | [[Category:Autogenerated pages]] |
Find genes or proteins regulated by the input set of genes.
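The downstream search described in the earlier revision (shortest paths from each input molecule, limited by a maximal path cost) can be illustrated with a minimal Python sketch. It is not the BioUML implementation: the function name `effector_search`, the adjacency-list graph format, and the toy network are assumptions made for illustration, and no scoring, FDR, or Z-score step is included.

```python
import heapq


def effector_search(graph, start_nodes, max_cost):
    """Map each reachable node to the set of start nodes that reach it.

    graph: dict mapping a node to a list of (neighbor, edge_cost) pairs
           (hypothetical format, edges point downstream).
    max_cost: maximal allowed total path cost (the "max radius" R).
    """
    reached_from = {}
    for start in start_nodes:
        # Dijkstra's algorithm, pruned once a path would exceed max_cost.
        dist = {start: 0.0}
        heap = [(0.0, start)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist.get(node, float("inf")):
                continue  # stale heap entry
            for neighbor, cost in graph.get(node, []):
                new_d = d + cost
                if new_d <= max_cost and new_d < dist.get(neighbor, float("inf")):
                    dist[neighbor] = new_d
                    heapq.heappush(heap, (new_d, neighbor))
        for node in dist:
            if node != start:
                reached_from.setdefault(node, set()).add(start)
    return reached_from


# Toy network (hypothetical): A and B are the input molecules.
graph = {
    "A": [("C", 1.0)],
    "B": [("C", 1.0), ("D", 1.0)],
    "C": [("E", 1.0)],
    "D": [("E", 2.0)],
    "E": [],
}
hits = effector_search(graph, ["A", "B"], max_cost=2.0)
```

In this toy network, `C` and `E` are reached from both input molecules, while `D` is reached only from `B`. An upstream (regulator) search would follow the same pattern on a graph with all edges reversed.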