Difference between revisions of "GTRD"

From BioUML platform
Revision as of 16:02, 10 October 2016

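Gene Transcription Regulation Database (GTRD, http://gtrd.biouml.org) is a database of transcription factor binding sites identified from ChIP-seq experiments that were systematically collected and uniformly processed using a dedicated workflow (pipeline) for the BioUML platform. Raw ChIP-seq data and experiment information were collected from:

  • literature
  • GEO (http://www.ncbi.nlm.nih.gov/geo/)
  • SRA (http://www.ncbi.nlm.nih.gov/sra)
  • ENCODE (http://genome.ucsc.edu/ENCODE/)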

Initial raw data were uniformly processed using a specially developed workflow (pipeline) for the BioUML platform:

  • sequenced reads were aligned to the reference genome using Bowtie2;
  • peaks were identified using the MACS, SISSRs, GEM and PICS peak callers;
  • peaks computed for the same TF and peak calling method, but under different experimental conditions (e.g., cell line or treatment), were joined into clusters;
  • clusters for the same TF revealed by different peak calling methods were joined into metaclusters.
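
The last two steps can be illustrated with a small sketch. This page does not describe how GTRD actually joins peaks, so the code below assumes the simplest possible rule (intervals that overlap or touch are merged); the `Peak` type, the function names and the sample coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    tf: str      # transcription factor name
    caller: str  # peak caller that reported the peak
    chrom: str   # chromosome
    start: int   # 0-based start, inclusive
    end: int     # end, exclusive


def merge_overlapping(intervals):
    """Merge (start, end) intervals that overlap or touch (assumed rule)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_clusters(peaks):
    """Join peaks of the same TF and peak caller across experiments."""
    groups = defaultdict(list)
    for p in peaks:
        groups[(p.tf, p.caller, p.chrom)].append((p.start, p.end))
    return {key: merge_overlapping(iv) for key, iv in groups.items()}


def build_metaclusters(clusters):
    """Join clusters of the same TF across peak callers."""
    groups = defaultdict(list)
    for (tf, _caller, chrom), intervals in clusters.items():
        groups[(tf, chrom)].extend(intervals)
    return {key: merge_overlapping(iv) for key, iv in groups.items()}


# Hypothetical peaks: two MACS peaks from different experiments, two GEM peaks.
peaks = [
    Peak("CTCF", "MACS", "chr1", 100, 200),
    Peak("CTCF", "MACS", "chr1", 150, 260),
    Peak("CTCF", "GEM", "chr1", 250, 300),
    Peak("CTCF", "GEM", "chr1", 900, 950),
]
clusters = build_clusters(peaks)
metaclusters = build_metaclusters(clusters)
print(clusters)
print(metaclusters)
```

In this toy input the two MACS peaks collapse into one cluster, and the metacluster step then joins that cluster with the overlapping GEM cluster while leaving the distant GEM peak on its own.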

Learn more about the GTRD build process on the GTRD Workflow page.

The GTRD database is freely available to non-commercial organizations.


Database statistics

GTRD uses 5072 ChIP-seq experiments for 476 human and 257 mouse TFs that correspond to 542 TFClass classes.
[Figure: ChIP-seq experiments by species]
Most ChIP-seq experiments (61%) have a corresponding control experiment.
[Figure: Control experiments]

General statistics:

Object type    | Total count   | Per ChIP-seq experiment
---------------|---------------|------------------------
ChIP-seq reads | 183.8 × 10^9  | 36.2 × 10^6
Reads aligned  | 146.9 × 10^9  | 28.9 × 10^6
ChIP-seq peaks | >100 × 10^6   | depends on peak caller
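
As a quick sanity check, the per-experiment column can be recomputed from the totals and the 5072 experiments quoted above. The sketch below only uses numbers stated on this page; the variable names are arbitrary, and small differences from the table are expected because the totals are themselves rounded.

```python
# Totals and experiment count as quoted on this page.
total_reads = 183.8e9
aligned_reads = 146.9e9
experiments = 5072

alignment_rate = aligned_reads / total_reads
reads_per_experiment = total_reads / experiments
aligned_per_experiment = aligned_reads / experiments

print(f"fraction of reads aligned: {alignment_rate:.1%}")
print(f"reads per experiment: {reads_per_experiment / 1e6:.1f} x 10^6")
print(f"aligned reads per experiment: {aligned_per_experiment / 1e6:.1f} x 10^6")
```

Roughly four out of five reads are aligned, and both per-experiment values agree with the table to within 0.1 × 10^6.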

On average, each TF has been measured in 9.37 ChIP-seq experiments; 54% of TFs have been measured in more than one experiment.

The ten most studied transcription factors are listed below:

Transcription factor | Number of ChIP-seq experiments
---------------------|-------------------------------
CTCF                 | 282
AR                   | 117
PU.1                 | 103
ERα                  | 92
c-Myc                | 79
C/EBPβ               | 74
NF-κB p65            | 70
GR                   | 53
REST                 | 51
GATA-1               | 51

Detailed statistics are available on the GTRD statistics page.

Database structure

GTRD metadata is stored in MySQL tables.

Each ChIP-seq experiment has a row in the 'chip_experiments' table, which assigns an id and stores basic information about the experiment. The 'chip_experiments' table has the following structure:

Column     | Description                                                                                  | Example value
-----------|----------------------------------------------------------------------------------------------|--------------
id         | Unique experiment identifier                                                                 | EXP000489
antibody   | Antibody used in chromatin immunoprecipitation                                               | sc-345
tfClassId  | Id of the target transcription factor in the TFClass database, NULL for control experiments | 6.2.1.0.1
cell_line  | Studied cell line                                                                            | HeLa S3
specie     | Latin name of the species                                                                    | Homo sapiens
treatment  | Cell treatment or conditions                                                                 | IFN gamma
control_id | Id of the control experiment, NULL for control experiments or experiments without a control | EXP000490
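
The table layout can be tried out locally with a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server; the column types are assumptions (this page lists only names and descriptions), and every value in the control row EXP000490 other than its id and the two NULLs is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the table above; the TEXT types are an assumption.
conn.execute("""
    CREATE TABLE chip_experiments (
        id         TEXT PRIMARY KEY,
        antibody   TEXT,
        tfClassId  TEXT,
        cell_line  TEXT,
        specie     TEXT,
        treatment  TEXT,
        control_id TEXT REFERENCES chip_experiments(id)
    )
""")
conn.executemany(
    "INSERT INTO chip_experiments VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Example row taken from the table above.
        ("EXP000489", "sc-345", "6.2.1.0.1", "HeLa S3",
         "Homo sapiens", "IFN gamma", "EXP000490"),
        # Hypothetical control row: tfClassId and control_id are NULL.
        ("EXP000490", None, None, "HeLa S3",
         "Homo sapiens", "IFN gamma", None),
    ],
)

# Pair every non-control experiment with its control through a self-join.
pairs = conn.execute("""
    SELECT e.id, e.tfClassId, c.id
    FROM chip_experiments AS e
    LEFT JOIN chip_experiments AS c ON c.id = e.control_id
    WHERE e.tfClassId IS NOT NULL
""").fetchall()
print(pairs)
```

The LEFT JOIN keeps experiments that have no control, in which case the third column is NULL.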

Links to external databases are stored in the 'external_refs' table:

Column         | Description                     | Example values
---------------|---------------------------------|-------------------------------
id             | Experiment identifier           | EXP000489
external_db    | External database name          | GEO or PUBMED or ENCODE or SRA
external_db_id | Identifier in external database | GSM320736

GTRD uses the following object identifiers:

Template     | Object type                   | Example
-------------|-------------------------------|-------------
EXPXXXXXX    | ChIP-seq experiment           | EXP000489
READSXXXXXX  | Collection of ChIP-seq reads  | READS000770
ALIGNSXXXXXX | Collection of read alignments | ALIGNS010001
PEAKSXXXXXX  | Collection of ChIP-seq peaks  | PEAKS010000
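
A small sketch of how such identifiers could be recognized in a script. It assumes that XXXXXX always stands for exactly six decimal digits, which matches the examples above but is not stated explicitly; the function name `parse_gtrd_id` is hypothetical.

```python
import re

# Prefix-to-type mapping taken from the identifier table above.
PREFIX_TO_TYPE = {
    "EXP": "ChIP-seq experiment",
    "READS": "Collection of ChIP-seq reads",
    "ALIGNS": "Collection of read alignments",
    "PEAKS": "Collection of ChIP-seq peaks",
}

# Assumption: the XXXXXX part is exactly six decimal digits.
ID_PATTERN = re.compile(r"(EXP|READS|ALIGNS|PEAKS)(\d{6})")


def parse_gtrd_id(identifier):
    """Return (object type, numeric part) for a GTRD-style identifier."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognized GTRD identifier: {identifier!r}")
    prefix, digits = match.groups()
    return PREFIX_TO_TYPE[prefix], int(digits)


print(parse_gtrd_id("EXP000489"))
print(parse_gtrd_id("ALIGNS010001"))
```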

The relationships between these objects are stored in the 'hub' table:

Column      | Description              | Example values
------------|--------------------------|-------------------
input       | Input object identifier  | READS000770
input_type  | Type of input object     | ReadsGTRDType
output      | Output object identifier | EXP000489
output_type | Type of output object    | ExperimentGTRDType

ChIP-seq reads, alignments and peaks are linked to experiments through the 'hub' table in the following way:

input        | input_type         | output    | output_type
-------------|--------------------|-----------|-------------------
READS000770  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
READS000771  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
READS000772  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
READS000773  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
READS000774  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
READS000775  | ReadsGTRDType      | EXP000489 | ExperimentGTRDType
ALIGNS010001 | AlignmentsGTRDType | EXP000489 | ExperimentGTRDType
PEAKS010000  | PeaksGTRDType      | EXP000489 | ExperimentGTRDType
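
To show how the 'hub' rows above can be used, here is a minimal lookup sketch. It loads the example rows into an in-memory SQLite table (the production database is MySQL, and the column types are assumed) and fetches all objects of a given type linked to one experiment. The helper name `linked_objects` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the 'hub' table above; TEXT types are an assumption.
conn.execute(
    'CREATE TABLE hub ("input" TEXT, input_type TEXT, "output" TEXT, output_type TEXT)'
)

# The example rows listed above.
rows = [
    (f"READS{n:06d}", "ReadsGTRDType", "EXP000489", "ExperimentGTRDType")
    for n in range(770, 776)
]
rows.append(("ALIGNS010001", "AlignmentsGTRDType", "EXP000489", "ExperimentGTRDType"))
rows.append(("PEAKS010000", "PeaksGTRDType", "EXP000489", "ExperimentGTRDType"))
conn.executemany("INSERT INTO hub VALUES (?, ?, ?, ?)", rows)


def linked_objects(conn, experiment_id, input_type):
    """Return ids of objects of one type that point at the given experiment."""
    cursor = conn.execute(
        'SELECT "input" FROM hub WHERE "output" = ? AND input_type = ? ORDER BY "input"',
        (experiment_id, input_type),
    )
    return [row[0] for row in cursor]


reads = linked_objects(conn, "EXP000489", "ReadsGTRDType")
peak_sets = linked_objects(conn, "EXP000489", "PeaksGTRDType")
print(reads)
print(peak_sets)
```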

Web interface


The web interface to GTRD is available at http://gtrd.biouml.org/bioumlweb/gtrd.html. It provides capabilities for searching and browsing GTRD.

Start page

The start page contains a search box and links for browsing all experiments in the repository tree, exploring database statistics and viewing the transcription factor classification tree.

[Figure: GTRD start page]

Search capabilities

[Figure: Search GTRD]
[Figure: Open ChIP-seq peaks]

The ChIP-seq experiments contained in GTRD can be queried from the search box on the start page. GTRD uses the Lucene engine for indexing and querying ChIP-seq experiments, which provides a rich search syntax. The search can be performed by transcription factor (name or class), cell line, antibody or treatment/conditions.

For example, to search for STAT transcription factors, enter stat* in the search field and press Enter. You can restrict the query to HeLa cells treated with interferon with the following query:

 tfTitle:stat* AND cellLine:hela AND treatment:IFN
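
Queries of this shape are easy to assemble programmatically. The helper below is only a sketch: `build_query` is a hypothetical name, and the only field names it is known to work with are the three that appear in the example above (tfTitle, cellLine, treatment). Values containing spaces are wrapped in double quotes, which is the standard Lucene phrase syntax.

```python
def build_query(**fields):
    """Join field:value terms with AND, skipping empty values."""
    terms = []
    for name, value in fields.items():
        if not value:
            continue
        # Quote values with whitespace so Lucene treats them as one phrase.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        terms.append(f"{name}:{value}")
    return " AND ".join(terms)


query = build_query(tfTitle="stat*", cellLine="hela", treatment="IFN")
print(query)
```

The resulting string can be pasted into the search box as is.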

The list of matching ChIP-seq experiments will appear in the 'Search result' tab. Select one of them to view detailed ChIP-seq experiment information in the information box. The information box also provides links to the experiment data: reads, alignments and peaks.

To view peaks or alignments, click the corresponding link in the ChIP-seq experiment information box; the track will be opened as a table. The track can be exported by pressing the 'Export' button or opened in the genome browser by pressing the 'Open as track' button in the general control panel.

Repository structure

GTRD is organized as a hierarchical repository.

[Figure: GTRD repository]

The GTRD/Data folder contains the following items:

  • experiments - ChIP-seq experiment metadata
  • sequences - Raw ChIP-seq reads in fastq.gz format
  • alignments - ChIP-seq read alignments
  • peaks - ChIP-seq peaks identified by MACS and SISSRs peak callers
  • matrices - Position weight matrices for transcription factor binding sites
  • site models - Models for recognition of transcription factor binding sites
  • statistics - Summary GTRD statistics
  • views - TFClass-centric view of GTRD data, available as tree-tables

GTRD/Dictionaries/classification is the TFClass classification tree used by GTRD to reference transcription factors.
