Gene Transcription Regulation Database (GTRD) is a database of transcription factor binding sites identified from ChIP-seq experiments that were systematically collected and uniformly processed using a dedicated workflow (pipeline) for the BioUML platform. Raw ChIP-seq data and experiment information were collected from:
The raw data were uniformly processed using a workflow (pipeline) developed specifically for the BioUML platform:
- sequenced reads were aligned to the reference genome using Bowtie2;
- peaks were identified using the MACS, SISSRs, GEM and PICS peak callers;
- peaks computed for the same TF and peak calling method, but under different experimental conditions (e.g., cell line, treatment), were joined into clusters;
- clusters for the same TF revealed by different peak calling methods were joined into metaclusters.
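The source does not say how peaks are joined, so the sketch below assumes the simplest possible rule: peaks whose genomic intervals overlap or touch are merged. It only illustrates the two-level grouping described above (clusters per TF and peak caller, then metaclusters per TF) and is not the BioUML workflow itself. The record fields (`tf`, `caller`, `chrom`, `start`, `end`) and all function names are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Interval overlaps the previous one: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def build_clusters(peaks):
    """Join peaks of the same TF and peak caller across experiments (assumed rule)."""
    groups = defaultdict(list)
    for p in peaks:
        groups[(p["tf"], p["caller"], p["chrom"])].append((p["start"], p["end"]))
    return {key: merge_intervals(ivs) for key, ivs in groups.items()}


def build_metaclusters(clusters):
    """Join clusters of the same TF across peak callers (assumed rule)."""
    groups = defaultdict(list)
    for (tf, _caller, chrom), ivs in clusters.items():
        groups[(tf, chrom)].extend(ivs)
    return {key: merge_intervals(ivs) for key, ivs in groups.items()}


# Illustrative peaks; coordinates are made up.
example_peaks = [
    {"tf": "STAT1", "caller": "MACS", "chrom": "chr1", "start": 100, "end": 200},
    {"tf": "STAT1", "caller": "MACS", "chrom": "chr1", "start": 150, "end": 250},
    {"tf": "STAT1", "caller": "SISSRs", "chrom": "chr1", "start": 240, "end": 300},
    {"tf": "STAT1", "caller": "MACS", "chrom": "chr1", "start": 1000, "end": 1100},
]
example_clusters = build_clusters(example_peaks)
example_metaclusters = build_metaclusters(example_clusters)
```

A real pipeline would likely use a distance threshold or peak summits rather than plain overlap; that choice is left open here because the source does not state it.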
GTRD is freely available to non-commercial organizations.
Database statistics
GTRD uses 5072 ChIP-seq experiments for 476 human and 257 mouse TFs that correspond to 542 TFClass classes.
|Object type||Total count||Per ChIP-seq experiment|
|ChIP-seq reads||183.8 × 10^9||36.2 × 10^6|
|Reads aligned||146.9 × 10^9||28.9 × 10^6|
|ChIP-seq peaks||>100 × 10^6||depends on peak caller|
On average, each TF has been measured in 9.37 ChIP-seq experiments; 54% of TFs have been measured in more than one experiment.
The ten most studied transcription factors are listed below:
|Transcription Factor||Number of ChIP-seq experiments|
GTRD metadata is stored in MySQL tables.
Each ChIP-seq experiment has a row in the 'chip_experiments' table, which assigns an id and stores basic information about the experiment. The 'chip_experiments' table has the following structure:
|Column||Description||Example|
|id||Unique experiment identifier||EXP000489|
|antibody||Antibody used in chromatin immunoprecipitation||sc-345|
|tfClassId||Id in TFClass database of target transcription factor, NULL for control experiments||126.96.36.199.1|
|cell_line||Studied cell line||HeLa S3|
|specie||Latin name of the species||Homo sapiens|
|treatment||Cell treatment or conditions||IFN gamma|
|control_id||Id of control experiment, NULL for control experiments or experiments without control||EXP000490|
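As an illustration of how the columns above fit together, the sketch below rebuilds the table in an in-memory SQLite database and pairs each experiment with its control through a self-join on `control_id`. SQLite is used only so that the example runs without a server; GTRD itself uses MySQL. The column types, the `TFCLASS-ID` placeholder, the contents of the control row, and the function name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE chip_experiments (
        id         TEXT PRIMARY KEY,
        antibody   TEXT,
        tfClassId  TEXT,   -- NULL for control experiments
        cell_line  TEXT,
        specie     TEXT,
        treatment  TEXT,
        control_id TEXT    -- NULL for controls or experiments without a control
    )
    """
)
conn.executemany(
    "INSERT INTO chip_experiments VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Values taken from the example column of the table; tfClassId is a placeholder.
        ("EXP000489", "sc-345", "TFCLASS-ID", "HeLa S3", "Homo sapiens", "IFN gamma", "EXP000490"),
        # Hypothetical control row: no target TF and no control of its own.
        ("EXP000490", None, None, "HeLa S3", "Homo sapiens", "IFN gamma", None),
    ],
)


def experiments_with_controls(connection):
    """Return (experiment id, cell line, treatment, control id) for non-control rows."""
    return connection.execute(
        """
        SELECT e.id, e.cell_line, e.treatment, c.id
        FROM chip_experiments AS e
        LEFT JOIN chip_experiments AS c ON e.control_id = c.id
        WHERE e.tfClassId IS NOT NULL
        ORDER BY e.id
        """
    ).fetchall()
```

The `LEFT JOIN` keeps experiments that have no control, in which case the last field of the result is `None`.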
Links to external databases are stored in the 'external_refs' table:
|Column||Description||Example|
|external_db||External database name||GEO or PUBMED or ENCODE or SRA|
|external_db_id||Identifier in external database||GSM320736|
GTRD uses the following object identifiers:
|Identifier pattern||Description||Example|
|READSXXXXXX||Collection of ChIP-seq reads||READS000770|
|ALIGNSXXXXXX||Collection of read alignments||ALIGNS010001|
|PEAKSXXXXXX||Collection of ChIP-seq peaks||PEAKS010000|
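The identifier patterns above can be checked mechanically. The following sketch assumes that `XXXXXX` stands for exactly six decimal digits, as in the examples, and also accepts the `EXP` prefix, which is inferred from the experiment identifiers shown elsewhere on this page (e.g. EXP000489). The function and constant names are hypothetical.

```python
import re

# Prefix -> object kind. EXP is inferred from the example experiment ids.
GTRD_PREFIXES = {
    "EXP": "experiment",
    "READS": "reads",
    "ALIGNS": "alignments",
    "PEAKS": "peaks",
}

# Assumption: the numeric part is always six digits, as in the examples.
GTRD_ID_PATTERN = re.compile(r"(EXP|READS|ALIGNS|PEAKS)(\d{6})")


def parse_gtrd_id(identifier):
    """Split a GTRD identifier into (object kind, numeric part)."""
    match = GTRD_ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognised GTRD identifier: {identifier!r}")
    prefix, digits = match.groups()
    return GTRD_PREFIXES[prefix], int(digits)
```

For example, `parse_gtrd_id("READS000770")` returns `("reads", 770)`, while a malformed value such as `"READS77"` raises `ValueError`.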
The relationships between these objects are stored in the 'hub' table:
|Column||Description||Example|
|input||Input object identifier||READS000770|
|input_type||Type of input object||ReadsGTRDType|
|output||Output object identifier||EXP000489|
|output_type||Type of output object||ExperimentGTRDType|
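To show how such a table might be queried, the sketch below loads the single example row from the table above into an in-memory SQLite database and looks up the objects attached to an experiment. SQLite stands in for MySQL so that the example is self-contained; the column types and the function name are assumptions, and only the type names that appear in the table are used.

```python
import sqlite3

hub_db = sqlite3.connect(":memory:")
hub_db.execute(
    "CREATE TABLE hub (input TEXT, input_type TEXT, output TEXT, output_type TEXT)"
)
# The one row documented in the table above.
hub_db.execute(
    "INSERT INTO hub VALUES (?, ?, ?, ?)",
    ("READS000770", "ReadsGTRDType", "EXP000489", "ExperimentGTRDType"),
)


def objects_for_experiment(connection, experiment_id):
    """Return (identifier, type) pairs of objects whose output is the given experiment."""
    return connection.execute(
        """
        SELECT input, input_type
        FROM hub
        WHERE output = ? AND output_type = 'ExperimentGTRDType'
        ORDER BY input
        """,
        (experiment_id,),
    ).fetchall()
```

With the row above, `objects_for_experiment(hub_db, "EXP000489")` returns the reads collection `READS000770`; an experiment with no linked objects yields an empty list.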
ChIP-seq reads, alignments and peaks are linked to experiments through the 'hub' table.
GTRD has a web interface that provides capabilities for searching and browsing the database.
The start page contains a search box and links for browsing all experiments in the repository tree and for exploring database statistics and the transcription factor classification tree.
The ChIP-seq experiments contained in GTRD can be queried from the search box on the start page. GTRD uses the Lucene engine for indexing and querying ChIP-seq experiments, which provides a rich search syntax. Searches can be performed by transcription factor (name or class), cell line, antibody, or treatment/conditions.
For example, to search for STAT transcription factors, enter stat* in the search field and press Enter. You can restrict the search to HeLa cells treated with interferon with the following query:
tfTitle:stat* AND cellLine:hela AND treatment:IFN
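Queries of this form can also be assembled programmatically. The helper below is a minimal sketch that joins `field:value` terms with `AND`, quoting values that contain spaces as Lucene phrases. Only the field names that appear in the query above (`tfTitle`, `cellLine`, `treatment`) are taken from the source; the function name is hypothetical, and escaping of other Lucene special characters is deliberately left out.

```python
def build_gtrd_query(**fields):
    """Build a Lucene-style query string from field=value keyword arguments."""
    terms = []
    for field, value in fields.items():
        if value is None:
            continue  # skip fields the caller did not constrain
        if " " in value:
            # Multi-word values become quoted phrases.
            value = f'"{value}"'
        terms.append(f"{field}:{value}")
    return " AND ".join(terms)


query = build_gtrd_query(tfTitle="stat*", cellLine="hela", treatment="IFN")
```

Here `query` is the same string as the example above and can be pasted into the search box.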
The list of matching ChIP-seq experiments appears in the 'Search result' tab. Select one of them to view detailed experiment information in the information box. The information box also provides links to the experiment data: reads, alignments and peaks.
To view peaks or alignments, click the corresponding link in the ChIP-seq experiment information box; the track will open as a table. The track can be exported by pressing the 'Export' button or opened in the genome browser by pressing the 'Open as track' button in the general control panel.
GTRD is organized as a hierarchical repository.
The GTRD/Data folder contains the following items:
- experiments - ChIP-seq experiment metadata
- sequences - Raw ChIP-seq reads in fastq.gz format
- alignments - ChIP-seq read alignments
- peaks - ChIP-seq peaks identified by MACS and SISSRs peak callers
- matrices - Position weight matrices for transcription factor binding sites
- site models - Models for recognition of transcription factor binding sites
- statistics - Summary GTRD statistics
- views - TFClass-centric views of GTRD data, available as tree-tables
GTRD/Dictionaries/classification is the TFClass classification tree used by GTRD to reference transcription factors.