Data Integrator Tool Suite

Introduction

Welcome to the Center of Biomedicine (CBM) at EURAC research. This is the home page of the Data Integration (Dintor) tool suite with more than thirty modules ready for use by bioinformaticians and biologists working in genomics research.

Data emerging from genome-wide association (GWA) studies and next-generation sequencing (NGS) technologies provide a wealth of information ready to be used by the scientific community. These large data sets form the basis for further analysis, depending on the individual researcher's focus. For non-bioinformaticians, however, large-scale processing is hampered by the way these data sets are stored. For example, finding the protein-coding gene closest to a location encoded by a dbSNP identifier from a GWA result table can be done in a genome browser for a few SNPs of interest, but the task becomes arduous once more than a handful of such entries have to be queried.
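
To illustrate why this lookup is easy to automate but tedious by hand, here is a minimal Python sketch of a batch "closest gene" query. It is not the Dintor implementation. The gene table, the SNP labels, and the function names are hypothetical toy values, and the sketch assumes that each dbSNP identifier has already been resolved to a chromosome and position.

```python
from typing import Iterable, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene (inclusive)
    end: int    # last base of the gene (inclusive)


def distance(gene: Gene, pos: int) -> int:
    """Distance in base pairs from pos to the gene; 0 if pos lies inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))


def closest_gene(genes: Iterable[Gene], chrom: str, pos: int) -> Optional[Gene]:
    """Brute-force search for the gene nearest to pos on the same chromosome."""
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: distance(g, pos))


# Toy data (assumption): invented coordinates, not real annotation.
GENES = [
    Gene("GENE_A", "1", 1000, 2000),
    Gene("GENE_B", "1", 5000, 6000),
    Gene("GENE_C", "2", 300, 900),
]
SNPS = {"snp_1": ("1", 2500), "snp_2": ("1", 5500), "snp_3": ("3", 10)}

# Batch query: one pass over the whole result table instead of manual lookups.
RESULTS = {
    snp: closest_gene(GENES, chrom, pos) for snp, (chrom, pos) in SNPS.items()
}

for snp, gene in RESULTS.items():
    print(snp, gene.name if gene else "no gene on this chromosome")
```

The linear scan is chosen for clarity. For genome-scale tables, an interval index per chromosome would be a more realistic choice.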

We have therefore developed Dintor, a suite of tools that facilitates working with GWA and NGS data. Beyond this goal, the framework offers modules for high-level functional annotation of genes and gene products, such as gene set prioritization, functional similarity of proteins, or clinical significance of variation data. Each of these tools has been designed to perform a basic task independently. The real power of the tool suite becomes apparent once these tools are combined into a pipeline to accomplish a complex analysis.

The hallmarks of our approach are:

Tutorials

On the tutorial page, we have collected several small examples that demonstrate usage of this web server.

Documentation

When opening a Dintor tool from the left pane ("CBM/Dintor"), each of the tool options comes with a small help text next to it. Tools share several standard options, which are described as follows:

At the end of each tool page, the section "Full reference" links to an HTML page with extensive online documentation. It describes the method, the available options, and the output in more detail and provides small examples.

Human Genome Data

The Dintor framework was developed when human genome release 37 (GRCh37) was used as the reference genome. In the meantime, release 38 (GRCh38) has become available, and the framework takes this into account. All Dintor releases prior to 2017-05 (2015-04, 2014-12, ...) work on GRCh37 only, as this was the predominantly used reference genome at that time.

Dintor release 2017-05 is a special release that combines both human reference genomes, GRCh37 and GRCh38. Data version v-00 is based on Ensembl version 75, which was the last release that worked with GRCh37 data. Data versions v-01 and v-02 use GRCh38 as a reference and are based on data from Ensembl versions 84 and 88, respectively. This allows users to analyze data for both human genome reference datasets in a single Dintor release. Subsequent releases will incorporate GRCh38 data only. However, tools can be run with any Dintor release and therefore GRCh37 data will remain available in the future.
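
The mapping described above for release 2017-05 can be summarized as a small lookup table. The following Python sketch only restates the version numbers given in the text. The dictionary layout and the function name are illustrative assumptions, not Dintor's actual configuration format.

```python
# Data versions of Dintor release 2017-05, as stated in the text above.
# The structure of this table is an assumption made for illustration.
DATA_VERSIONS = {
    "v-00": {"reference": "GRCh37", "ensembl": 75},
    "v-01": {"reference": "GRCh38", "ensembl": 84},
    "v-02": {"reference": "GRCh38", "ensembl": 88},
}


def versions_for(reference: str) -> list:
    """Return the data versions built on the given reference genome."""
    return sorted(
        version
        for version, info in DATA_VERSIONS.items()
        if info["reference"] == reference
    )


print(versions_for("GRCh37"))
print(versions_for("GRCh38"))
```

Keeping the reference genome and the Ensembl version together in one record makes it explicit which coordinates a given analysis result refers to.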

Download

You may want to use the Dintor suite on your local system for several reasons:

The Dintor framework is packaged in three logically separated files for download:

  1. Source code, dintor-src.tgz - Contains all releases of the Dintor tool suite. Please see the INSTALL file in the latest release after unpacking this tarball.
  2. Data, dintor-data.tgz - Contains all versions of data files used by the tools. This file is well over 500MB in size, so be prepared for a longer download time.
  3. Ensembl API, ensembl-api.tgz - Contains all necessary Ensembl APIs needed for the Perl modules to access Ensembl data.
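
As a rough sketch of the first step after download, the following Python snippet unpacks a tarball and lists any INSTALL files it contains. It assumes the archive has already been downloaded from a trusted source. The function name is hypothetical, and because the directory layout inside dintor-src.tgz is not described here, the sketch simply searches the unpacked tree recursively.

```python
import tarfile
from pathlib import Path


def unpack_and_find_install(archive: str, dest: str) -> list:
    """Unpack a gzip-compressed tarball into dest and return all INSTALL files."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        # Only extract archives obtained from a source you trust.
        tar.extractall(path=dest_path)
    # The internal layout is not assumed; search the whole tree instead.
    return sorted(p for p in dest_path.rglob("INSTALL") if p.is_file())


if __name__ == "__main__":
    archive = Path("dintor-src.tgz")  # file name as given in the list above
    if archive.exists():
        for install_file in unpack_and_find_install(str(archive), "dintor-src"):
            print(install_file)
    else:
        print("dintor-src.tgz not found in the current directory")
```

The same function can be applied to dintor-data.tgz and ensembl-api.tgz, in which case the returned list may simply be empty.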

We keep a record of release information and the associated data files used by the tools. This information is available in the source distribution as an OpenOffice/LibreOffice file, but can also be viewed directly.

Galaxy is an open, web-based platform for data intensive biomedical research. The Galaxy team is a part of BX at Penn State, and the Biology and Mathematics and Computer Science departments at Emory University. The Galaxy Project is supported in part by NHGRI, NSF, The Huck Institutes of the Life Sciences, The Institute for CyberScience at Penn State, and Emory University.