Data Integrator
SNPGeneGlobalLDChecker - Linkage disequilibrium block membership for genomic coordinates

This program answers whether a base pair position on a chromosome lies in the same linkage disequilibrium (LD) block as a gene. The definition and computation of LD blocks are internal to EURAC. In general, an LD block is a contiguous stretch of DNA on a chromosome in which high LD has been detected. This approach is much simpler, and also much faster, than computing the commonly known pairwise LD between dbSNP identifiers, which is implemented in a separate tool, gcoords2ld - Linkage disequilibrium for genomic coordinate/gene pairs.

Input

The tool expects input parameters for a genomic coordinate and a gene. Unlike in the sister tool gcoords2ld - Linkage disequilibrium for genomic coordinate/gene pairs, the gene must be given by its boundaries. These can readily be derived with the tool gcoords2genes - Genes in the vicinity of a genomic coordinate. The input therefore consists of a triplet of column indices. The first index always refers to the column holding the genomic coordinate. Depending on the chosen option, the next two indices point to columns that contain either genomic coordinates or signed distances relative to the position given by the first coordinate. A positive distance increases the coordinate on the forward strand, and a negative distance decreases it (i.e. it would be an increase on the reverse strand). This is fully compatible with the output provided by gcoords2genes - Genes in the vicinity of a genomic coordinate.
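The two ways of specifying gene boundaries can be illustrated with a short Python sketch. It is not the tool's actual code: the function name resolve_gene_bounds and its parameters are hypothetical and only show how signed distances would translate into forward-strand coordinates under the rules described above.

```python
from typing import Tuple


def resolve_gene_bounds(coord: int, first: int, second: int,
                        relative: bool) -> Tuple[int, int]:
    """Return the gene boundaries as (start, end) on the forward strand.

    Hypothetical helper, not part of the tool.
    coord    -- the genomic coordinate from the first input column
    first    -- value from the second input column
    second   -- value from the third input column
    relative -- True if first/second are signed distances from coord,
                False if they are already genomic coordinates
    """
    if relative:
        # A positive distance moves up the forward strand,
        # a negative distance moves down.
        first, second = coord + first, coord + second
    # Order the boundaries so that start <= end regardless of input order.
    return (min(first, second), max(first, second))


# A gene reaching from 200 bp below to 300 bp above position 1000.
print(resolve_gene_bounds(1000, -200, 300, relative=True))    # (800, 1300)
# Boundaries given directly as coordinates, in either order.
print(resolve_gene_bounds(1000, 5000, 4000, relative=False))  # (4000, 5000)
```

Sorting the two boundaries is a design choice of this sketch; it keeps later interval comparisons independent of the strand the gene lies on.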

Furthermore, since LD blocks were computed from two different datasets, the dataset can be chosen as an input parameter. Historically, LD blocks were computed on HapMap2 data (originally in NCBI36 coordinates and subsequently mapped to GRCh37), and later on the more recent 1000 Genomes dataset. For both datasets, only the CEU population (Utah residents with ancestry from northern and western Europe) was used to derive LD block data.

Options applicable to more than a single tool are summarized in common command line options.

Output

Depending on the dataset from which the LD blocks were derived, a single column, In LD (HapMap) or In LD (1000 Genomes), is added to the table. If the coordinate/gene pair is in LD according to the LD block database, the cell contains the chromosome number and the block ID from the underlying LD data definition file; otherwise the cell is empty.

The following diagram should help clarify when a coordinate/gene pair is in LD. The general rule is that the coordinate must lie in the same LD block as the gene it is queried for, or as part of that gene.

5'                                                                            3'
--------------------------------------------------------------------------------
   <<<<<<<<< gene A <<<<<<<<<<         >>>> gene B >>>>>>    >>> gene C >>>
 ***************** LD block 1 **************               *** LD block 2 ******
              X                                Y                              Z

LD for the given scenario is summarized below. The LD? column gives the number of the shared LD block, or 0 if there is none.

Coordinate  Gene  LD?  Comment
X           A     1    X and A in LD block 1
X           B     1    X and parts of B in LD block 1
X           C     0    No common block for X and C
Y           A     0    Y outside of any LD block
Y           B     0    Y outside of any LD block
Y           C     0    Y outside of any LD block
Z           A     0    No common block for Z and A
Z           B     0    No common block for Z and B
Z           C     2    Z and C in LD block 2
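
The rule behind this summary can be sketched in a few lines of Python. This is a minimal illustration, not EURAC's implementation: the Block type, the function shared_block and all coordinates are made up, with positions chosen only to mirror the layout of the diagram. The sketch returns None where the summary shows 0.

```python
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """Hypothetical LD block record: an ID and inclusive boundaries."""
    block_id: int
    start: int
    end: int


def shared_block(pos: int, gene_start: int, gene_end: int,
                 blocks: Sequence[Block]) -> Optional[int]:
    """Return the ID of the block containing pos and overlapping the gene."""
    for block in blocks:
        coord_inside = block.start <= pos <= block.end
        # Partial overlap is enough: only part of the gene must be in the block.
        gene_overlaps = gene_start <= block.end and gene_end >= block.start
        if coord_inside and gene_overlaps:
            return block.block_id
    return None


# Made-up positions that mirror the diagram.
BLOCKS = [Block(1, 2, 44), Block(2, 60, 80)]
GENES = {"A": (4, 30), "B": (40, 57), "C": (62, 75)}
COORDS = {"X": 15, "Y": 48, "Z": 79}

for coord_name, pos in COORDS.items():
    for gene_name, (start, end) in GENES.items():
        print(coord_name, gene_name, shared_block(pos, start, end, BLOCKS))
```

With these positions, the loop yields block 1 for X/A and X/B, block 2 for Z/C, and None for every other pair, which matches the summary above.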