Data Integrator
Pos2LDBlock - Assignment of closest LD block for a location.

Find the corresponding LD block for a given position on a chromosome. The LD blocks are computed with a highly optimized version [1] of the method described in the original publication [2].

Input

The input to this tool is very simple: a single column with genomic coordinates. Options applicable to more than a single tool are summarized in common command line options.
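
As an illustration of what a valid input cell looks like, the following minimal sketch splits a coordinate such as GRCh37:5:125707200 into its parts. The pattern and the function name parse_coordinate are assumptions made for this example, inferred from the coordinates shown on this page; they are not taken from the tool's source.

```python
import re

# Hypothetical pattern (an assumption): assembly, chromosome and position
# separated by colons, as in "GRCh37:5:125707200".
_COORD = re.compile(r"^([A-Za-z0-9.]+):([0-9]{1,2}|X|Y|MT):([0-9]+)$")


def parse_coordinate(cell):
    """Return (assembly, chromosome, position), or None if the cell is not valid."""
    match = _COORD.match(cell.strip())
    if match is None:
        return None
    assembly, chromosome, position = match.groups()
    return assembly, chromosome, int(position)


print(parse_coordinate("GRCh37:5:125707200"))
print(parse_coordinate("bad data"))
```

A cell that does not match, such as "bad data", yields None in this sketch, which mirrors the empty output cells the tool reports for invalid input.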

Output

For each genomic coordinate in the input column, the closest LD block is looked up. Four columns specify the LD block and its relationship to the genomic coordinate:

  • LD block: The identifier of the LD block. It consists of the chromosome number and an integer; the same identifier is used in its Ensembl browser track.
  • LD covers: Set to 1 if the location is contained in the LD block, and to 0 otherwise.
  • LD begin: The genomic coordinate at which the LD block begins.
  • LD end: The genomic coordinate at which the LD block ends.

Empty cells are output if the input is not a valid genomic coordinate.
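
The lookup described above can be sketched as a nearest-interval search. This is only an illustration, not the tool's actual implementation: the function names are hypothetical, block ends are assumed to be inclusive, and ties between equally distant blocks are resolved by list order. The block boundaries are the chromosome 5 values that appear in this page's example output.

```python
# Blocks on chromosome 5 taken from the example output: (identifier, begin, end).
BLOCKS = [
    ("5:895", 125616528, 125676392),
    ("5:4193", 125678354, 125678765),
    ("5:1909", 125678786, 125699575),
    ("5:4694", 125699599, 125699745),
    ("5:842", 125699985, 125763912),
]


def distance(position, begin, end):
    """Distance from position to the interval [begin, end]; 0 if inside.

    Treating the end coordinate as inclusive is an assumption.
    """
    if position < begin:
        return begin - position
    if position > end:
        return position - end
    return 0


def closest_block(position, blocks=BLOCKS):
    """Return (identifier, covers, begin, end) for the block nearest to position."""
    identifier, begin, end = min(
        blocks, key=lambda block: distance(position, block[1], block[2])
    )
    covers = 1 if begin <= position <= end else 0
    return identifier, covers, begin, end


for pos in (125707200, 125699800, 125676400):
    print(pos, closest_block(pos))
```

A linear scan keeps the sketch short; for a genome-wide block list, a sorted index searched with bisect would be the more natural choice.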

For example, if the input file is stored in /tmp/ld.tsv as

GC                  Comment
GRCh37:5:125707200  Block 5:842, SNP inside LD block, LD block is inside gene GRAMD3
GRCh37:5:125699800  Left of block 5:842, no LD block there, SNP is still inside GRAMD3
GRCh37:5:125680000  Block 5:1909, left of GRAMD3, SNP inside LD block, SNP outside gene, block covers gene
GRCh37:5:125678600  Block 5:4193, left of GRAMD3, SNP in LD block, LD block does not cover gene, GRAMD3 is nearest gene
GRCh37:5:125676400  SNP is left of block 5:3561, block does not cover gene, nearest gene is GRAMD3
GRCh37:5:132487144  This is located exactly in between genes HSPA4 and FSTL4.
bad data            This will produce empty cells

the command

$ python Pos2LDBlock.py -c 1 -H /tmp/ld.tsv

will find LD blocks and report them as

GC                 Comment                                                                                             LD block LD covers LD begin           LD end
GRCh37:5:125707200 Block 5:842, SNP inside LD block, LD block is inside gene GRAMD3                                    5:842    1         GRCh37:5:125699985 GRCh37:5:125763912
GRCh37:5:125699800 Left of block 5:842, no LD block there, SNP is still inside GRAMD3                                  5:4694   0         GRCh37:5:125699599 GRCh37:5:125699745
GRCh37:5:125680000 Block 5:1909, left of GRAMD3, SNP inside LD block, SNP outside gene, block covers gene              5:1909   1         GRCh37:5:125678786 GRCh37:5:125699575
GRCh37:5:125678600 Block 5:4193, left of GRAMD3, SNP in LD block, LD block does not cover gene, GRAMD3 is nearest gene 5:4193   1         GRCh37:5:125678354 GRCh37:5:125678765
GRCh37:5:125676400 SNP is left of block 5:3561, block does not cover gene, nearest gene is GRAMD3                      5:895    0         GRCh37:5:125616528 GRCh37:5:125676392
GRCh37:5:132487144 This is located exactly in between genes HSPA4 and FSTL4.                                           5:2104   1         GRCh37:5:132477615 GRCh37:5:132494146
bad data           This will produce empty cells                                                                       --       --        --                 --
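
A result like this can be post-processed with a few lines of Python. The sketch below keeps only the rows whose coordinate lies inside an LD block. It assumes the output is tab-separated with a header row, which is suggested by the .tsv file name but not confirmed here; the helper name covered_rows is hypothetical, and the embedded rows are shortened copies of the example above with the Comment column left out.

```python
import csv
import io

# Two rows shortened from the example output (Comment column omitted),
# assuming tab-separated columns.
OUTPUT = (
    "GC\tLD block\tLD covers\tLD begin\tLD end\n"
    "GRCh37:5:125707200\t5:842\t1\tGRCh37:5:125699985\tGRCh37:5:125763912\n"
    "GRCh37:5:125699800\t5:4694\t0\tGRCh37:5:125699599\tGRCh37:5:125699745\n"
)


def covered_rows(text):
    """Return the rows whose position is contained in its LD block."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row["LD covers"] == "1"]


covered = covered_rows(OUTPUT)
print([row["GC"] for row in covered])
```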

References

[1] Taliun D, et al. (2014) Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics. 15, 10.

[2] Gabriel SB, et al. (2002) The structure of haplotype blocks in the human genome. Science. 296(5576), 2225–2229.