Data Integrator
Find the corresponding LD block for a given position on a chromosome. The LD blocks are computed with a highly optimized version [1] of the algorithm described in the original publication [2].
The input to this tool is simple: a file that contains a column with genomic coordinates. Options applicable to more than a single tool are summarized in common command line options.
For each genomic coordinate in the input column, the closest LD block is looked up. Four columns specify the LD block and its relationship to the genomic coordinate:
LD block
The identifier of the LD block. It consists of the chromosome number and an integer, and the same identifier is used in its Ensembl browser track.
LD covers
Set to 1 if the location is contained in the LD block and to 0 if not.
LD begin
Specifies the LD block's begin genomic coordinate.
LD end
Specifies the LD block's end genomic coordinate.

An empty cell is output if the input is not a valid genomic coordinate.
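The lookup described above can be illustrated with a minimal sketch. It is not the code of Pos2LDBlock.py. It assumes that coordinates have the form assembly:chromosome:position (as in the examples on this page) and that "closest" means the smallest distance between the position and the block interval. The names Block, parse_coordinate, closest_block, annotate and EXAMPLE_BLOCKS are hypothetical, and the three blocks are copied from the example output on this page.

```python
import re
from typing import NamedTuple

# Assumed coordinate layout: "<assembly>:<chromosome>:<position>".
_COORD = re.compile(r"^([^:\s]+):([^:\s]+):(\d+)$")


class Block(NamedTuple):
    ident: str  # e.g. "5:842"
    begin: int
    end: int


def parse_coordinate(text):
    """Return (assembly, chromosome, position) or None if the text is invalid."""
    match = _COORD.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def distance(block, pos):
    """Distance from pos to the block interval; 0 if the block contains pos."""
    if pos < block.begin:
        return block.begin - pos
    if pos > block.end:
        return pos - block.end
    return 0


def closest_block(blocks, pos):
    """Pick the block with the smallest distance to pos (None if there are no blocks)."""
    return min(blocks, key=lambda b: distance(b, pos), default=None)


def annotate(text, blocks_by_chrom):
    """Return the four cells: LD block, LD covers, LD begin, LD end."""
    coord = parse_coordinate(text)
    if coord is None:
        return ["", "", "", ""]  # invalid input gives empty cells
    assembly, chrom, pos = coord
    block = closest_block(blocks_by_chrom.get(chrom, []), pos)
    if block is None:
        return ["", "", "", ""]
    covers = "1" if block.begin <= pos <= block.end else "0"
    prefix = f"{assembly}:{chrom}:"
    return [block.ident, covers, prefix + str(block.begin), prefix + str(block.end)]


# Three blocks taken from the example output on this page (chromosome 5).
EXAMPLE_BLOCKS = {
    "5": [
        Block("5:1909", 125678786, 125699575),
        Block("5:4694", 125699599, 125699745),
        Block("5:842", 125699985, 125763912),
    ]
}

print(annotate("GRCh37:5:125707200", EXAMPLE_BLOCKS))
print(annotate("GRCh37:5:125699800", EXAMPLE_BLOCKS))
print(annotate("bad data", EXAMPLE_BLOCKS))
```

A linear scan keeps the sketch short. For a full set of blocks, a sorted list searched with the standard bisect module would be a more likely choice.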
For example, if the input file is stored in /tmp/ld.tsv as

GC	Comment
GRCh37:5:125707200	Block 5:842, SNP inside LD block, LD block is inside gene GRAMD3
GRCh37:5:125699800	Left of block 5:842, no LD block there, SNP is still inside GRAMD3
GRCh37:5:125680000	Block 5:1909, left of GRAMD3, SNP inside LD block, SNP outside gene, block covers gene
GRCh37:5:125678600	Block 5:4193, left of GRAMD3, SNP in LD block, LD block does not cover gene, GRAMD3 is nearest gene
GRCh37:5:125676400	SNP is left of block 5:3561, block does not cover gene, nearest gene is GRAMD3
GRCh37:5:132487144	This is located exactly in between genes HSPA4 and FSTL4.
bad data	This will produce empty cells

the command
$ python Pos2LDBlock.py -c 1 -H /tmp/ld.tsv
will find LD blocks and report them as

GC	Comment	LD block	LD covers	LD begin	LD end
GRCh37:5:125707200	Block 5:842, SNP inside LD block, LD block is inside gene GRAMD3	5:842	1	GRCh37:5:125699985	GRCh37:5:125763912
GRCh37:5:125699800	Left of block 5:842, no LD block there, SNP is still inside GRAMD3	5:4694	0	GRCh37:5:125699599	GRCh37:5:125699745
GRCh37:5:125680000	Block 5:1909, left of GRAMD3, SNP inside LD block, SNP outside gene, block covers gene	5:1909	1	GRCh37:5:125678786	GRCh37:5:125699575
GRCh37:5:125678600	Block 5:4193, left of GRAMD3, SNP in LD block, LD block does not cover gene, GRAMD3 is nearest gene	5:4193	1	GRCh37:5:125678354	GRCh37:5:125678765
GRCh37:5:125676400	SNP is left of block 5:3561, block does not cover gene, nearest gene is GRAMD3	5:895	0	GRCh37:5:125616528	GRCh37:5:125676392
GRCh37:5:132487144	This is located exactly in between genes HSPA4 and FSTL4.	5:2104	1	GRCh37:5:132477615	GRCh37:5:132494146
bad data	This will produce empty cells	--	--	--	--

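A result table like the one above can be processed further with standard tools. The following sketch assumes that the result is tab-separated text with the header row shown above; how the result is captured (for example by redirecting the command's output to a file) is not specified here. The helper names covered_rows and is_consistent are hypothetical, and the comments in the sample rows are abbreviated.

```python
import csv
import io


def position(coord):
    """Extract the integer position from a coordinate such as GRCh37:5:125707200."""
    return int(coord.rsplit(":", 1)[1])


def covered_rows(stream):
    """Return the rows whose 'LD covers' cell is 1."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [row for row in reader if row.get("LD covers") == "1"]


def is_consistent(row):
    """Check that 'LD covers' agrees with the 'LD begin' and 'LD end' cells."""
    pos = position(row["GC"])
    inside = position(row["LD begin"]) <= pos <= position(row["LD end"])
    return inside == (row["LD covers"] == "1")


# Two rows from the example above, with abbreviated comments.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["GC", "Comment", "LD block", "LD covers", "LD begin", "LD end"],
    ["GRCh37:5:125707200", "inside block", "5:842", "1",
     "GRCh37:5:125699985", "GRCh37:5:125763912"],
    ["GRCh37:5:125699800", "between blocks", "5:4694", "0",
     "GRCh37:5:125699599", "GRCh37:5:125699745"],
])

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print([row["LD block"] for row in covered_rows(io.StringIO(SAMPLE))])
print(all(is_consistent(row) for row in rows))
```

Rows for invalid input would need to be skipped before calling is_consistent, because their LD cells do not hold coordinates.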
[1] Taliun D, et al. (2014) Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics. 15, 10.
[2] Gabriel SB, et al. (2002) The structure of haplotype blocks in the human genome. Science. 296(5576), 2225–2229.