Data Integrator
This tool annotates genomic variations and genomic regions with data from the ClinVar database [1]. There are two main usage options:
The ClinVarAnnotator operates on tabular data. The input file has to be specified with the command line option --from-file; it is also possible to read from standard input via --from-file - <<<$'Something'.
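The convention of passing - to read from standard input can be sketched as follows. This is only an illustration of the general pattern, not the Dintor implementation; the helper names open_input and read_rows are hypothetical.

```python
import csv
import io
import sys


def open_input(path):
    """Return a readable text handle: standard input for "-", else the named file."""
    if path == "-":
        return sys.stdin
    return open(path, newline="")


def read_rows(handle):
    """Parse tab-separated text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__":
    # io.StringIO stands in for a file or piped input in this demo.
    demo = io.StringIO("gcoord\nGRCh37:1:11082349\n")
    print(read_rows(demo))
```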
According to the scenarios mentioned above, the tool accepts three possible types of input. The type is selected with the --in flag and the column indices are given with the -c option. The numbers given with -c correspond to the columns in the input file that hold the information required by the chosen --in option.
Required input:
--in <option>: The type of analysis to perform. This must be one of:
  gcoords: A genomic coordinate in the human genome. ClinVar annotations are added for matching genomic coordinates.
  gcoords-ref-alt: A genomic coordinate in the genome together with a reference and an alternative allele. ClinVar annotations are added for matching coordinates if the input reference and alternative alleles match the ClinVar reference and alternative alleles.
  range-abs: A range spanning a region on one chromosome. All ClinVar variants that lie in the specified range are added, together with their annotations.
-c <x [y z]>: The column indices in the input file where the data is located. Column indices start at one:
  --in gcoords: One column index that gives the location of the genomic coordinates in the input file.
  --in gcoords-ref-alt: The space-separated column indices that give the locations of the genomic coordinate, the reference allele, and the alternative allele in the input file, respectively.
  --in range-abs: Two space-separated integers. The first column gives the start coordinate and the second the end coordinate. Both coordinates must be on the same chromosome and must not span a negative range. Further, the start and end coordinates themselves must not be ranges.
--from-file: A tab-separated text file which holds the information as indicated with the --in and -c options.
Optional input: The ClinVar annotations that are to be appended to the output file. All available annotations can be added with the --all flag. The available annotations are:
--CLNSIG: Variant Clinical Significance
--CLNORIGIN: Allele Origin
--CLNDBN: Variant disease name
--CLNSRC: Variant Clinical Channels (variant source)
--CLNACC: Variant Accession and Versions
--PM: Variant is Precious (Clinical, PubMed Cited)
--ref-alt: The reference and alternative allele as given in ClinVar
--gcoords: The genomic position of the variant in Dintor format
--id: The variant id as given in ClinVar (usually dbSNP)
--all: Abbreviation for adding all annotations. Giving this flag is equivalent to --CLNSIG --CLNORIGIN --CLNDBN --CLNSRC --CLNACC --PM --ref-alt --gcoords --id
Additional options applicable to more than a single tool (such as --help, --data-version or --version) are summarized in common command line options.
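To show how the options described above combine, the following sketch assembles a command line from its parts. The function build_command and the constants are hypothetical helpers written for illustration; only the flag names come from this page. The -H flag is used in the examples on this page but not described in this section, so treating it as a header switch is an assumption.

```python
# Annotation flags listed in this section (without the leading "--").
ALL_ANNOTATIONS = ("CLNSIG", "CLNORIGIN", "CLNDBN", "CLNSRC", "CLNACC",
                   "PM", "ref-alt", "gcoords", "id")

# Number of column indices that -c expects for each --in option.
COLUMNS_PER_MODE = {"gcoords": 1, "gcoords-ref-alt": 3, "range-abs": 2}


def build_command(mode, columns, annotations, input_path, header=False):
    """Return the argument list for a ClinVarAnnotator call (hypothetical helper)."""
    if mode not in COLUMNS_PER_MODE:
        raise ValueError(f"unknown --in option: {mode!r}")
    if len(columns) != COLUMNS_PER_MODE[mode]:
        raise ValueError(f"{mode} needs {COLUMNS_PER_MODE[mode]} column indices")
    unknown = [name for name in annotations if name not in ALL_ANNOTATIONS]
    if unknown:
        raise ValueError(f"unknown annotations: {unknown}")

    cmd = ["python", "ClinVarAnnotator.py"]
    if header:
        cmd.append("-H")  # assumption: marks the first input line as a header
    cmd += ["--in", mode, "-c", *[str(c) for c in columns]]
    if set(annotations) == set(ALL_ANNOTATIONS):
        cmd.append("--all")  # shorthand for every annotation flag
    else:
        cmd += ["--" + name for name in annotations]
    cmd += ["--from-file", input_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("gcoords", [1], ["ref-alt", "CLNSIG"],
                                 "/tmp/gcoords.tsv", header=True)))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting issues.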
Example input 1:
gcoord
GRCh37:1:11082349

Example input 2:
gcoord            ref  alt
GRCh37:1:2340118  G    A
GRCh37:1:2452273  C    T

Example input 3:
range_start        range_end
GRCh37:1:11082348  GRCh37:1:11082461
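The example inputs suggest that a genomic coordinate is written as assembly:chromosome:position. Assuming that layout (it is inferred from the examples, not from a format specification), the constraints on range-abs input could be checked roughly as follows. GCoord, parse_gcoord and validate_range are hypothetical names.

```python
from typing import NamedTuple


class GCoord(NamedTuple):
    assembly: str
    chromosome: str
    position: int


def parse_gcoord(text):
    """Parse 'assembly:chromosome:position' (layout assumed from the examples)."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected assembly:chromosome:position, got {text!r}")
    assembly, chromosome, position = parts
    # int() rejects values such as "100-200", so a coordinate that is
    # itself a range fails here with a ValueError.
    return GCoord(assembly, chromosome, int(position))


def validate_range(start, end):
    """Check that both coordinates share a chromosome and the range is not negative."""
    if (start.assembly, start.chromosome) != (end.assembly, end.chromosome):
        raise ValueError("start and end must be on the same chromosome")
    if end.position < start.position:
        raise ValueError("range must not be negative")
    return start, end


if __name__ == "__main__":
    start = parse_gcoord("GRCh37:1:11082348")
    end = parse_gcoord("GRCh37:1:11082461")
    print(validate_range(start, end))
```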
As with other Dintor tools, the output is the input plus added columns. These added columns correspond to the annotations that the user requested. If multiple ClinVar entries match one input line, the input line is output once per matching entry, each time with the information from a different ClinVar entry. One ClinVar entry may contain multiple pipe-separated values; these are values from different submissions.
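The one-output-line-per-match behaviour described above can be sketched with a toy in-memory lookup. This is not how Dintor stores or queries ClinVar; annotate_rows, split_submissions and TOY_INDEX are hypothetical, and the two entries are taken from example output 1 on this page. What happens to input lines without a match is not stated here; the sketch simply drops them, which is an assumption.

```python
# Toy stand-in for ClinVar, keyed by genomic coordinate (values from example output 1).
TOY_INDEX = {
    "GRCh37:1:11082349": [
        {"Ref": "G", "Alt": "A", "CLNSIG": "pathogenic"},
        {"Ref": "G", "Alt": "C", "CLNSIG": "pathogenic"},
    ],
}


def annotate_rows(rows, clinvar_index, coord_column, fields):
    """Append the requested fields to each row, once per matching entry."""
    annotated = []
    for row in rows:
        key = row[coord_column - 1]  # column indices start at one, as with -c
        # Assumption: rows without any match produce no output line.
        for entry in clinvar_index.get(key, []):
            annotated.append(row + [entry[name] for name in fields])
    return annotated


def split_submissions(value):
    """Split one pipe-separated annotation value into per-submission values."""
    return value.split("|")


if __name__ == "__main__":
    for line in annotate_rows([["GRCh37:1:11082349"]], TOY_INDEX, 1,
                              ["Ref", "Alt", "CLNSIG"]):
        print("\t".join(line))
    # Made-up value, only to show the splitting.
    print(split_submissions("pathogenic|untested"))
```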
Example output 1: Given the above listed input file from example 1 saved as /tmp/gcoords.tsv, the following command line appends ClinVar annotations if the genomic coordinates match.
$ python ClinVarAnnotator.py -H --in gcoords -c 1 --ref-alt --CLNSIG --from-file /tmp/gcoords.tsv
gcoord             ClinVar.Ref  ClinVar.Alt  ClinVar.CLNSIG
GRCh37:1:11082349  G            A            pathogenic
GRCh37:1:11082349  G            C            pathogenic
Example output 2: Given the above listed input file from example 2 saved as /tmp/gcoords-ref-alt.tsv, the following command line appends ClinVar annotations if the genomic coordinate and the reference and alternative alleles match.
$ python ClinVarAnnotator.py -H --in gcoords-ref-alt -c 1 2 3 --ref-alt --CLNSIG --CLNORIGIN --from-file /tmp/gcoords-ref-alt.tsv
gcoord            ref  alt  ClinVar.Ref  ClinVar.Alt  ClinVar.CLNSIG  ClinVar.CLNORIGIN
GRCh37:1:2340118  G    A    G            A            pathogenic      unknown
GRCh37:1:2452273  C    T    C            T            untested        somatic
Example output 3: Given the above listed input file from example 3 saved as /tmp/range.tsv, the following command line appends ClinVar annotations of variants that lie in the given range.
$ python ClinVarAnnotator.py -H --in range-abs -c 1 2 --gcoords --ref-alt --CLNSIG --from-file /tmp/range.tsv
range_start        range_end          ClinVar.Gcoords    ClinVar.Ref  ClinVar.Alt  ClinVar.CLNSIG
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082349  G            A            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082349  G            C            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082358  G            A            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082397  A            G            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082409  G            A            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082425  C            T            untested
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082457  C            A            pathogenic
GRCh37:1:11082348  GRCh37:1:11082461  GRCh37:1:11082461  G            A            pathogenic
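In the output above, a variant at position 11082461 is reported for a range ending at 11082461, so the end coordinate appears to be inclusive. A minimal sketch of such a range filter follows, assuming the start coordinate is inclusive as well (the example does not show this). variants_in_range is a hypothetical helper; the positions are those from the output above plus two made-up positions just outside the range.

```python
def variants_in_range(positions, start, end):
    """Return the positions that lie in [start, end], both ends inclusive (assumed)."""
    return [p for p in positions if start <= p <= end]


if __name__ == "__main__":
    # Distinct positions from example output 3.
    reported = [11082349, 11082358, 11082397, 11082409,
                11082425, 11082457, 11082461]
    # 11082347 and 11082462 are made-up positions just outside the range.
    candidates = [11082347] + reported + [11082462]
    print(variants_in_range(candidates, 11082348, 11082461) == reported)
```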
[1] Landrum, Melissa J., et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic acids research (2013) [PMID 24234437]