Data Integrator
PharmaADME is an industry-initiated effort to identify genetic biomarkers that are reliably involved in drug metabolism. No publication is available for this project; see the PharmaADME web page for more information. We have downloaded, curated, and integrated the core list of genetic biomarkers.
Briefly, the curation process involved the following steps:
PhADME and a number.
Since there is no mechanism to check the correctness of the reference allele, the tool tries to find a match on both the forward and the reverse strand. For the reverse strand, each nucleotide is replaced by its complement and the resulting string is reversed. For example, the following two variants encode the same mutation:
Coordinate     Ref  Alt
GRCh37:1:1234  A    GCAGT
GRCh37:1:1234  T    ACTGC
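The strand check described above can be illustrated with a minimal Python sketch. The function names reverse_complement and same_variant are hypothetical and are not taken from the tool's source; the sketch only assumes that alleles are plain strings over A, C, G and T.

```python
# Complement of each nucleotide; any other character (such as "-") is left as is.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(allele):
    """Return the allele as it reads on the opposite strand."""
    return allele.translate(COMPLEMENT)[::-1]


def same_variant(ref_a, alt_a, ref_b, alt_b):
    """True if both allele pairs describe the same change on either strand."""
    forward = (ref_a, alt_a) == (ref_b, alt_b)
    reverse = (reverse_complement(ref_a), reverse_complement(alt_a)) == (ref_b, alt_b)
    return forward or reverse


print(reverse_complement("GCAGT"))               # ACTGC
print(same_variant("A", "GCAGT", "T", "ACTGC"))  # True
```

Comparing both orientations means a variant reported on either strand is recognised without knowing which strand the reference allele came from.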
Options applicable to more than a single tool are summarized in common command line options.
Three columns are required for running this tool: the genomic coordinate, the reference allele and the alternative allele. Information on how variants are encoded is found in gcoords2cons - Variation consequences.
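As a rough illustration of the coordinate value, the sketch below splits a coordinate of the form used in this section's examples (such as GRCh37:7:99262835) into its parts. The layout of assembly, chromosome and position separated by colons is an assumption inferred from those examples; the authoritative description is the one in gcoords2cons - Variation consequences. Coordinate and parse_coordinate are hypothetical names.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    assembly: str
    chromosome: str
    position: int


def parse_coordinate(text):
    """Split 'assembly:chromosome:position' into typed fields (assumed layout)."""
    assembly, chromosome, position = text.strip().split(":")
    return Coordinate(assembly, chromosome, int(position))


print(parse_coordinate("GRCh37:7:99262835"))
```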
By default, a column PharmaADME hit is added with a PhADME identifier if the corresponding variation is present in the PharmaADME list of core markers. In addition, dbSNP identifiers may be output as a separate column dbSNP ID. If the variation is not found in the database, an empty cell is output. An empty cell is also output if no dbSNP identifier is linked to the variation.
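The lookup behaviour described above could be sketched as follows. This is an illustration only: the in-memory MARKERS dictionary, its key layout, the empty parameter and the function annotate are assumptions made for this example and do not reflect how the tool stores or queries the PharmaADME list. The two entries are taken from the example file used later in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele):
    """Allele as read on the opposite strand; '-' stays unchanged."""
    return allele.translate(COMPLEMENT)[::-1]


# Hypothetical marker table: (coordinate, ref, alt) -> (PhADME ID, dbSNP ID or None).
MARKERS = {
    ("GRCh37:7:99262835", "C", "T"): ("PhADME43", "rs10264272"),
    ("GRCh37:10:96826975", "T", "-"): ("PhADME11", None),
}


def annotate(coord, ref, alt, empty=""):
    """Return the (PharmaADME hit, dbSNP ID) cells for one variant."""
    candidates = [
        (coord, ref, alt),                                          # forward strand
        (coord, reverse_complement(ref), reverse_complement(alt)),  # reverse strand
    ]
    for key in candidates:
        if key in MARKERS:
            marker_id, dbsnp_id = MARKERS[key]
            return marker_id, dbsnp_id or empty
    return empty, empty


print(annotate("GRCh37:7:99262835", "C", "T"))   # ('PhADME43', 'rs10264272')
print(annotate("GRCh37:7:99262835", "G", "A"))   # ('PhADME43', 'rs10264272')
print(annotate("GRCh37:10:96826975", "T", "-"))  # ('PhADME11', '')
print(annotate("GRCh37:17:49731104", "G", "A"))  # ('', '')
```

The empty cell is kept as a parameter here because the example output in this section shows -- in those positions.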
Given the input file tPharmaADMEntor-dbsnp-in.tsv
Empty Full Genomic Coords Ref Alt Info
A GRCh37:7:99262835 C T rs10264272/PhADME43
B GRCh37:10:96827030 C T rs11572080/PhADME62
C GRCh37:19:41510282 A G rs12721655/PhADME68
D GRCh37:7:87160618 A C rs2032582/PhADME90
E GRCh37:10:96826975 T - --/PhADME11
F GRCh37:17:49731017 A G rs145659285/--
G GRCh37:17:49731104 G A --/--
the following command adds a column for hits in the PharmaADME database and, for hits that have a dbSNP entry, an additional column with the dbSNP identifier:
$ python PharmaADMEntor.py -H -c 2 3 4 --dbsnp tPharmaADMEntor-dbsnp-in.tsv
The output will then be:
Empty Full Genomic Coords Ref Alt Info PharmaADME hit dbSNP ID
A GRCh37:7:99262835 C T rs10264272/PhADME43 PhADME43 rs10264272
B GRCh37:10:96827030 C T rs11572080/PhADME62 PhADME62 rs11572080
C GRCh37:19:41510282 A G rs12721655/PhADME68 PhADME68 rs12721655
D GRCh37:7:87160618 A C rs2032582/PhADME90 PhADME90 rs2032582
E GRCh37:10:96826975 T - --/PhADME11 PhADME11 --
F GRCh37:17:49731017 A G rs145659285/-- -- --
G GRCh37:17:49731104 G A --/-- -- --