Data Integrator
PharmaADMEntor - Flag variants reliably associated with drug metabolism

PharmaADME is an industry-initiated effort to identify genetic biomarkers that are reliably involved in drug metabolism. No publication is available for this project; see the PharmaADME web page for more information. We have downloaded, curated, and integrated the core list of genetic biomarkers.

Briefly, the curation process involved the following steps:

  • Uplifting of NCBI36 coordinates to GRCh37.
  • Use of dbSNP identifiers to find the location on the genome. Where the mutation was ambiguous, we tried to identify the appropriate mutation from the descriptions given in the core list.
  • If coordinates were not available, the upstream and downstream sequences were used to find the position in the genome.
  • Mapping of the reference base pair(s) to the forward strand of the chromosome. (In most cases the listed base pairs were given relative to the gene.)
  • If the variant had received a dbSNP entry in the meantime, that identifier was added.
  • A unique identifier was assigned to each variant, consisting of a base string PhADME and a number.

Since there is no mechanism to check the reference allele's correctness, the tool tries to find a match on both the forward and the reverse strand. To do so, each nucleotide is replaced by its complement and the resulting string is reversed (the reverse complement). For example, the following two variants encode the same mutation:

Coordinate     Ref Alt
GRCh37:1:1234  A   GCAGT
GRCh37:1:1234  T   ACTGC
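
The strand conversion described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the names reverse_complement and same_mutation are hypothetical, and the sketch assumes that both variants are given at the same coordinate, as in the table above.

```python
# Complement of each unambiguous DNA base; other characters
# (for example "-" for a deletion) are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele: str) -> str:
    """Return the allele as it reads on the opposite strand."""
    return allele.upper().translate(COMPLEMENT)[::-1]


def same_mutation(ref_a: str, alt_a: str, ref_b: str, alt_b: str) -> bool:
    """True if the allele pairs match directly or as reverse complements.

    Hypothetical helper: it compares alleles only and assumes both
    variants were reported at the same coordinate.
    """
    direct = (ref_a, alt_a) == (ref_b, alt_b)
    flipped = (reverse_complement(ref_a), reverse_complement(alt_a)) == (ref_b, alt_b)
    return direct or flipped


print(reverse_complement("GCAGT"))                # ACTGC
print(same_mutation("A", "GCAGT", "T", "ACTGC"))  # True
```

Trying both orientations means a variant is still recognized when its alleles were reported relative to the reverse strand.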

Options applicable to more than a single tool are summarized in common command line options.

Input

Three columns are required for running this tool: the genomic coordinate, the reference allele and the alternative allele. Information on how variants are encoded is found in gcoords2cons - Variation consequences.
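
As a rough illustration of the coordinate column, the sketch below splits a value such as GRCh37:7:99262835 into assembly, chromosome and position. It assumes the simple three-part form used in the examples on this page; the full encoding is described in the gcoords2cons documentation and may allow forms this sketch does not handle. The names GenomicCoord and parse_coord are hypothetical.

```python
from typing import NamedTuple


class GenomicCoord(NamedTuple):
    assembly: str    # e.g. "GRCh37"
    chromosome: str  # kept as text, since chromosomes may be "X" or "Y"
    position: int


def parse_coord(text: str) -> GenomicCoord:
    """Parse 'assembly:chromosome:position' (assumed simple form only)."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected assembly:chromosome:position, got {text!r}")
    assembly, chromosome, position = parts
    return GenomicCoord(assembly, chromosome, int(position))


print(parse_coord("GRCh37:7:99262835"))
```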

Output

By default, a column PharmaADME hit is added, containing the PhADME identifier if the corresponding variation is present in the PharmaADME list of core markers. In addition, dbSNP identifiers may be output in a separate column dbSNP ID. If the variation is not found in the database, an empty cell is output. An empty cell is likewise output if no dbSNP identifier is linked to the variation.
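
The output rule can be pictured with the following sketch. It is not the tool's implementation: CORE_MARKERS is a hypothetical in-memory stand-in for the curated core list (filled with two entries taken from the example on this page), annotate is a hypothetical helper, column indices are 0-based here, and the two-strand matching is omitted for brevity.

```python
# Hypothetical stand-in for the curated core list, keyed by
# (coordinate, ref, alt); values are (PhADME ID, dbSNP ID or None).
CORE_MARKERS = {
    ("GRCh37:7:99262835", "C", "T"): ("PhADME43", "rs10264272"),
    ("GRCh37:10:96826975", "T", "-"): ("PhADME11", None),
}


def annotate(row, coord_col, ref_col, alt_col, with_dbsnp=False):
    """Return a copy of row with the hit column(s) appended.

    A missing hit, or a hit without a linked dbSNP identifier,
    yields an empty cell.
    """
    key = (row[coord_col], row[ref_col], row[alt_col])
    phadme_id, dbsnp_id = CORE_MARKERS.get(key, (None, None))
    extra = [phadme_id or ""]
    if with_dbsnp:
        extra.append(dbsnp_id or "")
    return list(row) + extra


print(annotate(["A", "GRCh37:7:99262835", "C", "T"], 1, 2, 3, with_dbsnp=True))
print(annotate(["G", "GRCh37:17:49731104", "G", "A"], 1, 2, 3, with_dbsnp=True))
```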

Given the input file tPharmaADMEntor-dbsnp-in.tsv

Empty Full Genomic Coords Ref Alt Info
A     GRCh37:7:99262835   C   T   rs10264272/PhADME43
B     GRCh37:10:96827030  C   T   rs11572080/PhADME62
C     GRCh37:19:41510282  A   G   rs12721655/PhADME68
D     GRCh37:7:87160618   A   C   rs2032582/PhADME90
E     GRCh37:10:96826975  T   -   --/PhADME11
F     GRCh37:17:49731017  A   G   rs145659285/--
G     GRCh37:17:49731104  G   A   --/--

the following command adds a column for hits in the PharmaADME database and, for hits that have a dbSNP entry, shows the dbSNP identifier in an additional column.

$ python PharmaADMEntor.py -H -c 2 3 4 --dbsnp tPharmaADMEntor-dbsnp-in.tsv

The output will then be:

Empty Full Genomic Coords Ref Alt Info                PharmaADME hit dbSNP ID
A     GRCh37:7:99262835   C   T   rs10264272/PhADME43 PhADME43       rs10264272
B     GRCh37:10:96827030  C   T   rs11572080/PhADME62 PhADME62       rs11572080
C     GRCh37:19:41510282  A   G   rs12721655/PhADME68 PhADME68       rs12721655
D     GRCh37:7:87160618   A   C   rs2032582/PhADME90  PhADME90       rs2032582
E     GRCh37:10:96826975  T   -   --/PhADME11         PhADME11       --
F     GRCh37:17:49731017  A   G   rs145659285/--      --             --
G     GRCh37:17:49731104  G   A   --/--               --             --