Data Integrator
DrugBankAnnotator - Furnish proteins with associated drugs

This tool integrates the DrugBank resource so that proteins can be checked for whether they are targeted by drugs or involved in drug metabolism processes.

DrugBank uses its own terminology, so the following glossary has been extracted directly from the source code file DrugBankAnnotator.py. The XML tags refer to the database file from which the information is parsed.

  • GROUP: Drug class.

    FDA oriented classification of drug, like 'approved', 'withdrawn', 'illicit', 'experimental'. Given by the

     <groups>
      <group>...</group>
     </groups>
    

    list.

  • ACTION: Mechanism of drug interaction with partner.

    Describes in which way the drug is related to the partner protein. The drug can be, for example, an inhibitor of the partner, but also a substrate of a drug-metabolizing enzyme. Parsed from the

    <actions>
     <action>...</action>
    </actions>
    

    list of a drug record.

  • KNOWN ACTION: Knowledge on how the drug works.

    This is only applicable to drug targets, which are a certain class of drug partners. If the pharmacological action (that is, how the drug works) is known, there is a 'yes' in the record. Parsed from the <known-action> record of a drug's target section.

  • AFFECTED ORGANISM: Organism in which the drug is targeting the drug partner.

    Since DrugBank is centered on drugs for humans, the affected organism refers to the drug partner protein. This is in many cases a human protein, but can also be a protein from a virus or bacterium if it is targeted by the drug in some way. Retrieved from the

    <affected-organisms>
     <affected-organism>...</affected-organism>
    </affected-organisms>
    

    list.

  • PARTNER: Protein which is in physical contact with a drug.

    Any protein in the host or pathogen organism that is known to be in physical contact with the drug. We currently have drug transporters, drug carriers, drug metabolizing enzymes and drug targets.

  • BOND TYPE: Drug-related function of drug partner.

    Describes the function which is relevant in the drug context. See PARTNER. Retrieved implicitly from the

    <targets> <target>...</target> </targets>
    <transporters> <transporter>...</transporter> </transporters>
    <enzymes> <enzyme>...</enzyme> </enzymes>
    <carriers> <carrier>...</carrier> </carriers>
    

    lists.
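
The tag layout in the glossary can be illustrated with a minimal Python sketch. It is not taken from DrugBankAnnotator.py: the sample record, the function name parse_drug and the dictionary keys are hypothetical, the record is heavily simplified, and placing the <actions> list inside each partner element is an assumption. Real DrugBank files contain many more fields and may use an XML namespace, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified drug record (not real DrugBank data).
SAMPLE = """
<drug>
  <name>ExampleDrug</name>
  <groups><group>approved</group></groups>
  <affected-organisms>
    <affected-organism>Humans and other mammals</affected-organism>
  </affected-organisms>
  <targets>
    <target>
      <actions><action>inhibitor</action></actions>
      <known-action>yes</known-action>
    </target>
  </targets>
  <enzymes>
    <enzyme>
      <actions><action>substrate</action></actions>
    </enzyme>
  </enzymes>
  <transporters/>
  <carriers/>
</drug>
"""

# Container tag -> child tag. The child tag name doubles as the BOND TYPE,
# which is what "retrieved implicitly" means in the glossary.
BOND_CONTAINERS = {
    "targets": "target",
    "enzymes": "enzyme",
    "transporters": "transporter",
    "carriers": "carrier",
}


def parse_drug(xml_text):
    """Collect the glossary items of one drug record into a plain dict."""
    drug = ET.fromstring(xml_text)
    record = {
        "name": drug.findtext("name"),
        "groups": [g.text for g in drug.findall("groups/group")],
        "affected_organisms": [
            o.text for o in drug.findall("affected-organisms/affected-organism")
        ],
        "partners": [],
    }
    for container, child in BOND_CONTAINERS.items():
        for partner in drug.findall(f"{container}/{child}"):
            record["partners"].append({
                "bond_type": child,
                "actions": [a.text for a in partner.findall("actions/action")],
                # Only present for targets; None for the other partner classes.
                "known_action": partner.findtext("known-action"),
            })
    return record


record = parse_drug(SAMPLE)
for partner in record["partners"]:
    print(record["name"], partner["bond_type"], partner["actions"])
```

Deriving the bond type from the enclosing list keeps the mapping in one small table, so a further partner class would only need one more entry in BOND_CONTAINERS.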

Input

The input consists of a single column with UniProt accession numbers. If only gene data is available, we refer to the gene/protein mapper, HSEnsgProteinMapper - Ensembl gene ID to protein ID mapping, which readily maps genes to their proteins. Several convenience options list the keywords that can be used in the filters for restricting output; see the --list-* command line options.

Options applicable to more than a single tool are summarized in common command line options.

Output

By default, the tool outputs the drug name that is associated with the protein, a single name per line. If no drug has been found for the protein, the empty cell is output instead. In addition, output can be supplemented with the affected organism, the bond type, the drug's ChEBI identifier, the drug's DrugBank identifier, the drug class and a flag for knowledge on how the drug works.

Since a protein can be associated with many types of drugs, filters can additionally be applied to restrict the output to more specific information. Generally, each filter can be selected multiple times and filters can be combined. If a filter takes multiple arguments, they are combined with a logical "or", and multiple filters are combined by a logical "and". This can be thought of as "I want filterX to report only keyword A, B, or C, and filterY to report only keyword D".

Specifically, each item listed in the above glossary can be filtered, that is, the affected organism, bond types, drug classes, actions, known actions, and types of drug/protein (physical) interaction.

If filtering is on and the filter criteria do not yield any drugs, the empty cell is output for each column.
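
The filter semantics described above (keywords within one filter joined by "or", separate filters joined by "and") can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the function names, the record layout and the drug names DrugA to DrugC are hypothetical, and the "--" placeholder for the empty cell is assumed from the example output in this document.

```python
EMPTY_CELL = "--"  # assumption: placeholder as it appears in the example output


def matches(record, filters):
    """Return True if the record passes every active filter.

    filters maps a field name to the set of accepted keywords.
    Within one filter a single matching keyword is enough ("or");
    all filters have to match ("and").
    """
    for field, accepted in filters.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]
        if not any(value in accepted for value in values):
            return False
    return True


def drug_names(records, filters):
    """Names of all drugs passing the filters, or the empty cell if none do."""
    names = [r["drug"] for r in records if matches(r, filters)]
    return names or [EMPTY_CELL]


# Hypothetical drug associations of a single protein.
records = [
    {"drug": "DrugA", "bond_type": "enzyme", "groups": ["approved"]},
    {"drug": "DrugB", "bond_type": "target", "groups": ["approved", "withdrawn"]},
    {"drug": "DrugC", "bond_type": "enzyme", "groups": ["experimental"]},
]

# Bond type is enzyme OR target, AND drug class is approved.
print(drug_names(records, {"bond_type": {"enzyme", "target"}, "groups": {"approved"}}))
# ['DrugA', 'DrugB']

# No record is a carrier, so only the empty cell remains.
print(drug_names(records, {"bond_type": {"carrier"}}))
# ['--']
```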

Given the input file /tmp/simple.tsv,

UniProtAccNr
P23141
P02763
P00533
P04070
P08908
P19623
Q9GZV3
P19793

the command

$ python DrugBankAnnotator.py -c 1 -H --bondtype --filter-bondtypes enzyme --from-file /tmp/simple.tsv

will output the following lines:

UniProtAccNr Drug Name                                                              Drug Partner Type
P23141       Indomethacin                                                           enzyme
P23141       Trandolapril                                                           enzyme
P23141       Tamoxifen                                                              enzyme
P23141       Mycophenolate mofetil                                                  enzyme
P23141       Clopidogrel                                                            enzyme
P23141       Irinotecan                                                             enzyme
P23141       Benzocaine                                                             enzyme
P23141       Capecitabine                                                           enzyme
P23141       Ciclesonide                                                            enzyme
P23141       Hydroxy-Phenyl-Acetic Acid 8-Methyl-8-Aza-Bicyclo[3.2.1]Oct-3-Yl Ester enzyme
P23141       O-Sialic Acid                                                          enzyme
P23141       N-Methylnaloxonium                                                     enzyme
P23141       Rufinamide                                                             enzyme
P23141       Mevastatin                                                             enzyme
P23141       Dabigatran etexilate                                                   enzyme
P02763       --                                                                     --
P00533       --                                                                     --
P04070       Antihemophilic Factor                                                  enzyme
P08908       --                                                                     --
P19623       S-Adenosylmethionine                                                   enzyme
Q9GZV3       --                                                                     --
P19793       --                                                                     --
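
For scripted use, the input file and the command line from the example above could be prepared along the following lines. This is a minimal sketch: the helper names write_input and build_command are hypothetical, the flags are copied verbatim from the command shown above without further assumptions about their meaning, and the sketch only prints the command instead of running it.

```python
import shlex
import tempfile
from pathlib import Path


def write_input(accessions, path):
    """Write a single-column input file with the header used in the example."""
    lines = ["UniProtAccNr", *accessions]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def build_command(input_path, bondtype="enzyme"):
    """Argument list mirroring the example command in this document."""
    return [
        "python", "DrugBankAnnotator.py",
        "-c", "1", "-H",
        "--bondtype",
        "--filter-bondtypes", bondtype,
        "--from-file", str(input_path),
    ]


accessions = ["P23141", "P02763", "P00533"]
input_path = write_input(accessions, Path(tempfile.gettempdir()) / "simple.tsv")
command = build_command(input_path)

# Print the command for inspection; subprocess.run(command, check=True)
# would execute it if DrugBankAnnotator.py is in the working directory.
print(shlex.join(command))
```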