Data Integrator
This tool integrates the DrugBank resource so that proteins can be checked for whether they are targeted by drugs or involved in drug metabolism processes.
DrugBank has its own way of naming its objects, so here is a glossary extracted directly from the source code file DrugBankAnnotator.py. The XML tags refer to the database file from which the information is parsed.
GROUP: Drug class.
FDA-oriented classification of the drug, such as 'approved', 'withdrawn', 'illicit', or 'experimental'. Given by the <groups> <group>...</group> </groups> list.
ACTION: Mechanism of drug interaction with partner.
Describes how the drug is related to the partner protein. For example, the drug can be an inhibitor of the partner, but it can also be a substrate of a drug-metabolizing enzyme. Parsed from the <actions> <action>...</action> </actions> list of a drug record.
KNOWN ACTION: Knowledge on how the drug works.
This is only applicable to drug targets, which are one class of drug partners. If the pharmacological action (that is, how the drug works) is known, the record contains 'yes'. Parsed from the <known-action> element of a drug's target section.
AFFECTED ORGANISM: Organism in which the drug is targeting the drug partner.
Since DrugBank is centered on drugs for humans, the affected organism refers to the organism of the drug partner protein. In many cases this is a human protein, but it can also be a protein from a virus or bacterium if that protein is targeted by the drug in some way. Retrieved from the <affected-organisms> <affected-organism>...</affected-organism> </affected-organisms> list.
PARTNER: Protein which is in physical contact with a drug.
Any protein in the host or pathogen organism that is known to be in physical contact with the drug. Currently covered are drug transporters, drug carriers, drug-metabolizing enzymes, and drug targets.
BOND TYPE: Drug-related function of drug partner.
Describes the function of the partner that is relevant in the drug context (see PARTNER). Retrieved implicitly from the section in which the partner appears, that is, from the <targets> <target>...</target> </targets>, <transporters> <transporter>...</transporter> </transporters>, <enzymes> <enzyme>...</enzyme> </enzymes>, and <carriers> <carrier>...</carrier> </carriers> lists.
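To make the glossary concrete, here is a minimal sketch of how these fields could be read with Python's standard xml.etree.ElementTree. It assumes a simplified, namespace-free drug record with made-up content: the real DrugBank XML declares a default namespace and contains many more elements, and the parsing code in DrugBankAnnotator.py may work differently. The names PARTNER_SECTIONS, texts and partner_rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified drug record (no XML namespace, made-up content).
RECORD = """<drug>
  <name>Example drug</name>
  <groups><group>approved</group><group>investigational</group></groups>
  <affected-organisms>
    <affected-organism>Humans and other mammals</affected-organism>
  </affected-organisms>
  <targets>
    <target>
      <known-action>yes</known-action>
      <actions><action>inhibitor</action></actions>
    </target>
  </targets>
  <enzymes>
    <enzyme>
      <actions><action>substrate</action></actions>
    </enzyme>
  </enzymes>
</drug>"""

# BOND TYPE is implied by the section in which a partner appears.
PARTNER_SECTIONS = {
    "targets/target": "target",
    "transporters/transporter": "transporter",
    "enzymes/enzyme": "enzyme",
    "carriers/carrier": "carrier",
}

def texts(node, path):
    """Stripped, non-empty text of every element matching `path` below `node`."""
    return [e.text.strip() for e in node.findall(path) if e.text and e.text.strip()]

def partner_rows(drug):
    """Yield one dict per drug partner, holding the glossary fields."""
    groups = texts(drug, "groups/group")                            # GROUP
    organisms = texts(drug, "affected-organisms/affected-organism")  # AFFECTED ORGANISM
    for path, bond_type in PARTNER_SECTIONS.items():
        for partner in drug.findall(path):
            known = partner.findtext("known-action")  # None if absent (non-targets)
            yield {
                "drug": drug.findtext("name"),
                "bond_type": bond_type,                          # BOND TYPE
                "actions": texts(partner, "actions/action"),     # ACTION
                "known_action": known.strip() if known else None,  # KNOWN ACTION
                "groups": groups,
                "affected_organisms": organisms,
            }

for row in partner_rows(ET.fromstring(RECORD)):
    print(row["bond_type"], row["actions"], row["known_action"])
# target ['inhibitor'] yes
# enzyme ['substrate'] None
```

Mapping each section path to its bond type keeps the "implicit" derivation of BOND TYPE in one place instead of spreading it over four parsing loops.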
The input consists of a single column of UniProt accession numbers. If only gene data are available, use the gene/protein mapper, HSEnsgProteinMapper - Ensembl gene ID to protein ID mapping, which readily maps genes to their proteins. Several convenience options list the keywords that can be used in the filters for restricting the output; see the --list-* command line options.
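Reading such an input file can be sketched as follows. This assumes, based on the example invocation with -c 1 -H, that -c selects a 1-based column and -H marks the first line as a header; the function read_accessions is hypothetical and not part of the tool.

```python
import csv
import io

def read_accessions(handle, column=1, has_header=True):
    """Return the values of a 1-based column from tab-separated input.

    Assumption: `column` and `has_header` mirror the tool's -c and -H options.
    Blank lines and empty cells are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # drop the header line, e.g. 'UniProtAccNr'
    return [
        row[column - 1].strip()
        for row in reader
        if len(row) >= column and row[column - 1].strip()
    ]

sample = io.StringIO("UniProtAccNr\nP23141\nP02763\n\nP00533\n")
print(read_accessions(sample))  # ['P23141', 'P02763', 'P00533']
```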
Options applicable to more than a single tool are summarized in common command line options.
By default, the tool outputs the drug name associated with the protein, one name per line. If no drug has been found for the protein, an empty cell is output instead. In addition, the output can be supplemented with the affected organism, the bond type, the drug's ChEBI identifier, the drug's DrugBank identifier, the drug class, and a flag indicating whether it is known how the drug works.
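The default one-drug-per-line layout can be sketched like this. The in-memory mapping drugs_by_protein is a hypothetical stand-in for whatever the tool derives from DrugBank, the accessions and drug names are placeholders, and "--" is used as the empty cell, as in the example output of this page.

```python
EMPTY = "--"  # placeholder for an empty cell

def output_rows(accessions, drugs_by_protein):
    """Yield (accession, drug name) pairs, one drug per line.

    A protein without any associated drug still gets exactly one row,
    with EMPTY in the drug column.
    """
    for acc in accessions:
        drugs = drugs_by_protein.get(acc, [])
        if not drugs:
            yield (acc, EMPTY)
        for drug in drugs:
            yield (acc, drug)

# Hypothetical data, not taken from DrugBank.
mapping = {"ACC1": ["Drug A", "Drug B"], "ACC2": []}
for row in output_rows(["ACC1", "ACC2", "ACC3"], mapping):
    print("\t".join(row))
# ACC1	Drug A
# ACC1	Drug B
# ACC2	--
# ACC3	--
```

Emitting a row even for proteins without drugs keeps every input accession visible in the output, so the result can be joined back to the input table.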
Since a protein can be associated with many types of drugs, additional filters can be applied to restrict the output to more specific information. Generally, each filter can be given multiple times, and filters can be combined. If a filter takes multiple arguments, they are combined with a logical "or"; multiple filters are combined with a logical "and". This can be read as: "I want filter X to report only keywords A, B, or C, and filter Y to report only keyword D."
Specifically, each item listed in the above glossary can be filtered, that is, the affected organism, bond types, drug classes, actions, known actions, and types of drug/protein (physical) interaction.
If filtering is enabled and no drug matches the filter criteria, an empty cell is output for each column.
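The combination rule (logical "or" within one filter, logical "and" across filters) and the empty-cell fallback can be sketched as follows. Records are plain dicts with hypothetical field names, multi-valued fields such as groups are lists, and filters are a mapping from field to the set of accepted keywords; none of this is the tool's actual data model.

```python
EMPTY = "--"  # placeholder for an empty cell

def values(v):
    """Treat a single string and a list of strings uniformly as a set."""
    return {v} if isinstance(v, str) else set(v)

def passes(record, filters):
    """True if the record satisfies every filter (logical AND).

    Within one filter the accepted keywords are alternatives (logical OR):
    the filter matches if any of the record's values is accepted.
    """
    return all(values(record.get(field, ())) & accepted
               for field, accepted in filters.items())

def filtered_rows(accession, records, filters, columns):
    """Rows for one protein; a single all-empty row if nothing passes."""
    rows = [[accession] + [r[c] for c in columns]
            for r in records if passes(r, filters)]
    return rows or [[accession] + [EMPTY] * len(columns)]

# Hypothetical records for a single protein.
records = [
    {"drug": "Drug A", "bond_type": "enzyme", "groups": ["approved"]},
    {"drug": "Drug B", "bond_type": "target", "groups": ["approved"]},
    {"drug": "Drug C", "bond_type": "enzyme", "groups": ["withdrawn"]},
    {"drug": "Drug D", "bond_type": "transporter", "groups": ["approved", "investigational"]},
]
# (bond type is enzyme OR transporter) AND (group is approved)
filters = {"bond_type": {"enzyme", "transporter"}, "groups": {"approved"}}
print(filtered_rows("ACC1", records, filters, ["drug", "bond_type"]))
# [['ACC1', 'Drug A', 'enzyme'], ['ACC1', 'Drug D', 'transporter']]
print(filtered_rows("ACC1", records, {"groups": {"illicit"}}, ["drug", "bond_type"]))
# [['ACC1', '--', '--']]
```

With no filters at all, all() over an empty mapping is true, so every record passes, which matches the unfiltered default.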
Given the input file /tmp/simple.tsv,
UniProtAccNr
P23141
P02763
P00533
P04070
P08908
P19623
Q9GZV3
P19793
the command
$ python DrugBankAnnotator.py -c 1 -H --bondtype --filter-bondtypes enzyme --from-file /tmp/simple.tsv
will output the following lines:
UniProtAccNr	Drug Name	Drug Partner Type
P23141	Indomethacin	enzyme
P23141	Trandolapril	enzyme
P23141	Tamoxifen	enzyme
P23141	Mycophenolate mofetil	enzyme
P23141	Clopidogrel	enzyme
P23141	Irinotecan	enzyme
P23141	Benzocaine	enzyme
P23141	Capecitabine	enzyme
P23141	Ciclesonide	enzyme
P23141	Hydroxy-Phenyl-Acetic Acid 8-Methyl-8-Aza-Bicyclo[3.2.1]Oct-3-Yl Ester	enzyme
P23141	O-Sialic Acid	enzyme
P23141	N-Methylnaloxonium	enzyme
P23141	Rufinamide	enzyme
P23141	Mevastatin	enzyme
P23141	Dabigatran etexilate	enzyme
P02763	--	--
P00533	--	--
P04070	Antihemophilic Factor	enzyme
P08908	--	--
P19623	S-Adenosylmethionine	enzyme
Q9GZV3	--	--
P19793	--	--