Data source: DrugMAP #112

Open
andrewsu opened this issue Apr 20, 2023 · 6 comments
Assignees
Labels
data source Data source pending to create a new API

Comments

@andrewsu
Member

publication: https://academic.oup.com/nar/article/51/D1/D1288/6761740
example drug record: http://drugmap.idrblab.net/data/drug/details/DM6A0X7
download page: http://drugmap.idrblab.net/full-data-download


At a minimum, I can see creating an API for drug targets, transporters, and metabolizing enzymes. Possibly more too...

@andrewsu andrewsu added the data source Data source pending to create a new API label Apr 20, 2023
@erikyao erikyao self-assigned this Apr 25, 2023
@erikyao
Contributor

erikyao commented Apr 25, 2023

Hi @andrewsu @newgene, do you recognize the "raw" data format of DrugMAP? For example:

DrugMAP Full Data Download File
Title - DrugMAP drug information in raw format
Version 1.01 (2022.07.20)
Provided by Lab of Innovative Drug Reasearch and Bioinformatics (IDRB)
            College of Pharmaceutical Sciences
            Zhejiang University
            https://idrblab.org/
Any question about data provided here, please contact with:
Dr. Li ([email protected]) and Dr. Yin ([email protected])

ID: Drug ID
DN: Drug Name
HS: Highest Status
SN: Synonymous
CP: Company
TC: Therapeutic Class
DT: Drug Type
SQ: Sequence
PC: PubChem CID
MW: Molecular Weight
FM: Formula
IC: InChI
CS: Canonical SMILES
IK: InChIKey
IU: IUPAC Name
CA: CAS Number
CB: ChEBI ID
DE: Disease Entry

DMMHNU2	ID	DMMHNU2
DMMHNU2	DN	(S)-(+)-Dimethindene maleate
DMMHNU2	HS	Approved
DMMHNU2	SN	UNII-J43ZL3WTLN; J43ZL3WTLN; 136152-65-3; DSSTox_RID_83231; DSSTox_CID_28966; DSSTox_GSID_49040; Dimethindene maleate, (+)-; CAS-1217457-81-2; NCGC00025158-01; 121367-05-3; Dimethindene maleate, (S)-(+)-; DTXSID8049040; CHEMBL2356801; MolPort-023-276-095; HMS3267J12; Tox21_113582; AKOS024456589; Tox21_113582_1; NCGC00025158-02; 1H-Indene-2-ethanamine, N,N-dimethyl-3-((1S)-1-(2-pyridinyl)ethyl)-, (2Z)-2-butenedioate (1:1); B6734; (S)-(+)-Dimethindene maleate, > UNII-6LL60J9E0O component
DMMHNU2	TC	Antiinflammatory Agents
DMMHNU2	DT	Small molecular drug
DMMHNU2	PC	56972160
DMMHNU2	MW	408.5
DMMHNU2	FM	C24H28N2O4
DMMHNU2	IC	InChI=1S/C20H24N2.C4H4O4/c1-15(19-10-6-7-12-21-19)20-17(11-13-22(2)3)14-16-8-4-5-9-18(16)20;5-3(6)1-2-4(7)8/h4-10,12,15H,11,13-14H2,1-3H3;1-2H,(H,5,6)(H,7,8)/b;2-1-/t15-;/m1./s1
DMMHNU2	CS	C[C@H](C1=CC=CC=N1)C2=C(CC3=CC=CC=C32)CCN(C)C.C(=C\\C(=O)O)\\C(=O)O
DMMHNU2	IK	SWECWXGUJQLXJF-HFNHQGOYSA-N
DMMHNU2	IU	(Z)-but-2-enedioic acid;N,N-dimethyl-2-[3-[(1S)-1-pyridin-2-ylethyl]-1H-inden-2-yl]ethanamine
DMMHNU2	CA	CAS 136152-65-3
DMMHNU2	DE	Pruritus

DMIAHVU	ID	DMIAHVU
DMIAHVU	DN	2-deoxyglucose
DMIAHVU	HS	Approved
DMIAHVU	SN	154-17-6; Deoxyglucose; 2-Deoxy-D-mannose; 2-Deoxy-D-arabinohexose; UNII-9G2MP84A8W; D-Arabino-hexose, 2-deoxy-; HSDB 5484; arabino-Hexose, 2-deoxy-; D-Glucose, 2-deoxy-; 9G2MP84A8W; (3R,4S,5R)-3,4,5,6-tetrahydroxyhexanal; AK-44445; 2 Deoxyglucose; 2 Deoxy D glucose; 2 Desoxy D glucose; D-arabino-2-desoxyhexose; d-2-glucodesose; D-2dGlc; deoxy-d-glucose, 2-; 2-DEOXYLGLUCOSE; 2-INNo-D-AEIIC; SCHEMBL7670; AC1L33KH; KSC175S5P; 4-01-00-04282 (Beilstein Handbook Reference); Jsp003004; CHEMBL2074932; CTK0H5957; MolPort-002-317-302; 2-deoxy-D-glucose
DMIAHVU	CP	Threshold Pharmaceuticals
DMIAHVU	DT	Small molecular drug
DMIAHVU	PC	108223
DMIAHVU	MW	164.16
DMIAHVU	FM	C6H12O5
DMIAHVU	IC	InChI=1S/C6H12O5/c7-2-1-4(9)6(11)5(10)3-8/h2,4-6,8-11H,1,3H2/t4-,5-,6+/m1/s1
DMIAHVU	CS	C(C=O)[C@H]([C@@H]([C@@H](CO)O)O)O
DMIAHVU	IK	VRYALKFFQXWPIH-PBXRRBTRSA-N
DMIAHVU	IU	(3R,4S,5R)-3,4,5,6-tetrahydroxyhexanal
DMIAHVU	CA	CAS 154-17-6
DMIAHVU	DE	Solid tumour/cancer

The first block is a header comment, followed by a legend of field abbreviations; each subsequent block is one drug entry.

It's not hard to parse with Python, but if it's a known format, we may be able to save some time by using an existing library. Thanks!
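As a rough illustration of how little code the format needs, here is a minimal Python sketch that groups the tab-separated lines by entry ID. It assumes that header and legend lines contain no tab characters and that every data line has exactly three tab-separated columns (ID, field code, value). `parse_drugmap_raw` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def parse_drugmap_raw(text):
    """Group 'ID<TAB>FIELD<TAB>VALUE' lines into one dict per entry ID.

    Assumption: header and legend lines contain no tabs, so they are
    skipped by the three-column check below.
    """
    records = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # header, legend, or blank separator line
        entry_id, field, value = parts
        # Values are kept in lists in case a field code repeats for one ID.
        records[entry_id][field].append(value.strip())
    return {entry_id: dict(fields) for entry_id, fields in records.items()}
```

Each value is stored as a list so that a repeated field code for the same ID is not silently overwritten.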

@andrewsu
Member Author

@erikyao unfortunately, it's not any standard format that I recognize...

@erikyao
Contributor

erikyao commented Apr 25, 2023

> @erikyao unfortunately, it's not any standard format that I recognize...

@andrewsu I just sent an email to the DrugMAP team. I'll keep you updated.

@andrewsu
Member Author

andrewsu commented Sep 5, 2024

The top of the "Drug to DME Mapping Information" file looks like this:

EI: DME_ID
EN: DME_Name
DI: Drug_ID
DN: Drug_Name
RN: Reference_Name
RU: Reference_URL

DE4LYSA	EI	DE4LYSA
DE4LYSA	EN	Cytochrome P450 3A4 (CYP3A4)
DE4LYSA	DI	DMIKQH5
DE4LYSA	DN	Hydroxyprogesterone caproate
DE4LYSA	RN	Prevention of preterm delivery with 17-hydroxyprogesterone caproate: pharmacologic considerations. Semin Perinatol. 2014 Dec;38(8):516-22.
DE4LYSA	RU	https://www.ncbi.nlm.nih.gov/pubmed/?term=25256193

From this record, we would want to yield a JSON object like this:

{
  "_id": "DE4LYSA-METABOLIZES-DMIKQH5",
  "_version": 1,
  "object": {
    "name": "Hydroxyprogesterone caproate",
    "id": "DMIKQH5"
  },
  "predicate": "metabolizes",
  "predication": [
    {
      "pmid": 25256193
    }
  ],
  "subject": {
    "name": "Cytochrome P450 3A4 (CYP3A4)",
    "id": "DE4LYSA"
  }
}
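The mapping from a raw record to that object could be sketched as follows. This is only an illustration under a few assumptions: each mapping is a blank-line-separated block of `ID<TAB>FIELD<TAB>VALUE` lines, the PubMed ID is the trailing `term=` value of the RU URL, and the helper names (`iter_blocks`, `pmid_from_url`, `to_document`) are hypothetical.

```python
import re


def iter_blocks(text):
    """Yield one {field: value} dict per blank-line-separated block."""
    block = {}
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3:
            block[parts[1]] = parts[2].strip()
        elif not line.strip() and block:
            # A blank line closes the current block.
            yield block
            block = {}
    if block:
        yield block


def pmid_from_url(url):
    """Return the trailing PubMed ID of the reference URL as an int, or None."""
    match = re.search(r"term=(\d+)\s*$", url or "")
    return int(match.group(1)) if match else None


def to_document(block):
    """Build the JSON-serializable object for one drug-to-DME mapping."""
    doc = {
        "_id": f"{block['EI']}-METABOLIZES-{block['DI']}",
        "_version": 1,
        "object": {"name": block["DN"], "id": block["DI"]},
        "predicate": "metabolizes",
        "predication": [],
        "subject": {"name": block["EN"], "id": block["EI"]},
    }
    pmid = pmid_from_url(block.get("RU"))
    if pmid is not None:
        doc["predication"].append({"pmid": pmid})
    return doc
```

Records whose reference URL does not end in a numeric `term=` value simply get an empty `predication` list in this sketch; whether that is the desired behavior is an open question.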

@seltepu

seltepu commented Sep 14, 2024

The DrugMAP parser linked below creates an array of JSON objects representing drug-enzyme mappings along with their reference details. It reads a text file of raw drug-to-enzyme mapping data, converts each record into a standardized JSON object, and writes the result to an output JSON file. The parser extracts drug names, drug IDs, enzyme names, enzyme IDs, and reference citations (such as PubMed IDs), so that the data can be easily queried, analyzed, and integrated into larger systems such as APIs.

DrugMAP parser repository: https://github.com/seltepu/Updated-DrugMAP-Parser/tree/main

The repository contains the input file, the Python processing code, and the JSON array output file.

@andrewsu
Member Author

andrewsu commented Sep 17, 2024

@seltepu Looking good! However, I realized that we need to make some changes... Here is the first record from your output file:

  {
    "_id": "DE4LYSA-METABOLIZES-DMIKQH5",
    "_version": 1,
    "object": {
      "name": "Hydroxyprogesterone caproate",
      "id": "DMIKQH5"
    },
    "predicate": "metabolizes",
    "predication": [
      {
        "pmid": "25256193"
      }
    ],
    "subject": {
      "name": "Cytochrome P450 3A4 (CYP3A4)",
      "id": "DE4LYSA"
    }
  },

I thought that DMIKQH5 and DE4LYSA were commonly used identifiers, but it turns out that they are not. So we need to translate them using the annotation files that the authors provide on their website. The subjects should always be found in the "General Information of Drug Metabolism Enzyme (DME)" file. For example, here is the record for DE4LYSA:

DE4LYSA	ID	DE4LYSA
DE4LYSA	DN	Cytochrome P450 3A4 (CYP3A4)
DE4LYSA	GN	CYP3A4
DE4LYSA	SN	Cytochrome P450 family 3 subfamily A member 4; Quinine 3-monooxygenase; 1,4-cineole 2-exo-monooxygenase; 1,8-cineole 2-exo-monooxygenase; Albendazole monooxygenase (sulfoxide-forming); Albendazole sulfoxidase; CYP3A3; CYP3A4; CYPIIIA3; CYPIIIA4; Cholesterol 25-hydroxylase; Cytochrome P450 3A3; Cytochrome P450 HLp; Cytochrome P450 NF-25; Cytochrome P450-PCN1; Nifedipine oxidase
DE4LYSA	UC	CP3A4_HUMAN
DE4LYSA	RD	Bosutinib
DE4LYSA	GI	1576
DE4LYSA	E1	1: Oxidoreductase
DE4LYSA	E2	1.14: Oxygen paired donor oxidoreductase
DE4LYSA	E3	1.14.14: Flavin/flavoprotein donor oxidoreductase
DE4LYSA	EC	1.14.14.55
DE4LYSA	RC	Aflatoxin activation and detoxification:R-HSA-5423646; Biosynthesis of maresin-like SPMs:R-HSA-9027307; Xenobiotics:R-HSA-211981
DE4LYSA	KG	Bile secretion:hsa04976; Chemical carcinogenesis:hsa05204; Drug metabolism - cytochrome P450:hsa00982; Drug metabolism - other enzymes:hsa00983; Linoleic acid metabolism:hsa00591; Metabolic pathways:hsa01100; Metabolism of xenobiotics by cytochrome P450:hsa00980; Retinol metabolism:hsa00830; Steroid hormone biosynthesis:hsa00140
DE4LYSA	PD	1W0F; 1W0G; 2J0D; 2V0M; 3NXU; 3TJS; 3UA1; 4D6Z; 4D75
DE4LYSA	SQ	MALIPDLAMETWLLLAVSLVLLYLYGTHSHGLFKKLGIPGPTPLPFLGNILSYHKGFCMFDMECHKKYGKVWGFYDGQQPVLAITDPDMIKTVLVKECYSVFTNRRPFGPVGFMKSAISIAEDEEWKRLRSLLSPTFTSGKLKEMVPIIAQYGDVLVRNLRREAETGKPVTLKDVFGAYSMDVITSTSFGVNIDSLNNPQDPFVENTKKLLRFDFLDPFFLSITVFPFLIPILEVLNICVFPREVTNFLRKSVKRMKESRLEDTQKHRVDFLQLMIDSQNSKETESHKALSDLELVAQSIIFIFAGYETTSSVLSFIMYELATHPDVQQKLQEEIDAVLPNKAPPTYDTVLQMEYLDMVVNETLRLFPIAMRLERVCKKDVEINGMFIPKGVVVMIPSYALHRDPKYWTEPEKFLPERFSKKNKDNIDPYIYTPFGSGPRNCIGMRFALMNMKLALIRVLQNFSFKPCKETQIPLKLSLGGLLQPEKPVVLKVESRDGTVSGA
DE4LYSA	TD	Primarily distributed in intestine and liver.
DE4LYSA	FC	This enzyme is involved in the metabolism of sterols, steroid hormones, retinoids and fatty acids. It exhibits high catalytic activity for the formation of hydroxyestrogens from estrone (E1) and 17beta- estradiol (E2), namely 2-hydroxy E1 and E2, as well as D-ring hydroxylated E1 and E2 at the C-16 position and plays a role in the metabolism of androgens, particularly in oxidative deactivation of testosterone. It also metabolizes testosterone to less biologically active 2beta- and 6beta- hydroxytestosterones. It catalyzes bisallylic hydroxylation of polyunsaturated fatty acids (PUFA) and the epoxidation of double bonds of PUFA with a preference for the last double bond. It metabolizes endocannabinoid arachidonoylethanolamide (anandamide) to 8,9-, 11,12-, and 14,15- epoxyeicosatrienoic acid ethanolamides (EpETrE-EAs) and plays a role in the metabolism of retinoids.In addition, it displays high catalytic activity for oxidation of all-trans-retinol to all-trans-retinal, a rate- limiting step for the biosynthesis of all-trans-retinoic acid (atRA) and further metabolizes atRA toward 4-hydroxyretinoate and may play a role in hepatic atRA clearance. It is also responsible for oxidative metabolism of xenobiotics. It metabolizes the majority of the administered drugs; catalyzes sulfoxidation of the anthelmintics albendazole and fenbendazole and hydroxylates antimalarial drug quinine.
DE4LYSA	KD	33208: Metazoa
DE4LYSA	PL	7711: Chordata
DE4LYSA	CL	40674: Mammalia
DE4LYSA	OD	9443: Primates
DE4LYSA	FM	9604: Hominidae
DE4LYSA	GE	9605: Homo
DE4LYSA	SP	9606: Homo sapiens

We are interested in the GI (Gene ID) field -- in this case 1576.

Similarly, the info on DMIKQH5 can be found in the "General Information of Drug" file:

DMIKQH5	ID	DMIKQH5
DMIKQH5	DN	Hydroxyprogesterone
DMIKQH5	HS	Approved
DMIKQH5	SN	Delalutin (TN); Proluton depot (TN)
DMIKQH5	CP	Bristol Myers Squibb
DMIKQH5	DT	Small molecular drug
DMIKQH5	PC	6238
DMIKQH5	MW	330.5
DMIKQH5	FM	C21H30O3
DMIKQH5	IC	InChI=1S/C21H30O3/c1-13(22)21(24)11-8-18-16-5-4-14-12-15(23)6-9-19(14,2)17(16)7-10-20(18,21)3/h12,16-18,24H,4-11H2,1-3H3/t16-,17+,18+,19+,20+,21+/m1/s1
DMIKQH5	CS	CC(=O)[C@]1(CC[C@@H]2[C@@]1(CC[C@H]3[C@H]2CCC4=CC(=O)CC[C@]34C)C)O
DMIKQH5	IK	DBPWSSGDRRHUNT-CEGNMAFCSA-N
DMIKQH5	IU	(8R,9S,10R,13S,14S,17R)-17-acetyl-17-hydroxy-10,13-dimethyl-2,6,7,8,9,11,12,14,15,16-decahydro-1H-cyclopenta[a]phenanthren-3-one
DMIKQH5	CA	CAS 68-96-2
DMIKQH5	CB	CHEBI:17252
DMIKQH5	DE	Solid tumour/cancer

Let's capture the PC (PubChem CID), IK (InChIKey), and CB (ChEBI ID) fields. With those additional mappings, your output object should now look something like this:

  {
    "_id": "DE4LYSA-METABOLIZES-DMIKQH5",
    "_version": 1,
    "object": {
      "name": "Hydroxyprogesterone caproate",
      "id": "DMIKQH5",
      "pubchem_cid": "6238",
      "inchikey": "DBPWSSGDRRHUNT-CEGNMAFCSA-N",
      "chebi_id": "CHEBI:17252"
    },
    "predicate": "metabolizes",
    "predication": [
      {
        "pmid": 25256193
      }
    ],
    "subject": {
      "name": "Cytochrome P450 3A4 (CYP3A4)",
      "id": "DE4LYSA",
      "gene_id": "1576"
    }
  },

(Also note the change of pmid from a string to an int, i.e., no quotes.) Let me know if you have any other questions!
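One possible way to add these cross-references is sketched below. It assumes both annotation files use the same `ID<TAB>FIELD<TAB>VALUE` layout shown above, keeps the looked-up values as strings (as in the example object), and converts any existing `pmid` to an int. The names `index_fields` and `enrich` and the two field-to-key mappings are hypothetical.

```python
import copy

# Field code in the annotation file -> key in the output object.
DRUG_FIELDS = {"PC": "pubchem_cid", "IK": "inchikey", "CB": "chebi_id"}
DME_FIELDS = {"GI": "gene_id"}


def index_fields(text, wanted):
    """Map entry ID -> {field code: value} for the field codes in `wanted`."""
    index = {}
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[1] in wanted:
            index.setdefault(parts[0], {})[parts[1]] = parts[2].strip()
    return index


def enrich(doc, drug_index, dme_index):
    """Return a copy of `doc` with cross-reference IDs added where available."""
    enriched = copy.deepcopy(doc)
    for node, index, names in (
        ("object", drug_index, DRUG_FIELDS),
        ("subject", dme_index, DME_FIELDS),
    ):
        fields = index.get(enriched[node]["id"], {})
        for code, key in names.items():
            if code in fields:  # not every entry has every field (e.g. CB)
                enriched[node][key] = fields[code]
    for ref in enriched.get("predication", []):
        if "pmid" in ref:
            ref["pmid"] = int(ref["pmid"])  # string -> int, as requested
    return enriched
```

Building each index once and reusing it avoids rescanning the annotation files for every mapping record. Entries that lack a field (some drug records have no CB line, for instance) are left without the corresponding key rather than given a placeholder.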

@ctrl-schaff ctrl-schaff self-assigned this Dec 6, 2024