This workflow chains several processing steps to build a knowledge graph from a list of scientific article DOIs. It links articles indexed in PubMed to taxon/metabolite pairs through the "produces" relationship. This work is based on the repository "Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach".
- Search PubMed for articles about taxa of the Brassicaceae family and glucosinolate compounds.
curl -s 'https://pubmed.ncbi.nlm.nih.gov/?term=brassica+glucosinolate&format=pubmed&size=200' | grep "\[doi\]" | cut -d" " -f3 > data/brassicale_glucosinolate.txt
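The `grep`/`cut` pipeline above keeps the third space-separated field of every line tagged `[doi]`. A minimal Python sketch of the same extraction is given here for readers who prefer to post-process the PubMed-format text in a script. It assumes the text has already been downloaded, and that DOI lines look like `AID - <doi> [doi]` or `LID - <doi> [doi]`. The function name `extract_dois` is hypothetical and not part of this repository. Unlike the shell pipeline, this sketch also drops repeated DOIs.

```python
import re

# Assumed line shape in PubMed-format output, e.g.
#   "AID - 10.1021/jf401802n [doi]"  or  "LID - 10.1021/jf401802n [doi]"
DOI_LINE = re.compile(r"^(?:AID|LID)\s*-\s*(\S+)\s+\[doi\]\s*$")


def extract_dois(pubmed_text):
    """Return the DOIs found in PubMed-format text, in order, without repeats."""
    dois = []
    for line in pubmed_text.splitlines():
        match = DOI_LINE.match(line.strip())
        if match and match.group(1) not in dois:
            dois.append(match.group(1))
    return dois


if __name__ == "__main__":
    sample = (
        "PMID- 1\n"
        "LID - 10.1021/jf401802n [doi]\n"
        "AID - 10.1021/jf401802n [doi]\n"
        "AID - 10.1021/jf405538d [doi]\n"
    )
    # One DOI per line, the same layout the shell pipeline writes to its output file
    print("\n".join(extract_dois(sample)))
```

The result can be written one DOI per line to a text file, which is the layout the shell pipeline produces.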
python src/api_doi.py --list_doi "10.1021/jf401802n,10.1021/jf405538d" --output test.json
python src/api_doi.py --list_doi_file data/list_doi_example.txt --output test.json
TODO
- Working with a GPU environment
ssh $USER@genossh
srun --gpus 1 -p gpu --pty bash
. /local/env/envpython-3.9.5.sh
virtualenv ~/env-idiap  # only needed the first time
source ~/env-idiap/bin/activate
export PATH=/home/genouest/inra_umr1349/$USER/.local/bin:$PATH
python src/workflow_idap.py --dump igepp.json
- Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach
- colab
pip install pygbif rdflib
python src/build_rdf_graph.py --dump_doi test.json --dump_taxon_compound test_taxon_metabolite_associations_idiap.json
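To illustrate the kind of statements such a graph could hold, here is a small standard-library sketch that writes "produces" relations as N-Triples lines. It is not the logic of `src/build_rdf_graph.py`: the `http://example.org/` namespace, the `describedIn` predicate, the function names, and the input shape (a DOI plus a list of taxon/metabolite pairs) are all assumptions made for illustration, and the DOI in the usage example is a placeholder. In practice a library such as rdflib, installed above, would handle serialisation.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration
EX = "http://example.org/"


def iri(base, local):
    """Build an N-Triples IRI term, percent-encoding characters such as spaces."""
    return "<" + base + quote(local, safe="/") + ">"


def produces_triples(doi, pairs):
    """Yield N-Triples lines for one article and its (taxon, metabolite) pairs."""
    article = iri("https://doi.org/", doi)
    for taxon, metabolite in pairs:
        taxon_iri = iri(EX + "taxon/", taxon)
        metabolite_iri = iri(EX + "metabolite/", metabolite)
        # The "produces" relationship between a taxon and a metabolite
        yield f"{taxon_iri} <{EX}produces> {metabolite_iri} ."
        # Assumed provenance link back to the article (hypothetical predicate)
        yield f"{taxon_iri} <{EX}describedIn> {article} ."


if __name__ == "__main__":
    # Placeholder DOI and an illustrative pair, not data from this project
    for line in produces_triples("10.1234/example", [("Brassica oleracea", "glucoraphanin")]):
        print(line)
```

Attaching the article to the taxon alone is a simplification. A real model would more likely attach provenance to the taxon/metabolite statement itself.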