This repository contains our system for the ChemProt task (extraction of chemical-protein interactions from the literature).
Requirements: Ubuntu, Python 3.6.4. Install the required packages:
$ pip install -r requirements.txt
confusion.py: Calculate the confusion matrix and other statistics given a file with predicted relations (see the sketch after this list).
create_embeddings.py: Create pre-trained part-of-speech and dependency embedding vectors (sketch below).
main.py: Train and test a deep learning model, which can be either a bidirectional long short-term memory (BiLSTM) recurrent network or a convolutional neural network (CNN). Edit the script to choose the input arguments; only the seed number can be passed on the command line (a minimal BiLSTM sketch follows this list):
$ python main.py 2
mfuncs.py: Functions used by the main.py script.
support.py: Auxiliary code to treat the ChemProt dataset.
utils.py: General use utilities.
voting.py: Average several output files of predicted probabilities. Edit the script to choose the input directory and the group to be evaluated (sketch below).
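For illustration, a minimal sketch of computing a confusion matrix and per-class statistics with scikit-learn. The input format (tab-separated gold and predicted labels) and file name are assumptions, not necessarily what confusion.py expects:

    # Minimal sketch, not the actual confusion.py implementation.
    # Assumes a tab-separated file with one "gold<TAB>predicted" pair per line.
    from sklearn.metrics import classification_report, confusion_matrix

    gold, pred = [], []
    with open('predictions.tsv') as f:  # hypothetical file name
        for line in f:
            g, p = line.rstrip('\n').split('\t')
            gold.append(g)
            pred.append(p)

    print(confusion_matrix(gold, pred))
    print(classification_report(gold, pred, digits=4))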
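A minimal sketch of pre-training part-of-speech embedding vectors with word2vec, here using the gensim (>= 4.0) API. The tag sequences and output file name are placeholders; in the actual script they come from the TEES-processed ChemProt data:

    # Minimal sketch of training part-of-speech embeddings with word2vec
    # (gensim >= 4.0 API); not the actual create_embeddings.py implementation.
    from gensim.models import Word2Vec

    # Placeholder tag sequences: one list of part-of-speech tags per sentence.
    pos_sentences = [
        ['DT', 'NN', 'VBZ', 'DT', 'JJ', 'NN', '.'],
        ['NNP', 'VBD', 'IN', 'NN', '.'],
    ]

    model = Word2Vec(sentences=pos_sentences, vector_size=50, window=5,
                     min_count=1, sg=1, epochs=10)
    model.wv.save_word2vec_format('pos_embeddings.txt')  # hypothetical name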
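An illustrative BiLSTM classifier in Keras showing how the command-line seed can be used; the architecture, dimensions, and number of classes are placeholders, not the model implemented in main.py:

    # Illustrative BiLSTM classifier with the random seed taken from the
    # command line, as in "python main.py 2"; not the architecture of main.py.
    import sys

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    seed = int(sys.argv[1])
    np.random.seed(seed)
    tf.random.set_seed(seed)

    vocab_size, embedding_dim, n_classes = 10000, 300, 6  # placeholder values

    model = models.Sequential([
        layers.Input(shape=(None,), dtype='int32'),
        layers.Embedding(vocab_size, embedding_dim, mask_zero=True),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()  # training on the prepared ChemProt instances omitted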
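A minimal sketch of averaging several probability outputs (soft voting) with NumPy. The directory layout and .npy file format are assumptions, not necessarily what voting.py reads:

    # Minimal soft-voting sketch: average class probabilities over several runs.
    # Assumes each run saved an array of shape (n_instances, n_classes);
    # the directory and file pattern are hypothetical.
    import glob

    import numpy as np

    prob_files = sorted(glob.glob('outputs/run_*.npy'))
    probabilities = np.stack([np.load(f) for f in prob_files])

    averaged = probabilities.mean(axis=0)    # (n_instances, n_classes)
    predicted = averaged.argmax(axis=1)
    print(predicted)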
The datasets were pre-processed (tokenization, sentence splitting,
part-of-speech tagging, and dependency parsing) by the Turku Event
Extraction System (TEES).
Available for download as data.zip
[Mirror 1]
[Mirror 2].
Our word embedding models were created from PubMed English abstracts.
We also pre-trained part-of-speech and dependency embedding vectors from
the ChemProt dataset. Available for download as word2vec.zip
[Mirror 1]
[Mirror 2].
We also tested the word embedding model created by Chen et al. (2018) [Paper] [Code].
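For reference, a minimal sketch of loading one of these word2vec models with gensim (>= 4.0); the file name and binary flag are placeholders:

    # Minimal sketch of loading a word2vec model with gensim (>= 4.0).
    # The file name is a placeholder for one of the models in word2vec.zip.
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load_word2vec_format('pubmed_word2vec.bin', binary=True)
    print(wv.most_similar('protein', topn=5))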
Statistics about the datasets and some prediction files.
Available for download as supp.zip
[Mirror 1]
[Mirror 2].
If you use this code or data in your work, please cite our publication:
@article{antunes2019a,
author = {Antunes, Rui and Matos, S{\'e}rgio},
journal = {Database},
month = oct,
number = {baz095},
publisher = {{Oxford University Press}},
title = {Extraction of chemical--protein interactions from the literature using neural networks and narrow instance representation},
url = {https://doi.org/10.1093/database/baz095},
volume = {2019},
year = {2019},
}