A bioinformatics tool for prioritising biological leads from omics data using literature mining.

Sydney-Informatics-Hub/OmixLitMiner

 
 

OmixLitMiner

OmixLitMiner is a tool that helps researchers reduce the time spent on post-analysis literature research and decide which proteins or genes are the most interesting and promising candidates for follow-up experiments.

The goal of OmixLitMiner is to streamline literature retrieval and to categorise the results so that researchers can select appropriate leads for further research. The algorithm uses the ranking system described below.

Ranking system

The tool assigns each protein/gene to one of three main categories (1-3) or to an additional Category 0:

  • Category 1: at least one review paper was found in which the synonyms and the selected keywords appear together in the article title, or in the abstract if that option is selected.
  • Category 2: at least one publication, but no review article, was found in which the synonyms and the selected keywords are both present.
  • Category 3: no publication was found that mentions both the synonyms and the keywords together in the title.
  • Category 0: the tool could not find any synonyms. This may happen if the UniProt ID belongs to an isoform or to an unreviewed entry (i.e. TrEMBL).
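The ranking rules above can be sketched as a small decision function. This is only an illustration of the logic as described, not the package's own code; the function name assign_category and its three arguments are hypothetical.

```r
# Hypothetical helper (not part of OmixLitMiner): maps the search outcome
# for one protein/gene to a category, following the rules described above.
#   has_synonyms   - were any synonyms found for the identifier?
#   n_publications - hits mentioning a synonym together with a keyword
#   n_reviews      - how many of those hits are review articles
assign_category <- function(has_synonyms, n_publications, n_reviews) {
  if (!has_synonyms) {
    return(0L)  # no synonyms found (e.g. isoform or unreviewed entry)
  }
  if (n_reviews >= 1) {
    return(1L)  # at least one review article
  }
  if (n_publications >= 1) {
    return(2L)  # publications found, but no review
  }
  3L            # nothing mentions synonym and keyword together
}

assign_category(TRUE, 12, 3)
assign_category(TRUE, 4, 0)
assign_category(TRUE, 0, 0)
assign_category(FALSE, 0, 0)
```

The order of the checks matters: the synonym check comes first because without synonyms no meaningful search can be made, and the review check precedes the general publication check because a review is also a publication.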

The word clouds produced by the algorithm show the frequency of words in the abstracts returned for each search query.
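To illustrate what a word frequency over abstracts means, here is a minimal base R sketch. The abstracts are made up, and the tokenisation (lower-casing and splitting on non-letters) is an assumption; the package may process text differently.

```r
# Made-up abstracts, for illustration only
abstracts <- c(
  "Protein kinase signalling in colorectal cancer",
  "Kinase inhibitors and cancer therapy"
)

# Lower-case the text, split on anything that is not a letter,
# and drop empty tokens
words <- unlist(strsplit(tolower(abstracts), "[^a-z]+"))
words <- words[nzchar(words)]

# Count occurrences and sort from most to least frequent
word_freq <- sort(table(words), decreasing = TRUE)
head(word_freq)
```

A word cloud would then draw each word at a size proportional to its count in word_freq.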

Installation

You can install OmixLitMiner from the Sydney Informatics Hub GitHub repository with:

install.packages("devtools") # only if devtools is not installed
devtools::install_github("Sydney-Informatics-Hub/OmixLitMiner")

If RTools is not installed before you install OmixLitMiner, you will be prompted to install it; please do so. If you are not prompted, download and install RTools manually. Once RTools is installed, run devtools::install_github("Sydney-Informatics-Hub/OmixLitMiner") again so that the package is downloaded.
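If you want to avoid reinstalling packages that are already present, the installation steps can be wrapped in a check. This is a sketch: needs_install and install_omixlitminer are hypothetical helper names, and only requireNamespace(), install.packages() and devtools::install_github() are existing functions.

```r
# TRUE when a package is not available in the current R library
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Install devtools and OmixLitMiner only when they are missing.
# Returns (invisibly) whether OmixLitMiner is available afterwards.
install_omixlitminer <- function() {
  if (needs_install("devtools")) {
    install.packages("devtools")
  }
  if (needs_install("OmixLitMiner")) {
    devtools::install_github("Sydney-Informatics-Hub/OmixLitMiner")
  }
  invisible(!needs_install("OmixLitMiner"))
}

# Call install_omixlitminer() in an interactive session to run the check.
```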

Example

Some ways of using the OmixLitMiner package are shown below. potentialmarker is an R data frame included in the package; for a description of its contents, run the following R command:

library(OmixLitMiner)
?potentialmarker

Ex.1. Using the R data frame provided by the package, with no output spreadsheet or plots directory specified. The object returned from omixLitMiner() is assigned to the variable result.

library(OmixLitMiner)
result <- omixLitMiner(potentialmarker)

The result variable is a list with 2 elements:

  1. summary_results - Summarizes the query results
  2. pubmed_results - Summarizes the PubMed results based on the UniProt identifiers and keywords specified by the user

Ex.2. Using the R data frame provided by the package, with no output spreadsheet or plots directory specified. The object returned from omixLitMiner() is not assigned to any variable.

library(OmixLitMiner)
omixLitMiner(potentialmarker)

Ex.3. Using the R data frame provided by the package, with an output spreadsheet specified.

library(OmixLitMiner)
omixLitMiner(potentialmarker, output.file = "potential_marker_pubmed_results.xlsx")

The output spreadsheet with the PubMed results, as well as the plots, will be saved in the current working directory.

Ex.4. Using the R data frame provided by the package, with an output spreadsheet and a plots directory specified. The object returned from omixLitMiner() is not assigned to any variable.

library(OmixLitMiner)
omixLitMiner(potentialmarker, output.file = "potential_marker_pubmed_results.xlsx", plots.dir = "plots")

The output spreadsheet with the PubMed results will be saved in the current working directory. If a spreadsheet with the same name already exists, it will be overwritten. The images generated by the package will be saved in the directory "plots" in the current working directory. If that directory does not exist, it will be created.
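Because an existing spreadsheet is overwritten, you may want to choose a file name that is not yet taken before calling omixLitMiner(). The helper below is a hypothetical sketch in base R and is not provided by the package.

```r
# Hypothetical helper: returns the requested path if it is free, otherwise
# a variant with a timestamp inserted before the file extension.
unique_output_file <- function(path) {
  if (!file.exists(path)) {
    return(path)
  }
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  ext <- tools::file_ext(path)
  base <- tools::file_path_sans_ext(path)
  if (nzchar(ext)) {
    paste0(base, "_", stamp, ".", ext)
  } else {
    paste0(base, "_", stamp)
  }
}

out <- unique_output_file("potential_marker_pubmed_results.xlsx")
# out can then be passed as the output.file argument of omixLitMiner()
```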

Ex.5. Reading from an Excel file and converting it to an R data frame. The file Input_uniprot_Keywords.xlsx is assumed to be present in the current working directory.

library(OmixLitMiner)
library(openxlsx)
df <- readWorkbook("Input_uniprot_Keywords.xlsx")  # read an Excel file on your computer
# df <- read.csv("path/to/my/input_query.csv", stringsAsFactors = FALSE)  # alternative: read a CSV file on your computer
result <- omixLitMiner(df, output.file = "input_uniprot_keywords_pubmed_results.xlsx", plots.dir = "plots")

Ex.6. Reading the default Excel input file provided by OmixLitMiner and converting it to an R data frame.

library(OmixLitMiner)
library(openxlsx)

# Read in input query excel file
df <- readWorkbook(system.file("extdata", "input_uniprot_keywords.xlsx", package = "OmixLitMiner"))  # read demo data from the package

# Query UniProt and PubMed and Return Results
result <- omixLitMiner(df, output.file = "input_uniprot_keywords_pubmed_results.xlsx", plots.dir = "plots")

A Standard Operating Procedure guide for installing R and running OmixLitMiner is available at this link

Searching with gene names instead of UniProt identifiers

To search with gene names instead of UniProt identifiers, change the value in the IDType column of the input file from Accession to Gene.
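If the input is already loaded as an R data frame, the same change can be made in code. In the sketch below only the IDType column and its values Accession and Gene come from the description above; the ID column name and the gene names are assumptions for illustration, and a real input needs the remaining columns described in ?potentialmarker.

```r
# Hypothetical query table. Only IDType and its allowed values are taken
# from the text; the ID column name and its entries are assumptions.
query <- data.frame(
  ID = c("TP53", "EGFR"),
  IDType = c("Accession", "Accession"),
  stringsAsFactors = FALSE
)

# The identifiers above are gene names, so mark every row accordingly
query$IDType <- "Gene"
query
```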

Citing

When using OmixLitMiner, please cite: Steffen P, Wu J, Hariharan S, Molloy MP, Schluter H. OmixLitMiner: A bioinformatics tool for prioritizing biological leads from omics data using literature mining.
