OmixLitMiner

OmixLitMiner is a new tool that helps researchers reduce the time spent on literature research after data analysis and streamlines the decision about which proteins or genes are the most interesting and promising for follow-up experiments.

The goal of OmixLitMiner is to streamline literature retrieval and to categorise the results, helping researchers select appropriate leads for further research. The algorithm uses the ranking system detailed below.

Ranking system

The tool assigns the proteins/genes to three main categories (1-3) and an additional Category 0. Category 1 hits are proteins/genes with at least one review paper in which the synonyms and the selected keywords are found together in the article title, or in the abstract if that option is selected. Category 2 hits are proteins/genes with at least one publication, but no review article, in which the synonyms and the selected keywords are both present. Category 3 represents proteins/genes for which no publication was found that mentions both the synonyms and the keywords together in the title. Category 0 is used for proteins/genes for which the tool could not find any synonyms. This may happen if the UniProt ID belongs to an isoform or to an unreviewed entry (i.e. TrEMBL).
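The category rules above can be summarised as a small decision function. This is only an illustrative sketch: `assign_category()` and its arguments are hypothetical names chosen for this example and are not part of the OmixLitMiner API.

```r
# Hypothetical helper (not part of OmixLitMiner): maps the outcome of a
# literature search for one protein/gene to the categories described above.
#   has_synonyms   - TRUE if synonyms could be retrieved for the identifier
#   n_reviews      - number of matching review papers
#   n_publications - number of matching publications of any type
assign_category <- function(has_synonyms, n_reviews, n_publications) {
  if (!has_synonyms) {
    return(0L)  # no synonyms found, e.g. isoform or unreviewed (TrEMBL) entry
  }
  if (n_reviews > 0) {
    return(1L)  # at least one matching review paper
  }
  if (n_publications > 0) {
    return(2L)  # matching publications, but no review
  }
  3L            # synonyms found, but no matching publication
}

assign_category(TRUE, n_reviews = 2, n_publications = 10)
assign_category(TRUE, n_reviews = 0, n_publications = 4)
assign_category(TRUE, n_reviews = 0, n_publications = 0)
assign_category(FALSE, n_reviews = 0, n_publications = 0)
```

The four calls cover one case per category (1, 2, 3 and 0, in that order).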

The word clouds produced by the algorithm show the frequency of words in the abstracts retrieved for each search query.
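To illustrate what a word frequency over abstracts means, here is a minimal base R sketch. It is an assumption about the general idea only: `word_frequencies()` is a hypothetical helper, and the package itself may tokenise, filter or weight words differently.

```r
# Hypothetical helper (not part of OmixLitMiner): counts how often each word
# occurs across a character vector of abstracts.
word_frequencies <- function(abstracts) {
  text <- tolower(paste(abstracts, collapse = " "))
  words <- unlist(strsplit(text, "[^a-z0-9]+"))  # split on non-alphanumerics
  words <- words[nzchar(words)]                  # drop empty tokens
  sort(table(words), decreasing = TRUE)          # most frequent words first
}

# Toy abstracts, invented for this example
abstracts <- c(
  "Protein kinase signalling in cancer.",
  "Kinase inhibitors in cancer therapy; kinase targets."
)
freq <- word_frequencies(abstracts)
head(freq)
```

A word cloud then scales each word by its count in a table like `freq`.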

Installation

You can install the released version of OmixLitMiner from SIH-GIT with:

install.packages("devtools") # only if devtools is not installed
devtools::install_github("Sydney-Informatics-Hub/OmixLitMiner")

If RTools is not installed before you install OmixLitMiner, you will be prompted to install it. Please do so. If you are not prompted, please download and install it from here: RTools. Once RTools is installed, run devtools::install_github("Sydney-Informatics-Hub/OmixLitMiner") again so that the package is installed.

Example

Some ways of using the OmixLitMiner package are shown below. potentialmarker is an R data frame that is included in the package. For a description of its contents, run the following R commands:

library(OmixLitMiner)
?potentialmarker

Ex.1. Using the R data frame provided by the package, with no output spreadsheet or plots directory specified. The object returned from omixLitMiner() is assigned to the variable result.

library(OmixLitMiner)
result <- omixLitMiner(potentialmarker)

The result variable is a list with two elements:

summary_results - Summarizes the query results
pubmed_results - Summarizes the PubMed results based on the UniProt identifiers and keywords specified by the user

Ex.2. Using the R data frame provided by the package, with no output spreadsheet or plots directory specified. The object returned from omixLitMiner() is not assigned to any variable.

library(OmixLitMiner)
omixLitMiner(potentialmarker)

Ex.3. Using the R data frame provided by the package, with an output spreadsheet specified.

library(OmixLitMiner)
omixLitMiner(potentialmarker, output.file = "potential_marker_pubmed_results.xlsx")

The output spreadsheet with the PubMed output, as well as the plots, will be saved in the current working directory.

Ex.4. Using the R data frame provided by the package, with an output spreadsheet and a plots directory specified. The object returned from omixLitMiner() is not assigned to any variable.

library(OmixLitMiner)
omixLitMiner(potentialmarker, output.file = "potential_marker_pubmed_results.xlsx", plots.dir = "plots")

The output spreadsheet with the PubMed output will be saved in the current working directory. If an output spreadsheet with the same name already exists, it will be overwritten. The images generated by the package will be saved in the directory "plots" in the current working directory. If the plots directory does not exist, it will be created.
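Because an existing output spreadsheet is overwritten, you may want to pick a file name that is not yet in use before calling omixLitMiner(). The following is a minimal sketch of one way to do that; `unique_output_file()` is a hypothetical helper written for this example and is not provided by the package.

```r
# Hypothetical helper (not part of OmixLitMiner): returns `path` if no such
# file exists, otherwise appends _1, _2, ... before the extension until an
# unused file name is found.
unique_output_file <- function(path) {
  if (!file.exists(path)) {
    return(path)
  }
  stem <- tools::file_path_sans_ext(path)
  ext <- tools::file_ext(path)
  suffix <- if (nzchar(ext)) paste0(".", ext) else ""
  i <- 1
  repeat {
    candidate <- sprintf("%s_%d%s", stem, i, suffix)
    if (!file.exists(candidate)) {
      return(candidate)
    }
    i <- i + 1
  }
}

out_file <- unique_output_file("potential_marker_pubmed_results.xlsx")
# The call below needs the package and network access, so it is left commented out:
# omixLitMiner(potentialmarker, output.file = out_file, plots.dir = "plots")
```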

Ex.5. Reading an Excel file and converting it to an R data frame. The file Input_uniprot_Keywords.xlsx is assumed to be present in the current working directory.

library(OmixLitMiner)
library(openxlsx)
df <- readWorkbook("Input_uniprot_Keywords.xlsx")  # read an Excel file from the current working directory
# df <- read.csv("path/to/my/input_query.csv", stringsAsFactors = FALSE)  # alternative: read a CSV file
result <- omixLitMiner(df, output.file = "input_uniprot_keywords_pubmed_results.xlsx", plots.dir = "plots")

Ex.6. Reading the default Excel input file that is provided with OmixLitMiner and converting it to an R data frame.

library(OmixLitMiner)
library(openxlsx)

# Read in input query excel file
df <- readWorkbook(system.file("extdata", "input_uniprot_keywords.xlsx", package = "OmixLitMiner"))  # read demo data from the package

# Query UniProt and PubMed and Return Results
result <- omixLitMiner(df, output.file = "input_uniprot_keywords_pubmed_results.xlsx", plots.dir = "plots")

A Standard Operating Procedure guide for installing R and running OmixLitMiner is available in the repository as OmixLitMiner SOP.pdf.

Searching with gene names instead of UniProt Identifiers

To search with gene names instead of UniProt identifiers, change the value in the IDType column of the input file from Accession to Gene.
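If the input is already loaded as a data frame, the same switch can be made in R before calling omixLitMiner(). The sketch below is illustrative only: `set_id_type()` is a hypothetical helper, and the `ID` column in the toy data is a placeholder name. Only the IDType column and its values Accession and Gene come from the description above; run ?potentialmarker to see the actual input columns.

```r
# Hypothetical helper (not part of OmixLitMiner): sets the IDType column of an
# input query data frame to either "Accession" or "Gene".
set_id_type <- function(df, id_type = c("Accession", "Gene")) {
  id_type <- match.arg(id_type)
  if (!"IDType" %in% names(df)) {
    stop("input data frame has no 'IDType' column")
  }
  df$IDType <- id_type
  df
}

# Toy input; "ID" is a placeholder column name, not the package's schema.
query <- data.frame(
  ID = c("TP53", "EGFR"),
  IDType = "Accession",
  stringsAsFactors = FALSE
)
query <- set_id_type(query, "Gene")
query
```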

Citing

When using OmixLitMiner, please cite: Steffen P, Wu J, Hariharan S, Molloy MP, Schluter H. OmixLitMiner: A bioinformatics tool for prioritizing biological leads from omics data using literature mining.
