The package extracts data from citations downloaded from the citation manager of bioRxiv, a preprint server for biology research.
The following packages must be installed beforehand for the program to work:
- stringr
- dplyr
- stringi
- tm
- SnowballC
- wordcloud
- RColorBrewer
- NLP
- topicmodels
- tidytext
- reshape2
- ggplot2
- pals
- Rcpp
- igraph
This can be done by running:
install.packages(c("stringr", "dplyr", "stringi", "tm", "SnowballC", "wordcloud", "RColorBrewer", "NLP", "topicmodels", "tidytext", "reshape2", "ggplot2", "pals", "Rcpp", "igraph"))
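If some of these packages are already installed, it may be convenient to install only the missing ones. The following is a minimal sketch of that idea; the helper name `missing_packages` is hypothetical and not part of this package, and the actual install call is left commented out so nothing is installed by accident.

```r
# Packages listed above as requirements.
required <- c("stringr", "dplyr", "stringi", "tm", "SnowballC", "wordcloud",
              "RColorBrewer", "NLP", "topicmodels", "tidytext", "reshape2",
              "ggplot2", "pals", "Rcpp", "igraph")

# Hypothetical helper: returns the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```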
Use the extract function by passing in the file name as a string:
df <- extract("citations.txt")
The extracted data is saved as a data frame in the variable you assign it to (df here).
Pass in the Abstract column of the data frame you created to calculate the document-term matrix (DTM) with common words removed:
dtm <- calculatedtm(df$Abstract)
To make a frequency table, pass the DTM calculated above into the function:
freqtable <- calculatefreq(dtm)
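For readers curious about what these two steps involve, here is a rough sketch using the tm package directly. It is not the package's own implementation: the sample abstracts, the cleaning steps and the `word`/`freq` column names are assumptions made for illustration.

```r
library(tm)

# Two made-up abstracts standing in for df$Abstract (illustration only).
abstracts <- c("The gene expression of the cell is regulated by proteins.",
               "Proteins in the cell bind to the gene promoter.")

# Build a corpus and remove case, punctuation, numbers and common words.
corpus <- VCorpus(VectorSource(abstracts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix: one row per abstract, one column per term.
dtm <- DocumentTermMatrix(corpus)

# Frequency table: total count of each term, most frequent first.
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
freqtable <- data.frame(word = names(freq), freq = freq, row.names = NULL)
print(freqtable)
```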
The frequency table made in the previous step is used to draw the word cloud and the bar plot:
makewordcloud(freqtable)
makebarplot(freqtable)
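As an illustration of what these plots look like, the sketch below draws a word cloud and a bar plot from a small hand-made frequency table using wordcloud and ggplot2. The table contents, column names and styling choices are assumptions, not the package's actual settings.

```r
library(wordcloud)
library(RColorBrewer)
library(ggplot2)

# Hand-made frequency table (illustration only).
freqtable <- data.frame(word = c("gene", "cell", "protein", "neuron", "plant"),
                        freq = c(12, 9, 7, 4, 2))

# Word cloud: more frequent words are drawn larger and placed first.
wordcloud(words = freqtable$word, freq = freqtable$freq, min.freq = 1,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))

# Horizontal bar plot, with words ordered by frequency.
p <- ggplot(freqtable, aes(x = reorder(word, freq), y = freq)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word", y = "Frequency")
print(p)
```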
To fit a topic model, pass in the Abstract column; the function does the rest with the number of topics set to 10:
maketopicmodel(df$Abstract)
To create K topics, pass in the Abstract column and K:
topics <- createtopics(df$Abstract, K)
K is set to 10 by default.
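To give a feel for what fitting K topics means, here is a minimal sketch that fits an LDA model with the topicmodels package on a few made-up abstracts. Whether the package uses LDA with these settings is an assumption; the sample texts, the seed and K = 2 are chosen only to keep the example small.

```r
library(tm)
library(topicmodels)

# Made-up abstracts standing in for df$Abstract (illustration only).
abstracts <- c("Gene expression in neurons changes during brain development.",
               "Neurons in the brain form synapses during development.",
               "Plant roots absorb water and nutrients from soil.",
               "Soil nutrients affect plant growth and root length.")

# Document-term matrix with lowercasing, punctuation and stop word removal.
dtm <- DocumentTermMatrix(VCorpus(VectorSource(abstracts)),
                          control = list(tolower = TRUE,
                                         removePunctuation = TRUE,
                                         stopwords = TRUE))

K <- 2  # number of topics; the package default described above is 10
lda <- LDA(dtm, k = K, control = list(seed = 1234))

# Top 3 terms per topic, one column per topic.
top_terms <- terms(lda, 3)
print(top_terms)
```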
Pass the topics into the function to make a network of linked topics:
text_link(topics)
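How text_link decides that two topics are linked is not described here. One plausible approach, shown below purely as an assumption, is to connect topics that share top terms and weight each link by the number of shared terms. The topic names and term lists are invented for the example.

```r
library(igraph)

# Invented top terms per topic (illustration only).
top_terms <- list("Topic 1" = c("gene", "cell", "protein"),
                  "Topic 2" = c("cell", "neuron", "brain"),
                  "Topic 3" = c("plant", "root", "soil"))

# Count the shared terms for every pair of topics.
pairs <- combn(names(top_terms), 2)
shared <- apply(pairs, 2, function(p) {
  length(intersect(top_terms[[p[1]]], top_terms[[p[2]]]))
})

# Keep only pairs with at least one term in common.
edges <- data.frame(from = pairs[1, ], to = pairs[2, ], weight = shared)
edges <- edges[edges$weight > 0, ]

# Undirected graph; topics without links still appear as isolated nodes.
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = names(top_terms)))
plot(g, edge.width = E(g)$weight)
```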