Skip to content

Latest commit

 

History

History
107 lines (82 loc) · 2.9 KB

README.md

File metadata and controls

107 lines (82 loc) · 2.9 KB

Browse LDA Topic Models

This package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and time-graph per topic, and annotates a selection of documents with the topic for each word.

See the the example for a collection of State of the Union addresses.

Installing

if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)

Creating a topic browser

To create a topic browser, you need to have:

  • A model fit using topicmodels::LDA
  • The set of original tokens used to create the document term matrix, and the document ids these tokens are from
  • The metadata of the documents, containing aid, headline, and date

The provided data file 'sotu' contains this data from the state of the union addresses. Make sure that the tokens are ordered in the way they appeared in the article

data(sotu)
tokens = tokens[order(tokens$aid, tokens$id), ]
class(m)
## [1] "LDA_Gibbs"
## attr(,"package")
## [1] "topicmodels"
head(tokens)
##          aid      lemma       word sentence  pos offset id pos1 freq
## 20 111541965         it         It        1  PRP      0  1    O    1
## 10 111541965         be         is        1  VBZ      3  2    V    1
## 40 111541965         we        our        1 PRP$      6  3    O    1
## 39 111541965 unfinished unfinished        1   JJ     10  4    A    1
## 32 111541965       task       task        1   NN     21  5    N    1
## 38 111541965         to         to        1   TO     26  6    ?    1
head(meta)
##          id       date   medium     headline
## 1 111541965 2013-02-12 Speeches Barack Obama
## 2 111541995 2013-02-12 Speeches Barack Obama
## 3 111542001 2013-02-12 Speeches Barack Obama
## 4 111542006 2013-02-12 Speeches Barack Obama
## 5 111542013 2013-02-12 Speeches Barack Obama
## 6 111542018 2013-02-12 Speeches Barack Obama

With these data, you can create a topic browser as follows:

output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
## Preparing variables
## Rendering overview
## Rendering topic 1
## Rendering topic 2
## Rendering topic 3
## Rendering topic 4
## Rendering topic 5
## Rendering topic 6
## Rendering topic 7
## Rendering topic 8
## Rendering topic 9
## Rendering topic 10
## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html

You can also publish the output file directly using markdown::rpubsupload:

library(markdown)
result = rpubsUpload("Example topic browser", output)
browseURL(result$continueUrl)

This produces the results shown in the the example