Issue with sentiment analysis #1

EmilyForden · 2016-11-28T16:06:55Z

I'm running into 2 issues while trying to run sentiment analysis on the ancient writer Livy. The first is that Livy's book is divided into 4 parts, which must be downloaded separately. I can only get parts 1 and 4 to download.

The second problem is causing me more distress. I'm trying to run sentiment analysis on the texts using nrc, but I keep getting an error about the 'by' argument. I think this is because my inner_join is incorrect. Is my mutate line also wrong? I think it might be, since nrc isn't simply a positive-negative sentiment but a multi-faceted analysis.

Thanks!

titles <- c("The History of Rome, Books 01 to 08", "The History of Rome, Books 09 to 26",
            "The History of Rome, Books 27 to 36", "The History of Rome, Books 37 to the End
            with the Epitomes and Fragments of the Lost Books")

books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title")

get_sentiments("nrc") %>%
  count(sentiment)

tidy_books <- books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", 
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

Livysentiment <- books %>%
  inner_join(get_sentiments("nrc")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

bensoltoff · 2016-11-28T21:11:31Z

For the first problem, try using

books <- gutenberg_works(author == "Livy") %>%
  gutenberg_download(meta_fields = "title")

This downloads all books by Livy. There are 5 in total: the 4 you want plus "Roman History, Books I-III" (filter this out of books if you don't want it). The titles of the other two books are stored in a form that doesn't exactly match your strings, so matching on author is more reliable than matching on title.
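To check how the titles are actually stored before deciding what to keep, something like the following may help. It is only a sketch: `livy_works` and `wanted` are names made up for illustration, and the "The History of Rome" prefix is an assumption based on the titles in the question. Note also that a string literal broken across two lines, as in the original `titles` vector, contains the line break and the indentation, which is enough to stop an exact match.

```r
library(dplyr)
library(stringr)
library(gutenbergr)

# List the Livy entries in the metadata that ships with gutenbergr
# (this step only reads the bundled metadata, nothing is downloaded).
livy_works <- gutenberg_works(author == "Livy")
livy_works %>% select(gutenberg_id, title)

# Keep only the works whose title starts with the assumed prefix,
# instead of matching the full title strings exactly.
wanted <- livy_works %>%
  filter(str_detect(title, "^The History of Rome"))
```

`wanted` can then be piped into gutenberg_download(meta_fields = "title") in place of the unfiltered result.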

For the second problem, you're getting sloppy with reusing code. For tidy_books, in the first line books is a data frame, not a function, so write books rather than books(). You don't need to create chapter because the book isn't structured in chapters (or if it is, you need to change the regular expression to properly detect it). Then for Livysentiment, you don't want to start the piped operation with books: you want the already tokenized version stored in tidy_books. You cannot merge the sentiment dictionary with the original text because that is stored by line, not by token. The untokenized books has no word column in common with the lexicon, which is why inner_join complains about 'by'.

Fix those issues and the code should work.
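Put together, the fixes might look roughly like this. It is a sketch rather than a tested solution for the full Livy texts: `livy_sentiment` and its `lines_per_chunk` argument are hypothetical names introduced here, and the grouping uses `title` on the assumption that the download was made with meta_fields = "title", which does not produce a `book` column.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper: tokenize first, then join the lexicon on `word`.
livy_sentiment <- function(books, lexicon, lines_per_chunk = 80) {
  tidy_books <- books %>%
    group_by(title) %>%                    # `title` comes from meta_fields
    mutate(linenumber = row_number()) %>%  # line number within each work
    ungroup() %>%
    unnest_tokens(word, text)              # one row per word

  tidy_books %>%
    inner_join(lexicon, by = "word") %>%   # both tables now share `word`
    count(title, index = linenumber %/% lines_per_chunk, sentiment) %>%
    spread(sentiment, n, fill = 0) %>%
    mutate(sentiment = positive - negative)
}

# Intended use (needs a network connection; recent tidytext versions
# also need the textdata package to fetch the nrc lexicon):
# Livysentiment <- livy_sentiment(books, get_sentiments("nrc"))
```

With nrc the other categories (anger, joy, trust and so on) stay in the result as their own columns, so the positive - negative line is fine as a net score and the remaining columns are available if you want the multi-faceted view.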
