Issue with sentiment analysis #1

Open
EmilyForden opened this issue Nov 28, 2016 · 1 comment
Comments

@EmilyForden

I'm running into two issues while trying to run sentiment analysis on the works of the ancient writer Livy. The first is that Livy's History of Rome is divided into four sections, which must be downloaded separately, and I can only get parts 1 and 4 to download.

The second problem is causing me more distress. I'm trying to run sentiment analysis on the texts using the nrc lexicon, but I keep getting an error about the 'by' argument. I think this is because my inner_join is incorrect. Or is my mutate line the problem? I suspect it might be, since nrc isn't simply a positive/negative lexicon but a multi-category one.

Thanks!

titles <- c("The History of Rome, Books 01 to 08", "The History of Rome, Books 09 to 26",
            "The History of Rome, Books 27 to 36", "The History of Rome, Books 37 to the End
            with the Epitomes and Fragments of the Lost Books")

books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title")

get_sentiments("nrc") %>%
  count(sentiment)

tidy_books <- books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", 
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

Livysentiment <- books %>%
  inner_join(get_sentiments("nrc")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

@bensoltoff
Contributor

For the first problem, try using

books <- gutenberg_works(author == "Livy") %>%
  gutenberg_download(meta_fields = "title")

This downloads all the books by Livy. There are five in total: the four you want plus "Roman History, Books I-III" (filter that one out of books if you don't want it). The titles of the other two parts are stored in a form that doesn't exactly match your strings, which is why matching on the author is more reliable.
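A minimal sketch of that filtering step, assuming dplyr is loaded and that the four parts you want all have titles beginning with "The History of Rome" (as in your title list). Matching on that prefix avoids depending on the exact stored title strings. The rows and gutenberg_id values below are hypothetical stand-ins for the real download.

```r
library(dplyr)

# Hypothetical stand-in for gutenberg_download(meta_fields = "title"):
# one row per line of text; the ids and text here are made up
books <- tibble(
  gutenberg_id = c(1L, 1L, 2L, 2L),
  text  = c("line one", "line two", "line one", "line two"),
  title = c(rep("The History of Rome, Books 01 to 08", 2),
            rep("Roman History, Books I-III", 2))
)

# Keep only the parts of The History of Rome by matching the title prefix
# rather than the exact full titles
livy <- books %>%
  filter(startsWith(title, "The History of Rome"))
```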

For the second problem, you've been careless when reusing code. In the first line of tidy_books, books is a data frame, not a function, so write books rather than books(). You don't need to create chapter, because the book isn't structured in chapters (or, if it is, you need to change the regular expression to detect them properly). Then, for Livysentiment, don't start the piped operation with books; start with the already tokenized version stored in tidy_books. You cannot join the sentiment dictionary to the original text because that is stored one line per row, not one token per row. inner_join() matches on the columns the two tables share, and only the tokenized data has the word column the lexicon is keyed on, which is why you get the error about 'by'.
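To see why the join fails, here is a small self-contained sketch. It assumes dplyr and tidytext are installed and uses a made-up two-line text plus a hypothetical four-word lexicon in place of get_sentiments("nrc"), which has the same word and sentiment columns.

```r
library(dplyr)
library(tidytext)

# Made-up line-level data shaped like the gutenberg_download() result
books <- tibble(
  title = "The History of Rome, Books 01 to 08",
  text  = c("A happy victory for Rome", "A terrible defeat")
)

# Hypothetical mini-lexicon with the same columns as get_sentiments("nrc")
lexicon <- tibble(
  word      = c("happy", "victory", "terrible", "defeat"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Line-level data shares no column with the lexicon, so dplyr cannot
# infer a join key and raises an error about `by`
line_join <- tryCatch(inner_join(books, lexicon), error = function(e) e)

# After tokenizing (lowercased, one word per row), both tables have `word`
tidy_books <- books %>%
  unnest_tokens(word, text)
token_join <- inner_join(tidy_books, lexicon, by = "word")
```

The first join stops with an error because there is nothing to match on; the second returns one row per matched word with its sentiment attached.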

Fix those issues and the code should work.
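Putting those fixes together, the corrected pipeline might look roughly like this. It is wrapped in a hypothetical helper, livy_sentiment(), so it can run on either the real download or toy data. Two assumptions go beyond the answer: the download has no book column (gutenberg_download(meta_fields = "title") returns gutenberg_id, text and title), so the sketch groups and counts by title instead; and the lexicon must yield both positive and negative categories, which nrc does, or the final mutate() fails. The chapter column is dropped as suggested above.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper: `books` is line-level text with `title` and `text`
# columns; `lexicon` has `word` and `sentiment` columns, e.g.
# get_sentiments("nrc")
livy_sentiment <- function(books, lexicon, lines_per_index = 80) {
  books %>%
    group_by(title) %>%
    mutate(linenumber = row_number()) %>%    # line number within each work
    ungroup() %>%
    unnest_tokens(word, text) %>%            # one row per word
    inner_join(lexicon, by = "word") %>%     # join on the shared word column
    count(title, index = linenumber %/% lines_per_index, sentiment) %>%
    spread(sentiment, n, fill = 0) %>%       # one column per sentiment category
    mutate(sentiment = positive - negative)  # net score per chunk of lines
}
```

With the real data this would be called as livy_sentiment(books, get_sentiments("nrc")). Because nrc assigns some words to several categories, recent versions of dplyr may warn about a many-to-many join here; that is expected for this lexicon.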
