Sentiment Analysis and CSV File #1

EmilyForden · 2016-12-05T16:31:24Z

I made a very large CSV file of all of Livy's works. The first column is the book number, the second is the chapter number, and the third is 1-2 paragraphs of text (the column labels are "Book", "Chapter", and "Text"). I want to preserve this structure when performing sentiment analysis, i.e. I want to be able to show the sentiment structure for Book 3, Chapter 40 and compare it to Book 7, Chapter 12. What is the best way to run sentiment analysis on column 3 ("Text") while still tying the results to columns 1 and 2, so that the organizational structure is preserved?

Thanks!

jmausolf · 2016-12-05T16:43:14Z

If you want sentiment analysis on the text cell of each row, you could write code that reads in that row's one or two paragraphs, runs sentiment analysis on them, and inserts the output into an "analysis" column (or several columns, depending on the structure of the output) for that row.

Work on the parts separately. Can you read in the text for a specific row? Make sure you can do that (even write a function for it). Once you have that, write a sentiment analysis function that returns results for that text, then figure out how to add those results as a new column for that row. Once you have all these parts, simply write a loop that does the above for every row in the CSV.
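A minimal base R sketch of that row-by-row approach might look like the following. The three-row data frame and the tiny hand-made `lexicon` are hypothetical stand-ins for the real CSV and a real sentiment dictionary, and `score_text` is an illustrative helper name, not part of any package.

```r
# Hypothetical stand-in for the CSV: same three columns, three short rows.
livy <- data.frame(
  Book    = c(3, 3, 7),
  Chapter = c(40, 41, 12),
  Text    = c("The senate rejoiced at the victory.",
              "Fear and grief spread through the city.",
              "The victory brought joy but also fear."),
  stringsAsFactors = FALSE
)

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
lexicon <- c(rejoiced = 1, victory = 1, joy = 1, fear = -1, grief = -1)

# Score one text: lowercase it, split on non-letters, and sum the scores of
# the words found in the lexicon (words not in the lexicon are ignored).
score_text <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

# Loop over every row and store the result in a new column on that row.
livy$score <- NA_real_
for (i in seq_len(nrow(livy))) {
  livy$score[i] <- score_text(livy$Text[i])
}

livy[, c("Book", "Chapter", "score")]
```

Because the score lands on the same row as `Book` and `Chapter`, the organizational structure is kept without any extra bookkeeping.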

bensoltoff · 2016-12-05T16:46:49Z

I think that's overcomplicating it. You can read the data in, group by book and chapter columns, then unnest_tokens on the text column. So if the columns are called book, chapter, and text, it should look like this:

library(tidyverse)
library(tidytext)

read_csv("livy.csv") %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text)

This keeps columns identifying the book and chapter the word comes from. Then merge with the sentiment dictionary, summarize by chapter, and draw comparisons as necessary. Am I missing something?

EmilyForden · 2016-12-05T20:55:17Z

I'm having some trouble integrating the NRC sentiment analysis into my Livy data frame. Right now, I have

library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())
get_sentiments("nrc")

Livy <- read.csv(file="All_Livy.csv",head=TRUE,sep=",")

Livy %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text) %>%
  mutate(linenumber = row_number(),
         text = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
head(Livy)

data("stop_words")
cleaned_livy <- Livy %>%
  anti_join(stop_words)

cleaned_livy %>%
  count(word, sort = TRUE) 

LivySentiment <- get_sentiments("nrc") %>%
  semi_join(cleaned_livy) %>%
  count(word, sort = TRUE)

This is based on a tutorial I found, but in the tutorial, only one sentiment was shown:

nrcjoy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  semi_join(nrcjoy) %>%
  count(word, sort = TRUE)

I can't work out how to get the count of each sentiment for each chapter. Any thoughts? I think my join might be the problem, but I don't understand how to fix it.

Once I get that figured out, I would like to make a graph that shows the amount of each sentiment in each chapter so that you can look at the changes throughout the book. I want to eventually make this into a UI that a participant can play with.

bensoltoff · 2016-12-05T21:41:00Z

For multiple sentiments, don't filter with semi_join - just use inner_join. Remember what all the joins do.

LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"))
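To make the difference between the two joins concrete, here is a small sketch on toy data. The `tokens` and `lexicon` tibbles are hypothetical stand-ins for `cleaned_livy` and the NRC dictionary; the word "victory" is given two sentiments on purpose, since a word in the NRC lexicon can carry more than one.

```r
library(dplyr)

# Hypothetical tokens, shaped like the output of unnest_tokens().
tokens <- tibble(
  Book    = c(3, 3, 7),
  Chapter = c(40, 40, 12),
  word    = c("victory", "senate", "fear")
)

# Hypothetical lexicon: one row per word-sentiment pair.
lexicon <- tibble(
  word      = c("victory", "victory", "fear"),
  sentiment = c("joy", "positive", "fear")
)

# semi_join() only filters the left table: rows with a match are kept,
# no columns are added, and rows are never duplicated.
kept <- semi_join(tokens, lexicon, by = "word")
kept

# inner_join() keeps the matching rows and adds the `sentiment` column,
# producing one row per matching word-sentiment pair.
joined <- inner_join(tokens, lexicon, by = "word")
joined
```

Note also which table is on the left: `get_sentiments("nrc") %>% semi_join(cleaned_livy)` returns rows of the lexicon, so the book and chapter columns are not in the result at all. Starting from the tokens and using `inner_join` keeps both the chapter identifiers and the sentiment labels.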

Note that you also need to clean up the code in the earlier part of the script. Your columns have different names than the example scripts (book -> Book, chapter -> Chapter, etc.)

library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())

# get text
Livy <- read_csv("All_Livy.csv")

# convert to tokens
Livy <- Livy %>%
  group_by(Book, Chapter) %>%
  unnest_tokens(word, Text) %>%
  mutate(linenumber = row_number())
Livy

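# remove stop words (ungroup first so later counts are not split by chapter)
cleaned_livy <- Livy %>%
  ungroup() %>%
  anti_join(stop_words, by = "word")

# most frequent words across the whole corpus
cleaned_livy %>%
  count(word, sort = TRUE)

# add sentiment from NRC dictionary (a word can carry several sentiments)
LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"), by = "word")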

# summarize sentiment by chapter
LivySentiment %>%
  count(Book, Chapter, sentiment)
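For the graph and the chapter-to-chapter comparison, one possible sketch follows. The `chapter_sentiment` tibble is hypothetical data shaped like the output of `count(Book, Chapter, sentiment)`; the numbers are made up for illustration. Converting counts to shares within each chapter is a design choice I am assuming here, so that long and short chapters can be compared.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts, shaped like the summary by chapter.
chapter_sentiment <- tibble(
  Book      = c(3, 3, 3, 3, 7, 7),
  Chapter   = c(40, 40, 41, 41, 12, 12),
  sentiment = c("fear", "joy", "fear", "joy", "fear", "joy"),
  n         = c(5, 2, 3, 3, 1, 4)
)

# Share of each sentiment within its chapter.
chapter_sentiment <- chapter_sentiment %>%
  group_by(Book, Chapter) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Keep only the two chapters being compared.
comparison <- chapter_sentiment %>%
  filter((Book == 3 & Chapter == 40) | (Book == 7 & Chapter == 12))

# One panel per chapter, one bar per sentiment.
p <- ggplot(comparison, aes(x = sentiment, y = share, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ Book + Chapter, labeller = label_both) +
  labs(x = "Sentiment", y = "Share of matched words")
p
```

Dropping the `filter()` step and putting `Chapter` on the x axis instead would show how each sentiment changes across a whole book.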
