Sentiment Analysis and CSV File #1

Open
EmilyForden opened this issue Dec 5, 2016 · 4 comments

Comments

@EmilyForden
Contributor

I made a very large CSV file of all of Livy's works. The first column is the book number, the second column is the chapter number, and the third column is 1-2 paragraphs of text (the column labels are "Book", "Chapter", and "Text"). I want to preserve this structure when performing sentiment analysis; i.e., I want to be able to show the sentiment structure for Book 3, Chapter 40 and compare it to Book 7, Chapter 12. What is the best way to run sentiment analysis on column three ("Text") while still tying the results to columns one and two, so that the organizational structure is preserved?

Thanks!

@jmausolf

jmausolf commented Dec 5, 2016

If you want sentiment analysis on the text cell of each row, you could write code that reads in those one or two paragraphs as your text, runs sentiment analysis on that text, and inserts the output into an "analysis" column (or several columns, depending on the output structure) for that row.

Work on the different parts separately. Can you read in the text for a specific row? Make sure you can do that (even write a function for it). Once you have that part, write a sentiment analysis function that gives results for that text. Then figure out how to add the result as a new column for that row. Once you have all these parts, simply write a loop to do the above for every row in the CSV.
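
The row-by-row approach described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `score_text`, the two word lists, and the two-row `livy` data frame are hypothetical stand-ins, not a real sentiment lexicon or the actual Livy file.

```r
# Hypothetical helper: score one text as (positive words) - (negative words)
score_text <- function(text, positive, negative) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}

# Made-up rows that mimic the Book / Chapter / Text layout
livy <- data.frame(
  Book = c(3, 7),
  Chapter = c(40, 12),
  Text = c("A great victory and lasting peace.",
           "War brought defeat and fear."),
  stringsAsFactors = FALSE
)

# Tiny made-up word lists (assumption, not a published lexicon)
positive <- c("victory", "peace", "great")
negative <- c("war", "defeat", "fear")

# Loop over rows and store the result in a new column on the same row
livy$score <- NA_real_
for (i in seq_len(nrow(livy))) {
  livy$score[i] <- score_text(livy$Text[i], positive, negative)
}

livy[, c("Book", "Chapter", "score")]
```

Because the score is written back to the same row, it stays tied to that row's Book and Chapter values.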

@bensoltoff
Contributor

I think that's overcomplicating it. You can read the data in, group by book and chapter columns, then unnest_tokens on the text column. So if the columns are called book, chapter, and text, it should look like this:

read_csv("livy.csv") %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text)

This keeps columns identifying the book and chapter the word comes from. Then merge with the sentiment dictionary, summarize by chapter, and draw comparisons as necessary. Am I missing something?

@EmilyForden
Contributor Author

I'm having some trouble integrating the NRC sentiment analysis into my Livy data frame. Right now, I have

library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())
get_sentiments("nrc")

Livy <- read.csv(file="All_Livy.csv",head=TRUE,sep=",")

Livy %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text) %>%
  mutate(linenumber = row_number(),
         text = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
head(Livy)

data("stop_words")
cleaned_livy <- Livy %>%
  anti_join(stop_words)

cleaned_livy %>%
  count(word, sort = TRUE) 

LivySentiment <- get_sentiments("nrc") %>%
  semi_join(cleaned_livy) %>%
  count(word, sort = TRUE)

This is based on a tutorial I found, but in the tutorial, only one sentiment was shown:

nrcjoy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  semi_join(nrcjoy) %>%
  count(word, sort = TRUE)

I can't work out how to get the count of each sentiment for each chapter. Any thoughts? I think my join might be the problem, but I don't understand how to fix it.

Once I get that figured out, I would like to make a graph that shows the amount of each sentiment in each chapter so that you can look at the changes throughout the book. I want to eventually make this into a UI that a participant can play with.

@bensoltoff
Contributor

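For multiple sentiments, don't filter with semi_join; just use inner_join. Remember what each join does: semi_join only keeps the rows of the first table that have a match in the second, while inner_join also brings in the matching columns (here, sentiment).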

LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"))
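
To make the difference between the two joins concrete, here is a minimal sketch. The `words` and `lexicon` tibbles are made-up toy data standing in for the tokenized Livy text and the NRC lexicon; only the dplyr functions themselves are real.

```r
library(dplyr)

# Toy stand-ins (hypothetical data, not the real Livy text or NRC lexicon)
words <- tibble(
  Book = c(1, 1, 2),
  Chapter = c(1, 1, 3),
  word = c("victory", "war", "river")
)
lexicon <- tibble(
  word = c("victory", "victory", "war"),
  sentiment = c("joy", "positive", "fear")
)

# semi_join only filters `words`; no sentiment column is added
semi <- semi_join(words, lexicon, by = "word")

# inner_join returns one row per word/sentiment match,
# so the sentiment column is available for counting
inner <- inner_join(words, lexicon, by = "word")

count(inner, Book, Chapter, sentiment)
```

With semi_join there is nothing to count by sentiment because the column never arrives; with inner_join a word that carries several sentiments contributes one row to each.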

Note that you also need to clean up the code in the earlier part of the script. Your columns have different names than the example scripts (book -> Book, chapter -> Chapter, etc.)

library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())

# get text
Livy <- read_csv("All_Livy.csv")

# convert to tokens
Livy <- Livy %>%
  group_by(Book, Chapter) %>%
  unnest_tokens(word, Text) %>%
  mutate(linenumber = row_number())
Livy

# remove stop words
cleaned_livy <- Livy %>%
  anti_join(stop_words)

cleaned_livy %>%
  count(word, sort = TRUE) 

# add sentiment from NRC dictionary
LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"))

# summarize sentiment by chapter
LivySentiment %>%
  count(Book, Chapter, sentiment)
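
For the chapter comparison and graph asked about earlier in the thread (Book 3, Chapter 40 versus Book 7, Chapter 12), a rough sketch might look like the following. The `sentiment_counts` tibble and its numbers are invented placeholders for the output of the `count(Book, Chapter, sentiment)` step; the real values would come from the Livy data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the output of count(Book, Chapter, sentiment)
sentiment_counts <- tibble(
  Book = c(3, 3, 7, 7),
  Chapter = c(40, 40, 12, 12),
  sentiment = c("fear", "joy", "fear", "joy"),
  n = c(12, 4, 5, 9)
)

# Keep only the two chapters being compared and build a readable label
compared <- sentiment_counts %>%
  filter((Book == 3 & Chapter == 40) | (Book == 7 & Chapter == 12)) %>%
  mutate(label = paste0("Book ", Book, ", Chapter ", Chapter))

# Side-by-side bars: one bar per sentiment for each chapter
p <- ggplot(compared, aes(x = sentiment, y = n, fill = label)) +
  geom_col(position = "dodge") +
  labs(x = "Sentiment", y = "Word count", fill = NULL)
p
```

Raw counts favor longer chapters, so dividing `n` by the number of words in each chapter may give a fairer comparison.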


3 participants