Sentiment Analysis and CSV File #1
If you want sentiment analysis on each text cell in each row, you could write code that reads in that one or two paragraphs as your text, conducts sentiment analysis on it, and inserts the output of that analysis into an "analysis" column (or multiple columns) for that row, depending on the output structure. Work on the different parts. Can you read in the text for a specific row? Make sure you can do that (even write a function). Once you have that part, write a sentiment analysis function that gives results for that text. Figure out how to add the result as a new column for that row. Once you have all these parts, simply write a loop to do the above for every row in the CSV.
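A minimal sketch of that row-by-row approach, assuming the `syuzhet` package for scoring and a CSV with a `Text` column (the package choice, file name, and function names here are illustrative, not part of the original advice):

```r
library(syuzhet)  # assumption: one of several R sentiment packages

df <- read.csv("All_Livy.csv", stringsAsFactors = FALSE)

# function that scores the text of one row
score_row <- function(text) {
  get_sentiment(text, method = "syuzhet")  # returns one numeric score per text
}

# loop over every row, inserting the result into a new "analysis" column
df$analysis <- NA_real_
for (i in seq_len(nrow(df))) {
  df$analysis[i] <- score_row(df$Text[i])
}
```

The same loop could write several columns (e.g. one per emotion) if the scoring function returns a multi-column result.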
I think that's overcomplicating it. You can read the data in, group by the book and chapter columns, then unnest the tokens:

```r
read_csv("livy.csv") %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text)
```

This keeps columns identifying the book and chapter each word comes from. Then merge with the sentiment dictionary, summarize by chapter, and draw comparisons as necessary. Am I missing something?
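The merge-and-summarize step described above could be sketched like this (the `bing` dictionary and the net-sentiment calculation are my assumptions; the thread itself uses `nrc` later on):

```r
library(tidyverse)
library(tidytext)

read_csv("livy.csv") %>%
  unnest_tokens(word, text) %>%                        # book/chapter columns are kept
  inner_join(get_sentiments("bing"), by = "word") %>%  # merge with sentiment dictionary
  count(book, chapter, sentiment) %>%                  # summarize by chapter
  spread(sentiment, n, fill = 0) %>%                   # one column per sentiment
  mutate(net = positive - negative)                    # net sentiment per chapter
```

From there, chapters can be compared directly on the `net` column or on the individual sentiment counts.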
I'm having some trouble integrating the NRC sentiment analysis into my Livy df. Right now, I have:

```r
library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())

get_sentiments("nrc")

Livy <- read.csv(file = "All_Livy.csv", head = TRUE, sep = ",")

Livy %>%
  group_by(book, chapter) %>%
  unnest_tokens(word, text) %>%
  mutate(linenumber = row_number(),
         text = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                              ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

head(Livy)

data("stop_words")

cleaned_livy <- Livy %>%
  anti_join(stop_words)

cleaned_livy %>%
  count(word, sort = TRUE)

LivySentiment <- get_sentiments("nrc") %>%
  semi_join(cleaned_livy) %>%
  count(word, sort = TRUE)
```

This is based on a tutorial I found, but in the tutorial, only one sentiment was shown:

```r
nrcjoy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  semi_join(nrcjoy) %>%
  count(word, sort = TRUE)
```

I can't seem to find the count of each sentiment for each chapter. Any thoughts? I think my join might be the problem, but I don't understand how to fix it. Once I get that figured out, I would like to make a graph that shows the amount of each sentiment in each chapter, so that you can see the changes throughout the book. I eventually want to turn this into a UI that a participant can play with.
For multiple sentiments, don't filter on a single sentiment; join the full dictionary instead:

```r
LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"))
```

Note that you also need to clean up the code in the earlier part of the script. Your columns have different names than the example scripts (`Book`, `Chapter`, and `Text` rather than lowercase):

```r
library(tidyverse)
library(lubridate)
library(stringr)
library(tidytext)
library(broom)
library(scales)

theme_set(theme_bw())

# get text
Livy <- read_csv("All_Livy.csv")

# convert to tokens
Livy <- Livy %>%
  group_by(Book, Chapter) %>%
  unnest_tokens(word, Text) %>%
  mutate(linenumber = row_number())

Livy

# remove stop words
cleaned_livy <- Livy %>%
  anti_join(stop_words)

cleaned_livy %>%
  count(word, sort = TRUE)

# add sentiment from NRC dictionary
LivySentiment <- cleaned_livy %>%
  inner_join(get_sentiments("nrc"))

# summarize sentiment by chapter
LivySentiment %>%
  count(Book, Chapter, sentiment)
```
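Building on those per-chapter counts, a sketch of the graph the question asks for, showing how each sentiment changes across chapters (the line-plus-facet layout is my assumption, not something from the thread):

```r
library(ggplot2)

LivySentiment %>%
  count(Book, Chapter, sentiment) %>%
  ggplot(aes(x = Chapter, y = n, color = sentiment)) +
  geom_line() +
  facet_wrap(~ Book) +            # one panel per book
  labs(y = "word count", title = "NRC sentiment by chapter")
```

A plot like this would also be a natural starting point for the interactive UI mentioned earlier, e.g. wrapped in a Shiny app with a book selector.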
I made a very large CSV file of all of Livy's works. The first column is the book number, the second column is the chapter number, and the third column is 1-2 paragraphs of text (the labels are "Book", "Chapter", and "Text"). I want to preserve this structure when performing sentiment analysis (i.e., I want to be able to show the sentiment structure for Book 3, Chapter 40 and compare it to Book 7, Chapter 12). What is the best way to run sentiment analysis on column 3 ("Text") while still tying this data to columns one and two in order to preserve the organizational structure?
Thanks!