library(quanteda) #install with install.packages("quanteda") if needed
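data(data_corpus_inaugural)
# Note: the $documents accessor was written for quanteda versions before 2.0;
# later versions restructured the corpus object, so this line may need updating
speeches <- data_corpus_inaugural$documents
row.names(speeches) <- NULL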
As you can see, this dataset has each Inaugural Address in a column called "texts," with year and President's name as additional variables. To analyze the words in the speeches, and generate a wordcloud, we'll want to unnest the words in the texts column.
library(tidytext)
library(tidyverse)
speeches_tidy <- speeches %>%
  unnest_tokens(word, texts) %>%
  anti_join(stop_words)
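If it's not obvious what the unnesting step does, here is a minimal sketch on a made-up two-row data frame. The `toy` data and its contents are invented for illustration and are not part of the inaugural corpus. It shows that unnest_tokens produces one lowercased word per row while carrying the other columns along, and that anti_join then drops common stop words such as "the."

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the speeches data frame
toy <- tibble(
  President = c("A", "B"),
  texts = c("We the people", "The people of the nation")
)

toy_tidy <- toy %>%
  unnest_tokens(word, texts) %>%        # one lowercased word per row
  anti_join(stop_words, by = "word")    # drop words found in the stop_words lexicon

print(toy_tidy)
```

Passing by = "word" explicitly just silences the message that anti_join otherwise prints about which column it joined on.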
For our first wordcloud, let's see which words are most common across all speeches.
library(wordcloud) #install.packages("wordcloud") if needed
speeches_tidy %>%
  count(word, sort = TRUE) %>%
  with(wordcloud(word, n, max.words = 50))
We could just as easily create a wordcloud for one President specifically. For instance, let's create one for Obama, since he provides us with two speeches' worth of words. But to take things up a notch, let's add sentiment information to our wordcloud. To do that, we'll use the comparison.cloud function from the wordcloud package; we'll also need the reshape2 library.
library(reshape2) #install.packages("reshape2") if needed
obama_words <- speeches_tidy %>%
  filter(President == "Obama") %>%
  count(word, sort = TRUE)

obama_words %>%
  inner_join(get_sentiments("nrc") %>%
               filter(sentiment %in% c("positive", "negative"))) %>%
  filter(n > 1) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red","blue"))
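The acast step is the least self-explanatory part of that pipeline, so here is a small sketch of what it does, using made-up counts (the words and numbers in `toy_counts` are invented for illustration). It reshapes a long table of word, sentiment, and count into a matrix with one row per word and one column per sentiment, which is the shape comparison.cloud works with.

```r
library(reshape2)

# Hypothetical word counts, already labeled with a sentiment
toy_counts <- data.frame(
  word = c("hope", "crisis", "peace", "fear"),
  sentiment = c("positive", "negative", "positive", "negative"),
  n = c(5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Rows become words, columns become sentiments; missing combinations are filled with 0
toy_matrix <- acast(toy_counts, word ~ sentiment, value.var = "n", fill = 0)

print(toy_matrix)
```

Because the columns come out in alphabetical order (negative, then positive), the colors in comparison.cloud are matched in that order too: red for negative and blue for positive.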
Interestingly, the NRC lexicon classifies "government" and "words" as negative. But even if we ignore those two words, which are Obama's most frequent, the negatively-valenced words are much larger than most of his positively-valenced words. So while he uses a wider variety of positively-valenced words than negatively-valenced ones (seen in the sheer number of blue words), he repeats the individual negatively-valenced words more often. If you were so inclined, you could run a sentiment analysis on his speeches to see whether they tend to be more positive or negative, and whether they follow arcs of negativity and positivity.

And feel free to generate your own wordcloud: all you'd need to do is change the name in filter(President == "Obama") to whatever President you're interested in examining (or swap in whatever text data you'd like to use, if Presidents' speeches aren't your thing).
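If you do want to try the sentiment analysis mentioned above, one simple approach is a net score per speech: the number of positive words minus the number of negative words. The sketch below uses a made-up word list and a made-up four-word lexicon (`toy_words` and `toy_lexicon` are both invented for illustration) so that it runs on its own; it is one rough way to do it, not the only one.

```r
library(dplyr)

# Hypothetical tidy words, one row per word, tagged with the year of the speech
toy_words <- tibble(
  Year = c(2009, 2009, 2009, 2013, 2013),
  word = c("crisis", "hope", "fear", "peace", "hope")
)

# Hypothetical stand-in for a positive/negative sentiment lexicon
toy_lexicon <- tibble(
  word = c("crisis", "fear", "hope", "peace"),
  sentiment = c("negative", "negative", "positive", "positive")
)

net_by_speech <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # keep only words the lexicon knows
  group_by(Year) %>%
  summarise(
    positive = sum(sentiment == "positive"),
    negative = sum(sentiment == "negative"),
    net = positive - negative                # above 0 means more positive than negative
  )

print(net_by_speech)
```

With the real data, you would presumably swap in speeches_tidy filtered to one President for toy_words and the positive/negative rows of get_sentiments("nrc") for toy_lexicon, assuming your data frame has a year column to group by.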