An exploration of text analysis utilizing an academic text on wildfire management.
Wildfire management is discussed with increasing frequency, and academic papers offer insight into how the field views policy and management.
Here I investigate word frequency and sentiment in the recent paper "Forest Service fire management and the elusiveness of change" (Schultz et al., 2019) to see how the authors approach the subject.
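# packages used throughout
library(tidyverse)   # dplyr, tidyr, stringr, ggplot2
library(pdftools)    # pdf_text()
library(tidytext)    # stop_words, get_sentiments()
library(ggwordcloud) # geom_text_wordcloud()
library(patchwork)   # combine plots with +
# import text from pdf file (one string per page)
schultz_2019 <- pdf_text("Schultz et al_2019_Forest Service fire management and the elusiveness of change.pdf") %>%
  data.frame() %>% # make df
  rename("text_full" = ".") %>%
  mutate(text_full = tolower(text_full))
# clean the text and split it into one word per row
schultz_tidy <- schultz_2019 %>%
  mutate(
    text_full = str_remove_all(text_full, "[[:punct:]]"), # remove punctuation
    text_full = str_remove_all(text_full, "[[:digit:]]"), # remove digits
    text_full = str_squish(text_full), # collapse repeated white space
    text_full = str_split(text_full, pattern = " ")) %>% # split into lists at space
  unnest(text_full) %>% # make individual rows
  rename("word" = "text_full")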
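As a small illustration of the cleaning steps above, the sketch below runs them on a single made-up sentence rather than on the PDF. The sentence and the names `toy_text` and `toy_words` are hypothetical and exist only for this example. It shows why the white space is squished after the digits are removed: deleting "2019" leaves two spaces in a row, which would otherwise produce an empty "word" when splitting.

```r
library(stringr)

# Hypothetical sentence, not taken from the paper
toy_text <- "Fire management in 2019: policy, planning, and change."

toy_clean <- tolower(toy_text)
toy_clean <- str_remove_all(toy_clean, "[[:punct:]]") # drop punctuation
toy_clean <- str_remove_all(toy_clean, "[[:digit:]]") # drop digits, leaving a double space
toy_clean <- str_squish(toy_clean)                    # collapse the double space

# str_split() returns a list, so unlist() gives a plain character vector
toy_words <- unlist(str_split(toy_clean, pattern = " "))
print(toy_words)
```

In the full pipeline, `unnest()` plays the role that `unlist()` plays here, turning each list of words into individual rows.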
Before analyzing word frequency, I removed common stop words. I also defined a second set of words that are frequent in the paper but say little about its ideas, so that I could exclude them from some analyses. These are primarily the subjects of the paper (fire, management, Forest Service).
# strings to remove because they come from citations or text fragments
citation <- tribble(~word,
                    "de", "la", "en", "el", "los", "del", # spanish
                    "ment", "tion", # frequency captured by prefix
                    "https", "org", "doi", "et", "al",
                    "httpsdoiorg", "mp") # citations
# strings to remove based on high frequency as subject of paper
subject <- tribble(~word,
                   "fire", "management", "managers",
                   "forest", "service", "policy",
                   "agency", "wildfire")
schultz_nonstop <- schultz_tidy %>%
  anti_join(stop_words, by = "word") %>% # remove general stop words
  anti_join(citation, by = "word") # remove specific unwanted words
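To make the filtering step concrete, here is a minimal sketch of how `anti_join()` drops every row whose `word` appears in a second table. All of the `toy_*` tables are hypothetical stand-ins: `toy_stop` takes the place of `tidytext::stop_words`, and the other two mimic the `citation` and `subject` lists on a much smaller scale.

```r
library(dplyr)
library(tibble)

# Hypothetical word list, one word per row, as in schultz_tidy
toy_words <- tibble(word = c("fire", "the", "planning", "doi", "fire", "of", "change"))

toy_stop     <- tibble(word = c("the", "of")) # stand-in for stop_words
toy_citation <- tibble(word = c("doi"))       # stand-in for citation fragments
toy_subject  <- tibble(word = c("fire"))      # stand-in for subject words

# Keep only rows of toy_words with no match in the stop or citation lists
toy_nonstop <- toy_words %>%
  anti_join(toy_stop, by = "word") %>%
  anti_join(toy_citation, by = "word")

# A second, stricter version that also drops the subject words
toy_nonsubject <- anti_join(toy_nonstop, toy_subject, by = "word")

print(toy_nonstop$word)
print(toy_nonsubject$word)
```

Keeping the subject list separate from the other two means both versions of the word list stay available, which is what allows the analysis to be run with and without the subject words.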
The word cloud gives a sense of the most used words in the paper, with “fire” as the most common (Figure 1).
# count each word; count() returns rows sorted alphabetically by word
schultz_counts <- schultz_nonstop %>%
  count(word) %>%
  slice(-(1:3)) # remove special character and random letters at head
# slice top 100 words
top_100_all <- schultz_counts %>%
  slice_max(order_by = n, n = 100)
# word cloud with size and color mapped to frequency
ggplot(data = top_100_all, aes(label = word)) +
  geom_text_wordcloud(aes(color = n, size = n),
                      shape = "triangle-upright") +
  scale_size_area(max_size = 10) +
  scale_color_gradientn(colors = c("darkgreen", "goldenrod2", "firebrick"))
Figure 1: Word cloud depicting word frequencies in Forest Service fire management (Schultz et al., 2019).
When subject words are excluded, the most common words are primarily those that pertain to planning, suggesting that management and planning are key to effecting change in Forest Service fire policy (Figure 2).
# remove high frequency subject words
schultz_nonfire <- anti_join(schultz_nonstop, subject, by = "word")
schultz_nonfirecounts <- schultz_nonfire %>%
  count(word) %>%
  slice(-(1:3)) # remove special character and random letters at head
# slice top 10 words
top_10_nonfire <- schultz_nonfirecounts %>%
  slice_max(order_by = n, n = 10)
# bar chart of the ten most common words
ggplot(data = top_10_nonfire, aes(x = reorder(word, n), y = n)) +
  geom_col(aes(fill = n)) +
  scale_fill_gradient(high = "firebrick", low = "goldenrod2") +
  labs(title = "Common words in Forest Service fire management paper",
       y = "Times in text",
       caption = "Bri Baker, 2021") +
  coord_flip() + # horizontal bars (assumed; implied by the blank y-axis title)
  theme(legend.position = "none",
        axis.title.y = element_blank())
Figure 2: Common words in Forest Service fire management (Schultz et al., 2019). When the paper's subject words are excluded, the most common word is “wildland”.
I used the AFINN sentiment lexicon to determine how the authors treat the subject of fire management throughout the paper.
When “fire” (coded at -2) is included, the paper has a strongly negative overall sentiment. When “fire” is excluded, however, the tone is neutral to positive, indicating that the authors may have some optimism for management (Figure 3).
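# score words with AFINN and count how often each score occurs
schultz_afinn_all <- schultz_nonstop %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  count(value) %>%
  rbind(tribble(~value, ~n, # add zero-count rows for scores -5 and 4
                -5, 0,
                4, 0))
# same counts with the subject words removed
schultz_afinn_nonfire <- schultz_nonfire %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  count(value) %>%
  rbind(tribble(~value, ~n, # add zero-count rows for scores -5 and 4
                -5, 0,
                4, 0))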
# sentiment distribution for all non-stop words
all_sentiments <- ggplot(data = schultz_afinn_all,
                         aes(x = value, y = n)) +
  geom_col(aes(fill = value)) +
  scale_x_continuous(breaks = seq(-5, 5, 1)) +
  scale_fill_gradient2(high = "palegreen4",
                       mid = "goldenrod2",
                       low = "firebrick") +
  labs(title = "All words",
       y = "Frequency",
       x = "Sentiment score") +
  theme(legend.position = "none")
# sentiment distribution with subject words removed
nonfire_sentiments <- ggplot(data = schultz_afinn_nonfire,
                             aes(x = value, y = n)) +
  geom_col(aes(fill = value)) +
  scale_x_continuous(breaks = seq(-5, 5, 1)) +
  scale_fill_gradient2(high = "palegreen4",
                       mid = "goldenrod2",
                       low = "firebrick") +
  labs(title = "Sans subject words",
       y = "Frequency",
       x = "Sentiment score",
       caption = "Bri Baker, 2021") +
  theme(legend.position = "none")
# place the two plots side by side with patchwork
all_sentiments + nonfire_sentiments
Figure 3: AFINN sentiment analysis of Forest Service fire management (Schultz et al., 2019). The strong negative sentiment likely comes from the frequent use of the word “fire”.
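The effect of one frequent, negatively scored word can be sketched on toy data. In the example below only the score of -2 for "fire" comes from the analysis above. The other lexicon values, the word list, and the helper `score_summary()` are hypothetical and are not taken from AFINN or from the paper.

```r
library(dplyr)
library(tibble)

# Toy lexicon: "fire" = -2 as reported above; the other scores are illustrative only
toy_lexicon <- tibble(
  word  = c("fire", "risk", "support", "success"),
  value = c(-2, -2, 2, 2)
)

# Hypothetical word list in which "fire" dominates
toy_words <- tibble(
  word = c("fire", "fire", "fire", "risk", "support", "success", "planning")
)

# Hypothetical helper: keep only words found in the lexicon and average their scores
score_summary <- function(words, lexicon) {
  words %>%
    inner_join(lexicon, by = "word") %>%
    summarise(n_scored = n(), mean_value = mean(value))
}

with_fire <- score_summary(toy_words, toy_lexicon)
without_fire <- toy_words %>%
  filter(word != "fire") %>%
  score_summary(toy_lexicon)

print(with_fire)    # mean score is negative
print(without_fire) # mean score is positive
```

Words missing from the lexicon, such as "planning" here, are dropped by `inner_join()` and do not affect the score, so a single common subject word can shift the overall result on its own.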
Hvitfeldt, Emil (2020). textdata: Download and Load Various Text Datasets. R package version 0.4.1.
Le Pennec, Erwan and Slowikowski, Kamil (2019). ggwordcloud: A Word Cloud Geom for ‘ggplot2’. R package version 0.5.0.
Ooms, Jeroen (2020). pdftools: Text Extraction, Rendering and Converting of PDF Documents. R package version 2.3.1.
Pedersen, Thomas Lin (2020). patchwork: The Composer of Plots. R package version 1.1.1.
Schultz, C. A., Thompson, M. P., & McCaffrey, S. M. (2019). Forest Service fire management and the elusiveness of change. Fire Ecology, 15(1), 13.
Silge, J., & Robinson, D. (2016). tidytext: Text Mining and Analysis Using Tidy Data Principles in R. Journal of Open Source Software, 1(3). doi: 10.21105/joss.00037.
Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.