For #tidytuesday we’re looking at Amusement Park injuries. I plan on making a simple visual of the number of injuries by month.
if (!require(pacman)) {install.packages('pacman')}
pacman::p_load(janitor, skimr, stringr, tidyverse, lubridate)

Previous inspection of the raw data shows that some missing values are recorded as other strings, such as “n/a” or “#########”. read_csv() does not treat these as NA by default, so they must be listed manually.
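As a minimal sketch of why the extra NA strings matter, the snippet below reads a tiny inline CSV twice. It uses base R's read.csv() and its na.strings argument as a stand-in for readr's na argument, and the three rows are made up for illustration rather than taken from the Texas data.

```r
# Hypothetical three-row CSV; the values are illustrative, not from tx_injuries.csv
csv_text <- "ride,injury_date
Coaster,5/12/2013
Slide,n/a
Carousel,#########"

# Default behaviour: only the string "NA" is treated as missing
default_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Custom list: the same strings used for na_list in this post
na_list_demo <- c("NA", "n/a", "#########", "N/A", "na")
custom_read <- read.csv(text = csv_text, na.strings = na_list_demo,
                        stringsAsFactors = FALSE)

sum(is.na(default_read$injury_date))  # 0
sum(is.na(custom_read$injury_date))   # 2
```

With the default settings both placeholder strings survive as ordinary text, which would later break the date conversion.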
#Split over multiple lines for legibility
data_url <- paste0("https://raw.githubusercontent.com/rfordatascience/",
                   "tidytuesday/master/data/2019/2019-09-10/",
                   "tx_injuries.csv")
#Define observed N/A types
na_list <- c("NA", "n/a", "#########", "N/A", "na")
#Import Data
tx_injuries <- readr::read_csv(file = data_url, na = na_list)

There are two date formats used in the data set. Some dates use an “M/D/Y” format; others are stored as Excel-style serial numbers. Both are read in as character strings. To convert them to a single, consistent date column, the following steps were taken.
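Before the full pipeline, here is a minimal base-R sketch of the same idea on made-up values. It converts each format only on the rows that use it, which is an alternative to the if_else() approach rather than the method used in this post. as.Date() with the 1899-12-30 origin stands in for excel_numeric_to_date(), under the assumption that the serial numbers follow the modern Excel date system.

```r
# Hypothetical sample values, not taken from the data set
injury_date <- c("5/12/2013", "41406", "10/3/2015")

# Flag the slash-delimited M/D/Y dates
is_mdy <- grepl("/", injury_date, fixed = TRUE)

# Start from an all-NA Date vector, then fill each subset separately
converted <- rep(as.Date(NA), length(injury_date))
converted[is_mdy] <- as.Date(injury_date[is_mdy], format = "%m/%d/%Y")
converted[!is_mdy] <- as.Date(as.numeric(injury_date[!is_mdy]),
                              origin = "1899-12-30")

converted
# "2013-05-12" "2013-05-12" "2015-10-03"
```

Because each parser only sees the strings it can handle, this version avoids the parse warnings that arise when both branches are evaluated on every row.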
# Consolidate Date Types / Drop Missing Dates
tx_injuries <- tx_injuries %>%
  # Drop N/A Injury dates
  drop_na(injury_date) %>%
  # Unify date type
  mutate(injury_date_conv = if_else(
    # Check if date uses "/"
    grepl(pattern = "/", x = injury_date),
    # Converts M-D-Y dates
    mdy(injury_date),
    # Converts Serial dates
    excel_numeric_to_date(as.numeric(injury_date)
                          , date_system = "modern")
    )
  )

With each injury date now stored as a date object in a new column, we count the injuries in each month by grouping on both year and month. For the final visual, a dummy day column is added with a value of 1. A date string is then built by concatenating the year, month, and day columns, and this string is converted into a date object with the ymd() function from lubridate.
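The month-level counting can be sketched in base R on a few hypothetical dates. Here format() collapses each date to the first of its month in one step, instead of building the string from separate year, month, and day columns. The variable names mirror the post, but the values are invented.

```r
# Hypothetical injury dates, for illustration only
injury_date_conv <- as.Date(c("2013-05-12", "2013-05-30", "2013-06-02"))

# Collapse each date to the first day of its month
month_start <- as.Date(format(injury_date_conv, "%Y-%m-01"))

# Count injuries per month and tidy the result into a data frame
monthly <- as.data.frame(table(month_start), stringsAsFactors = FALSE)
names(monthly) <- c("eff_date", "injuries")
monthly$eff_date <- as.Date(monthly$eff_date)

monthly$injuries
# 2 1
```

Either route ends with one row per month and a proper date column, which is what the plot's date axis needs.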
# Data Frame Development
tx_injuries <- tx_injuries %>%
  mutate(month = month(injury_date_conv),
         year = year(injury_date_conv)) %>%
  group_by(year, month) %>%
  summarise(injuries = n()) %>%
  mutate(day = 1,
         eff_date_char = paste(year, month, day, sep = "-"),
         eff_date = ymd(eff_date_char)) %>%
  select(-eff_date_char)

Now the injuries recorded each month can be plotted. The plot shows clear seasonal activity, which probably tracks total park visits.
#Visual
ggplot(data = tx_injuries
       , mapping = aes(x = eff_date, y = injuries)) +
  geom_col(fill = "#1F618D", alpha = 0.75) +
  scale_x_date(
    date_labels = "%Y",
    breaks = "1 year") +
  labs(title = "Number of Injuries at Amusement Parks, By Month"
       , caption = "Data by Data.world | #TidyTuesday") +
  ylab("Injuries") +
  xlab("Year") +
  theme_minimal() +
  theme(axis.text.x = element_text(hjust = -1.6))