This series features mid-course projects from our inaugural Data Science Bootcamp. Students were tasked with asking an interesting data question, finding a dataset to answer it, cleaning, wrangling, and exploring the data, and then designing and building an interactive Shiny app.
We often focus our attention on events with numerous fatalities, but across the US, road traffic crashes (RTCs) are claiming lives on a daily basis. In his research for his mid-course capstone, Dereje Getu Demissie discovered some alarming statistics.
- Fatalities from RTCs are responsible for more years of life lost than most human diseases (Petridou & Moustaki, 2000).
- RTCs claim 40,000 to 50,000 lives per year in the US, which is comparable to the number of Americans who died in the Vietnam or Korean wars (Levitt & Porter, 2001).
- On average, more people die in RTCs each month than died in the September 11th terrorist attacks (Evans, 2002).
- You are four times more likely to be in a crash if you use your phone while driving (McEvoy et al., 2005, cited in Abouk & Adams, 2013).
- You are 23 times more likely to be in a crash or near-crash if you’re texting while driving (VTTI, 2009, cited in Abouk & Adams, 2013).
- In 2009, over 25,000 fatalities were estimated to be caused by a driver distracted by their cell phone (National Highway Traffic Safety Administration - NHTSA).
However, a 2013 study by Bhargava & Pathania argued that cell phone use may not actually increase the number of RTCs. The authors suggested that drivers who use cell phones compensate for the distraction by reducing speed, moving to uncongested lanes, or heightening their attention.
But it’s not just human factors that affect fatalities on the road. As more Americans choose to drive larger, heavier vehicles, the risk of a fatality rises by as much as 40-50% when a larger vehicle crashes into a smaller one (Anderson & Auffhammer, 2014).
Dereje was motivated by these statistics to explore other factors that could lead to so many RTC fatalities.
The Data Question
Using the NHTSA’s Fatality Analysis Reporting System (FARS), which provides data about fatal crashes, Dereje wanted to see whether driver characteristics, vehicle characteristics, and environmental factors were correlated with RTC fatalities. He identified the following characteristics to examine:
- Driver characteristics
  - Driving under the influence
- Environmental and road characteristics
  - Weather conditions
  - Road types
  - Time of accident
- Vehicle characteristics
  - Types of vehicles
Due to time constraints, he limited his analysis to data from 2016.
Cleaning The Data
Dereje used the R packages dplyr and ggplot2 to wrangle, clean, and explore the data, relying on dplyr functions such as filter, select, mutate, arrange, and summarize. He shared, “merging different data sets and labeling them was one of the biggest data wrangling tasks that I have gone through.”
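As a rough illustration of that kind of wrangling, here is a minimal dplyr sketch that merges two tables on a shared crash identifier, replaces numeric codes with readable labels, and summarizes the result. The toy tables, the column names (`ST_CASE`, `DAY_WEEK`, `HOUR`, `SEX`), and the code values are assumptions loosely modeled on FARS-style coded files; this is not Dereje's actual code or data.

```r
library(dplyr)

# Hypothetical toy tables standing in for two separate FARS-style files:
# one row per crash and one row per driver. Names and codes are assumptions.
crashes <- data.frame(
  ST_CASE  = c(101, 102, 103, 104),
  DAY_WEEK = c(6, 7, 3, 1),  # assumed coding: 1 = Sunday ... 7 = Saturday
  HOUR     = c(17, 2, 9, 18)
)
drivers <- data.frame(
  ST_CASE = c(101, 102, 103, 104),
  SEX     = c(1, 1, 2, 9)    # assumed coding: 1 = male, 2 = female, 9 = unknown
)

driver_crashes <- crashes %>%
  # Merge the two tables on the shared crash identifier
  left_join(drivers, by = "ST_CASE") %>%
  # Replace numeric codes with readable labels
  mutate(
    sex = case_when(
      SEX == 1 ~ "Male",
      SEX == 2 ~ "Female",
      TRUE     ~ "Unknown"
    ),
    weekend = DAY_WEEK %in% c(1, 6, 7)  # Sunday, Friday, or Saturday
  ) %>%
  select(ST_CASE, HOUR, sex, weekend)

# Count crashes by driver sex, largest group first
sex_counts <- driver_crashes %>%
  filter(sex != "Unknown") %>%
  group_by(sex) %>%
  summarise(crashes = n()) %>%
  arrange(desc(crashes))

print(sex_counts)
```

Labeling the codes right after the join keeps every later step (filters, plots, Shiny inputs) working with readable values instead of raw numbers.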
Visualizing The Data
Dereje’s Shiny app, US Traffic Fatality Analysis in 2016, lets users interactively explore and visualize the causes of traffic fatalities in the US identified above.
To create his visuals, he used ggplot2 histograms and bar plots to visualize continuous and discrete variables, respectively. Using ggmap with an OpenStreetMap base map, he created a map showing the locations of traffic fatalities in Tennessee.
Dereje highlighted four things that he learned from the data (and taught me a new phrase in Latin).
Ceteris paribus (with other conditions remaining the same):
- Most fatal RTCs involve male drivers.
- Fatal RTCs are most frequent in daylight, between 4:00 pm and 7:00 pm.
- Fatal RTCs occur at a higher rate on Fridays, Saturdays, and Sundays.
- Speeding and alcohol are the primary factors in fatal RTCs.
Based on the 2016 numbers, distraction by a cell phone or other source does not appear to be a major cause of fatalities. However, for a large share of the records it is unknown whether the driver was distracted. Filtering the data by other factors, such as speeding or drunk driving, shows that each of these human-created conditions accounts for less than 20% of fatal crashes. Combine the fatalities resulting from distracted driving, driving under the influence, and speeding, though, and a different picture may start to unfold. The lesson? We could all be more vigilant when we get behind the wheel.
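Combining distracted driving, driving under the influence, and speeding takes some care: a single crash can involve several of these factors at once, so simply adding their percentages double-counts those crashes. The minimal base-R sketch below uses a small set of invented crash flags (not FARS figures) to contrast the naive sum with the share of crashes involving at least one factor.

```r
# Hypothetical flags for 12 fatal crashes; the values are invented for
# illustration and are not FARS data. Crash 1 involves both speeding and alcohol.
n_crashes  <- 12
speeding   <- seq_len(n_crashes) %in% c(1, 4)
alcohol    <- seq_len(n_crashes) %in% c(1, 2)
distracted <- seq_len(n_crashes) %in% c(3)

# Share of crashes involving each factor on its own
shares <- c(
  speeding   = mean(speeding),
  alcohol    = mean(alcohol),
  distracted = mean(distracted)
)

# Naively adding the shares counts crash 1 twice
naive_total <- sum(shares)

# Share of crashes involving at least one of the three factors
combined <- mean(speeding | alcohol | distracted)

print(shares)
print(c(naive = naive_total, combined = combined))
```

In this toy data each factor appears in fewer than 20% of crashes, yet one third of the crashes involve at least one of them; summing the individual shares would overstate that to about 42%.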
References
Petridou, E., & Moustaki, M. (2000). Human Factors in the Causation of Road Traffic Crashes. European Journal of Epidemiology, 16(9), 819-826. Retrieved from http://www.jstor.org/stable/3581952
Levitt, S., & Porter, J. (2001). How Dangerous Are Drinking Drivers? Journal of Political Economy, 109(6), 1198-1237. doi:10.1086/323281
Evans, L. (2002). Traffic Crashes: Measures to make traffic safer are most effective when they weigh the relative importance of factors such as automotive engineering and driver behavior. American Scientist, 90(3), 244-253. Retrieved from http://www.jstor.org/stable/27857660
Abouk, R., & Adams, S. (2013). Texting Bans and Fatal Accidents on Roadways: Do They Work? Or Do Drivers Just React to Announcements of Bans? American Economic Journal: Applied Economics, 5(2), 179-199. Retrieved from http://www.jstor.org/stable/43189434
Anderson, M., & Auffhammer, M. (2014). Pounds That Kill: The External Costs of Vehicle Weight. The Review of Economic Studies, 81(2), 535-571. Retrieved from http://www.jstor.org/stable/43551573
Bhargava, S., & Pathania, V. (2013). Driving under the (Cellular) Influence. American Economic Journal: Economic Policy, 5(3), 92-125. Retrieved from http://www.jstor.org/stable/43189342