Cyclist Injuries in NYC, Jan 2017 - Aug 2020


Visualization

Introduction

New York City has seen a large increase in cycling over the last few years. Programs like Citi Bike have made cycling a very popular method of transportation in NYC, and many articles have covered its growth in popularity. The number of cyclists rises steadily every year and climbs noticeably in warmer weather. The reported number of cyclist accidents has also gone up every year, despite the city’s efforts to improve roads for cyclists. Accidents increase every summer, and this summer had the highest number reported. This year we also saw the effect COVID-19 had on the number of accidents at the beginning of the pandemic.

This report is based on a dataset from https://opendata.cityofnewyork.us/ called Motor Vehicle Collisions – Crashes. The dataset has been updated daily since May 2014 and contains about 1.7 million records in total. For this topic, I report only on accidents involving cyclists between January 2017 and August 2020. The report shows the increase in accidents every year and the hours at which most accidents occur, and it highlights some of the causes of cyclist accidents between these dates.

Methods / Materials

Excel: used to filter unwanted data
OpenRefine: tool for working with messy data
Tableau: data visualization software

The following fields of the dataset were used for the visualization:
Crash Date and Crash Time
Number of Cyclist Injured
Contributing Factor Vehicle

After downloading the data as CSV files, I filtered out years that had large gaps in information. A large portion of the accident records did not have geolocation data, which prevented me from creating an accurate visualization of accidents per location. Instead, for this lab I focused on the total number of accidents per month and per hour. I cleaned the dataset in OpenRefine, removing extraneous information and typos from the user-entered data. To create the visuals I used Tableau Public, which allows exploring and visualizing data without coding skills.
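For readers who prefer code to spreadsheet filters, the filtering and counting described above can be sketched in a few lines of Python. This is only an illustration, not the workflow used for this lab (which relied on Excel, OpenRefine, and Tableau). The column names, the MM/DD/YYYY date format, and the sample rows are assumptions made for the example, and an "accident" is counted here as one row with at least one cyclist injured.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Tiny inline sample standing in for the downloaded CSV.
# The rows are made up; column names and formats are assumptions.
SAMPLE_CSV = """CRASH DATE,CRASH TIME,NUMBER OF CYCLIST INJURED
08/14/2020,17:30,1
08/02/2020,8:05,2
04/10/2020,17:45,0
12/31/2016,9:00,1
09/01/2020,12:00,1
07/04/2019,17:10,1
"""

# Study window: January 2017 through August 2020.
START = datetime(2017, 1, 1)
END = datetime(2020, 8, 31, 23, 59)


def cyclist_crashes(rows):
    """Yield the timestamp of each in-range crash with at least one cyclist injured."""
    for row in rows:
        injured = int(row["NUMBER OF CYCLIST INJURED"] or 0)
        when = datetime.strptime(
            f'{row["CRASH DATE"]} {row["CRASH TIME"]}', "%m/%d/%Y %H:%M"
        )
        if injured > 0 and START <= when <= END:
            yield when


def totals_by_month_and_hour(rows):
    """Count crashes per (year, month) and per hour of day."""
    by_month, by_hour = Counter(), Counter()
    for when in cyclist_crashes(rows):
        by_month[(when.year, when.month)] += 1
        by_hour[when.hour] += 1
    return by_month, by_hour


by_month, by_hour = totals_by_month_and_hour(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(sorted(by_month.items()))
print(sorted(by_hour.items()))
```

To run this against the real download, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened file; the column names should be checked against the file header first.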

Inspiration

Results

In this bar chart I chose to label April 2020 and August 2020: April 2020’s low number was caused by the pandemic, and August 2020 was a record high for the number of accidents in one month. The color scheme used for this chart helps illustrate how the season affects the number of accidents.

Heatmaps are tables in which each cell is colored according to its value on some range.

For this table, a heatmap was a good choice for highlighting the number of accidents that occurred at each hour of the day. By looking down and across, we can identify certain patterns. For example, we can spot very few accidents late at night in the winter and a high occurrence of accidents around rush hour in the summer.
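The idea behind the heatmap can be sketched in Python as a grid of counts plus a function that maps each count to a shade. This is a rough text-only illustration, not how Tableau renders it. The timestamps are made up, and placing hours on the rows and months on the columns is an assumption about the layout.

```python
from collections import Counter
from datetime import datetime

# Hypothetical crash timestamps (made-up values, not taken from the dataset).
crashes = [
    datetime(2020, 8, 14, 17, 30),
    datetime(2020, 8, 20, 17, 5),
    datetime(2020, 8, 2, 8, 15),
    datetime(2020, 1, 9, 23, 40),
]


def heatmap_grid(timestamps):
    """Return a 24 x 12 grid where grid[hour][month - 1] is the crash count."""
    counts = Counter((t.hour, t.month) for t in timestamps)
    return [[counts[(hour, month)] for month in range(1, 13)] for hour in range(24)]


def shade(value, peak):
    """Map a count to a character; denser characters mean more crashes."""
    ramp = " .:*#"
    if peak == 0:
        return ramp[0]
    return ramp[round(value / peak * (len(ramp) - 1))]


grid = heatmap_grid(crashes)
peak = max(max(row) for row in grid)

# One printed line per hour, one character per month (Jan to Dec).
for hour, row in enumerate(grid):
    print(f"{hour:02d} " + "".join(shade(v, peak) for v in row))
```

Scaling every cell against the single busiest cell is what lets the eye compare any hour and month directly, which is the pattern-spotting described above.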

UX Research

I asked 3 other New Yorkers to interact with the visualization and tell me what they liked, what they disliked, and what questions they had about it. The participants were asked in which month and at what hours they would be most likely to get into an accident. They were also asked about their initial reactions to the data. All 3 participants understood the questions and were able to identify the times at which an accident is most likely to occur. 2 participants mentioned being confused by the bar graph representing the contributing factors of accidents. They also stated that its color scheme did not match the color scheme of the other charts.

After hearing my participants’ initial reactions, I made several changes to the final layout of my dashboard. The initial design contained redundant dates that were taking up space. Abbreviating the months and reducing the size of certain numbers helped free up visual space on the dashboard.

Participants reported liking the visuals on the left but were confused by the bar chart on the right, which shows the causes of accidents.

Recommendations

To continue this study, it would be interesting to investigate where accidents happen the most. To achieve an accurate visualization of accidents per location, the dataset must contain each accident’s longitude and latitude. While the dataset did contain this information for a majority of records, a significant portion required formatting and converting. During my investigation I was able to determine that, out of all the boroughs, Brooklyn had the largest share of the accidents. More interestingly, Astoria, Queens had the most accidents per zip code.
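A first step toward a location study could be to measure how many records have usable coordinates and to count records per borough. The Python sketch below shows one way to do that under stated assumptions: the column names, the made-up sample rows, and the treatment of blank or zero coordinates as missing are all illustrative choices, not findings from this lab.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; column names are assumptions about the CSV header.
SAMPLE_CSV = """BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,NUMBER OF CYCLIST INJURED
BROOKLYN,11211,40.7142,-73.9535,1
QUEENS,11103,,,1
BROOKLYN,11206,0,0,1
,,40.7580,-73.9855,1
"""


def has_coordinates(row):
    """True when latitude and longitude both parse as non-zero numbers."""
    try:
        lat, lon = float(row["LATITUDE"]), float(row["LONGITUDE"])
    except ValueError:
        # Blank or malformed values cannot be mapped.
        return False
    # Assumption: a 0 coordinate is a placeholder, not a real NYC location.
    return lat != 0 and lon != 0


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
located = [r for r in rows if has_coordinates(r)]
share_located = len(located) / len(rows)
by_borough = Counter(r["BOROUGH"] or "UNKNOWN" for r in rows)

print(f"{share_located:.0%} of records have usable coordinates")
print(by_borough.most_common())
```

Swapping `"BOROUGH"` for `"ZIP CODE"` in the counter would give the per-zip-code ranking instead.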

Another area to look into would be accidents on electric bicycles. While my research did not investigate the specific number of accidents on e-bikes, this could be an interesting topic for future work. E-bikes have become popular among delivery workers and commuters. Regardless of the rise in popularity of e-bikes, there is an overall increase in cyclists in the city and also an increase in accidents. Let’s hope the city acknowledges this steady increase and takes action to create a safer environment for our cyclists.

References

Link to interactive visualization:
https://public.tableau.com/profile/jeffrey.delacruz#!/vizhome/Lab2_16019930331360/Dashboard1

Link to raw dataset on accidents in New York City:
https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95