March Motor Vehicle Collisions and Injuries in Brooklyn

Introduction

Empty NYC Coronavirus — REUTERS/Eduardo Munoz

Brooklyn is New York City’s most populous borough. Pedestrians, bicyclists, motorcyclists, and drivers constantly compete for time and space to reach their destinations as efficiently as possible, sometimes at the expense of safety. NYC is notorious for traffic, but the past year has changed how New Yorkers commute and travel within the city, and it has affected tourism levels as well. Given these effects of COVID-19, we’ll explore whether the events of 2020-2021 are reflected in Brooklyn’s road collision data.

Comparative Analysis

The visualization below identifies collision hotspots. It used TomTom Traffic Incidents and Traffic Flow data to normalize collisions by the number of drivers. While I was not able to normalize my data in the same way, I wanted to give a similar sense of which locations experience more frequent collisions.

Materials

The collision dataset used was pulled from NYC Open Data.

To illustrate the neighborhood divisions, a shapefile was pulled from NYC Planning.

These two datasets were then imported into Carto, a spatial analysis tool.

Methods

To begin, I prioritized filtering the collision dataset, which had over 1 million rows in its original state. NYC Open Data’s site conveniently lets users apply filters to create a manageable dataset before exporting. As discussed further in the Results and Interpretation section, I chose to restrict the data to March, because the 2020 collision data showed its sharpest decline in that month. Filtering Borough to Brooklyn and Date to March 1-31 of 2017, 2018, 2019, 2020, and 2021 reduced the 1+ million-row dataset to about 16k rows.
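The portal applies these filters in the browser. For anyone repeating this step on a full CSV export instead, the same borough-and-month filter can be sketched in Python as below. This is a minimal sketch, not the portal’s own logic: it assumes the export has columns named CRASH DATE and BOROUGH, with dates formatted as MM/DD/YYYY, and the sample rows are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Illustrative rows only (not real records). Column names assume the CSV
# export's headers, with "CRASH DATE" formatted as MM/DD/YYYY.
SAMPLE_CSV = """CRASH DATE,BOROUGH,LATITUDE,LONGITUDE
03/05/2019,BROOKLYN,40.6782,-73.9442
03/18/2020,BROOKLYN,40.6501,-73.9496
04/02/2020,BROOKLYN,40.6892,-73.9857
03/11/2021,QUEENS,40.7282,-73.7949
03/29/2016,BROOKLYN,40.6602,-73.9690
"""

STUDY_YEARS = range(2017, 2022)  # March 2017 through March 2021


def filter_march_brooklyn(rows):
    """Keep Brooklyn collisions dated March 1-31 of the study years."""
    kept = []
    for row in rows:
        if row["BOROUGH"].strip().upper() != "BROOKLYN":
            continue  # other boroughs (or a blank borough) are dropped
        crash_date = datetime.strptime(row["CRASH DATE"], "%m/%d/%Y")
        if crash_date.month == 3 and crash_date.year in STUDY_YEARS:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
march_rows = filter_march_brooklyn(rows)
print(len(march_rows))  # 2
```

Of the five sample rows, only the March 2019 and March 2020 Brooklyn rows survive: one is from April, one is in Queens, and one falls outside 2017-2021.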

I intended to use the JSON URL, but found that the filters I applied did not carry over into Carto. Instead, I exported the filtered dataset as a CSV and imported that file into Carto. With this first dataset, I created a new map.
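One possible way to keep filters attached to a JSON URL is to write them into the query string as a SoQL `$where` clause, so the server does the filtering. The sketch below only builds such a URL and does not fetch it. The dataset ID (h9gi-nx95) and the field names crash_date and borough are assumptions that should be checked against the dataset’s API page. I have not tested whether Carto accepts this kind of URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint and field names: the Socrata (SODA) JSON resource for
# NYC Open Data's "Motor Vehicle Collisions - Crashes" dataset, with
# fields crash_date and borough. Check both on the dataset's API page.
BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"


def build_march_query(first_year=2017, last_year=2021, limit=50000):
    """Return a URL whose filters are carried in the query string itself."""
    where = (
        "borough = 'BROOKLYN' "
        "AND date_extract_m(crash_date) = 3 "
        f"AND date_extract_y(crash_date) >= {first_year} "
        f"AND date_extract_y(crash_date) <= {last_year}"
    )
    # Set $limit explicitly: SODA returns only a limited number of rows otherwise.
    return BASE_URL + "?" + urlencode({"$where": where, "$limit": limit})


url = build_march_query()
params = parse_qs(urlparse(url).query)  # decode the query string back
print(params["$limit"])  # ['50000']
```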

Imported data points after Geocode analysis.

On the first dataset, I applied the Geocode analysis. Once I defined the Longitude and Latitude columns, Carto was able to place each data point at its corresponding location on the map.
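Placing rows on a map from two coordinate columns essentially means turning each row into a point geometry. The sketch below illustrates that idea by producing GeoJSON. It is not Carto’s internal implementation. It assumes columns named LATITUDE and LONGITUDE and skips rows whose coordinates are blank or 0,0 (an assumed placeholder). GeoJSON lists coordinates as [longitude, latitude], which is a common source of mix-ups.

```python
import json


def rows_to_geojson(rows):
    """Turn rows with LATITUDE/LONGITUDE columns into GeoJSON Point features.

    Rows with blank, non-numeric, or 0,0 coordinates are skipped because
    they cannot be placed meaningfully on the map.
    """
    features = []
    for row in rows:
        try:
            lat = float(row["LATITUDE"])
            lon = float(row["LONGITUDE"])
        except (KeyError, ValueError):
            continue  # missing or non-numeric coordinates
        if lat == 0.0 and lon == 0.0:
            continue  # assumed placeholder coordinates
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {k: v for k, v in row.items()
                           if k not in ("LATITUDE", "LONGITUDE")},
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample rows for illustration.
sample = [
    {"CRASH DATE": "03/05/2019", "LATITUDE": "40.6782", "LONGITUDE": "-73.9442"},
    {"CRASH DATE": "03/06/2019", "LATITUDE": "", "LONGITUDE": ""},
    {"CRASH DATE": "03/07/2019", "LATITUDE": "0", "LONGITUDE": "0"},
]
collection = rows_to_geojson(sample)
print(json.dumps(collection["features"][0]["geometry"]))
# {"type": "Point", "coordinates": [-73.9442, 40.6782]}
```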

With so many data points on the map, I found a shapefile of Brooklyn’s neighborhoods to help users distinguish areas and orient themselves. I exported the shapefile dataset from NYC Planning as JSON and then imported it into Carto.

Applying the Intersect and Aggregate analysis to the shapefile let me aggregate collisions by neighborhood and color each neighborhood by its count. This lets users see which neighborhoods had the highest or lowest frequency of collisions.
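At its core, an intersect-and-aggregate step counts how many points fall inside each polygon. The sketch below illustrates the general technique with a ray-casting point-in-polygon test. It does not claim to reproduce Carto’s implementation. The two rectangular "neighborhoods" and the points are hypothetical, and the code assumes neighborhoods do not overlap.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x0, y0), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, neighborhoods):
    """Count (lon, lat) points per neighborhood polygon."""
    counts = {name: 0 for name in neighborhoods}
    for lon, lat in points:
        for name, ring in neighborhoods.items():
            if point_in_polygon(lon, lat, ring):
                counts[name] += 1
                break  # assumes neighborhoods do not overlap
    return counts


# Hypothetical rectangular "neighborhoods" as (lon, lat) rings.
neighborhoods = {
    "Area A": [(-74.00, 40.60), (-73.95, 40.60), (-73.95, 40.65), (-74.00, 40.65)],
    "Area B": [(-73.95, 40.60), (-73.90, 40.60), (-73.90, 40.65), (-73.95, 40.65)],
}
points = [(-73.97, 40.62), (-73.98, 40.63), (-73.92, 40.61), (-73.80, 40.70)]
counts = count_by_neighborhood(points, neighborhoods)
print(counts)  # {'Area A': 2, 'Area B': 1}
```

The last point lies outside both polygons, so it is not counted. Real neighborhood boundaries have many more vertices, but the same test applies.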

To style the shapefile and data points, I used shades of red to represent values. This felt appropriate because red already signals danger and "stop" in road safety and traffic rules.
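Coloring neighborhoods "by value" requires some rule for mapping counts to shades. The sketch below uses equal-interval classification, which is one common choice. Carto offers its own classification options, and this is not necessarily the one the map uses. The five-step light-to-dark red palette and the counts are hypothetical.

```python
# Hypothetical light-to-dark red palette; not necessarily Carto's.
REDS = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]


def color_for(value, lo, hi, palette=REDS):
    """Equal-interval classification: map value in [lo, hi] to a palette color."""
    if hi == lo:
        return palette[-1]  # all values equal: use the darkest shade
    frac = (value - lo) / (hi - lo)
    index = min(int(frac * len(palette)), len(palette) - 1)  # clamp the maximum
    return palette[index]


counts = {"Area A": 2, "Area B": 1, "Area C": 0}  # hypothetical counts
lo, hi = min(counts.values()), max(counts.values())
colors = {name: color_for(c, lo, hi) for name, c in counts.items()}
print(colors)
```

Equal intervals are easy to read from a legend. However, when a few neighborhoods have far more collisions than the rest, a quantile scheme can spread colors more evenly.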

Legend breaking down injury and neighborhood data.

I then created a legend to help users quickly understand the frequency of injuries in specific areas and the overall number of collisions by neighborhood.
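A legend for a classed color scale is essentially a list of class breaks. The sketch below generates equal-interval break labels. It assumes five classes and a hypothetical value range of 0 to 500, and it is not a description of how Carto builds its legends.

```python
def legend_labels(lo, hi, classes=5):
    """Equal-interval class breaks rendered as 'start - end' legend labels."""
    step = (hi - lo) / classes
    labels = []
    for i in range(classes):
        start = lo + i * step
        # Use hi exactly for the last class to avoid floating-point drift.
        end = hi if i == classes - 1 else lo + (i + 1) * step
        labels.append(f"{start:.0f} - {end:.0f}")
    return labels


labels = legend_labels(0, 500)  # hypothetical range of collision counts
print(labels)
# ['0 - 100', '100 - 200', '200 - 300', '300 - 400', '400 - 500']
```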

The final step was adding a widget that lets users filter the data by March of each year. When the user changes the selected year, Carto updates which data points are shown and recolors the neighborhoods to reflect that year’s data.
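Conceptually, a year filter selects one year’s rows, and everything downstream (the point layer and the neighborhood counts) is recomputed from that subset. The sketch below shows that selection and the per-year tallies a widget like this switches between. The CRASH DATE column, the MM/DD/YYYY format, and the sample rows are assumptions.

```python
from collections import Counter
from datetime import datetime

DATE_FIELD = "CRASH DATE"  # assumed column name
DATE_FORMAT = "%m/%d/%Y"   # assumed date format


def year_of(row):
    return datetime.strptime(row[DATE_FIELD], DATE_FORMAT).year


def counts_by_year(rows):
    """Tally collisions per year, the values a year filter switches between."""
    return Counter(year_of(r) for r in rows)


def rows_for_year(rows, year):
    """Select one year's rows; map layers would be recomputed from these."""
    return [r for r in rows if year_of(r) == year]


# Invented sample rows for illustration.
rows = [
    {"CRASH DATE": "03/02/2019"},
    {"CRASH DATE": "03/15/2019"},
    {"CRASH DATE": "03/20/2020"},
]
print(counts_by_year(rows))            # Counter({2019: 2, 2020: 1})
print(len(rows_for_year(rows, 2020)))  # 1
```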

Collision Date widget to allow users to filter by March of each year.

Results and Interpretation

The final result is embedded below (and can also be viewed here).

When I began this project, I used 2020 data only (January-December). However, I quickly noticed that the 2020 data showed a sharp drop from mid-to-late March, shown in Carto below:

This insight led me to re-evaluate the data I was using. Rather than focus on January-December 2020, I filtered in only March data for the past 5 years to see whether this March drop was a consistent yearly pattern. I replaced the 2020 CSV file with the new March-only CSV and repeated the same analysis, which produced the final visualization.

The collision date widget in the final visualization reveals a clear trend. Splitting the data into 5 buckets by year shows a dramatic decline that is unique to 2020 and 2021. While 2017-2019 each averaged around 3k March collisions, 2020 had 2.0k and 2021 had 1.5k. March collisions during COVID-19 have therefore been at roughly 50% to 67% of the pre-pandemic level: about two-thirds in 2020 and about half in 2021.
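The percentages follow directly from the approximate totals quoted above. Each COVID-era March is divided by the typical 2017-2019 March. The figures in the code are the rounded values from the text, not recomputed from the dataset.

```python
# Approximate March totals quoted above, in thousands of collisions.
baseline = 3.0                  # typical March, 2017-2019
covid_march = {2020: 2.0, 2021: 1.5}

shares = {year: total / baseline for year, total in covid_march.items()}
for year, share in shares.items():
    print(f"{year}: {share:.0%} of the 2017-2019 March level")
# 2020: 67% of the 2017-2019 March level
# 2021: 50% of the 2017-2019 March level
```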

This suggests that COVID-19 may have reduced the overall number of collisions in Brooklyn, or at least affected how collision data was collected.

Reflection

While the analysis lets users see the obvious decline, it does not fully explain the relationship.

Why did collisions decrease in March? Were there fewer drivers on the road? Were there fewer pedestrians, contributing to decreased congestion? Did people drive more carefully out of generalized anxiety? Or were people less likely to report incidents because the world was preoccupied with a larger issue? With more specific datasets, I might be able to answer these questions more confidently. In particular, average daily traffic counts per intersection would have been ideal for normalizing the values more accurately and identifying possible causes.
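With per-intersection traffic counts, one common normalization in traffic safety is the crash rate per million entering vehicles: crashes × 1,000,000 / (average daily entering vehicles × days in the period). The sketch below applies that formula to a single hypothetical intersection (6 March crashes and 20,000 entering vehicles per day). None of these values come from the Brooklyn data.

```python
def crashes_per_million_entering(crashes, avg_daily_entering, days):
    """Intersection crash rate: crashes per million entering vehicles."""
    exposure = avg_daily_entering * days  # total vehicles entering in the period
    return crashes * 1_000_000 / exposure


# Hypothetical intersection over the 31 days of March.
rate = crashes_per_million_entering(crashes=6, avg_daily_entering=20_000, days=31)
print(round(rate, 2))  # 9.68
```

A rate like this would separate "fewer collisions because fewer vehicles" from "fewer collisions per vehicle." Raw counts alone cannot make that distinction.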

Information Visualization

Student work at the School of Information, Pratt Institute