Visualizing Traffic Accidents on Staten Island


Visualization

Dashboard: Visualizing Traffic Accidents on Staten Island 
The data I chose to work with for the Tableau lab was the “NYPD Motor Vehicle Collisions” dataset, available through NYC Open Data. To make the data more manageable, I filtered it to records labeled “Staten Island” in the “Borough” column, covering the 3-month period from December 2014 to February 2015. The visualization process uncovered inconsistencies in this data, such as missing geographic locations (zip code 10303 was unaccounted for) and a large volume of records with empty fields in the original set, which were removed by the filtering.
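For anyone who wants to reproduce this kind of filter outside Tableau, here is a minimal Python sketch. The column names, the date format, and the sample rows are assumptions for illustration and are not taken from the actual export.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical sample rows; the real export has many more columns and rows.
SAMPLE_CSV = """Date,Borough,Zip Code,Persons Injured
12/03/2014,STATEN ISLAND,10301,1
01/15/2015,STATEN ISLAND,,0
02/27/2015,BROOKLYN,11201,2
03/02/2015,STATEN ISLAND,10314,0
11/30/2014,,10305,1
"""

START = date(2014, 12, 1)
END = date(2015, 2, 28)


def filter_rows(rows, borough="STATEN ISLAND", start=START, end=END):
    """Keep rows for one borough whose date falls inside [start, end]."""
    kept = []
    for row in rows:
        when = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        if row["Borough"].strip().upper() == borough and start <= when <= end:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
staten_island = filter_rows(rows)

# Rows that survive the filter but still lack a location.
missing_zip = [r for r in staten_island if not r["Zip Code"].strip()]
print(len(staten_island), "rows kept;", len(missing_zip), "without a zip code")
```

Note that a row with a blank borough is silently dropped by a filter like this, even when its zip code would place it on Staten Island. That is one way records can disappear before they ever reach a chart.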

The dataset provided a lot of interesting opportunities in various forms of geographic, chronological, and statistical data. Of interest to me were some basic questions: How many injuries happened in this time period? Where did they happen the most? When did they happen most often? The data supported these queries in numerous formats, but I wanted to select the ones that seemed easiest to communicate visually.

Before being able to create graphics showing relevant relationships, I found myself plotting different data fields against one another somewhat haphazardly, in a sort of see-what-sticks methodology. This was helpful for learning how Tableau’s interface handles data, and it also led to the accidental creation of some interesting visualizations. The process also improved my literacy in both Tableau and the dataset, which had been somewhat difficult to conceptualize beforehand.

I created the first sheet (Sheet 1) by playing around with the field “Contributing Factor 1” (the reported causes of accidents) and the statistics from the “Persons Injured” field. This allowed me to determine the most common causes of accidents, a statistic I imagined would have real-life relevance and significance. Unfortunately, the “Unspecified” value within the field accounted for the largest share of the data by far, and I wound up removing it because it felt at odds with what I sought to illustrate. If this graphic were used in a professional or academic context, it would be the creator’s responsibility to indicate that the majority of this data is unspecified, or incomplete, and therefore not representative of all the reported accidents.
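One way to keep that responsibility visible is to compute the excluded share alongside the ranking, so it can be stated next to the chart. The sketch below is only an illustration: the records are made-up (factor, persons injured) pairs, and `top_factors` is a hypothetical helper, not something Tableau provides.

```python
from collections import Counter

# Hypothetical (factor, persons_injured) pairs standing in for the filtered rows.
records = [
    ("Unspecified", 1), ("Unspecified", 0), ("Unspecified", 2),
    ("Driver Inattention/Distraction", 1), ("Driver Inattention/Distraction", 1),
    ("Failure to Yield Right-of-Way", 1),
]


def top_factors(records, n=10, exclude=("Unspecified",)):
    """Rank factors by injuries and report the share that was excluded."""
    injuries = Counter()
    for factor, injured in records:
        injuries[factor] += injured
    total = sum(injuries.values())
    excluded = sum(injuries[f] for f in exclude)
    ranked = [(f, c) for f, c in injuries.most_common() if f not in exclude]
    share = excluded / total if total else 0.0
    return ranked[:n], share


ranked, unspecified_share = top_factors(records)
for factor, count in ranked:
    print(f"{count:3d}  {factor}")
print(f"Excluded as unspecified: {unspecified_share:.0%} of injuries")
```

Printing the excluded share with the ranking makes it harder to present the top-10 list as if it covered every reported accident.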

I initially felt that the standard bar chart format would be effective for representing the data, and chose to render it (as well as all subsequent graphs) in subdued red hues to evoke the negativity of traffic accidents without making the red violent or alarming. I had hoped to reverse the order of the bars in my chart, but was unable to figure out how to make the bars and the color saturation descend in the same order (I kept winding up with high-value bars in pale hues). Looking at the sheet later, though, I realize that the scope of the graph is much simpler than what I’ve shown; it is information that would be perfectly suitable as a list, or as a basic table, as was suggested to me. The color gradations also convey the same information as the bar lengths, with the more intense colors representing higher-frequency causes. I still feel that important information is communicated through the graphic, hence its inclusion in my dashboard, but if I continued further with the project I would definitely reduce the graphic to a simple list format.

Sheet 1: Top 10 Causes of Traffic Accidents

image03

The end result of my first visualization, representing data labeled “Contributing Factor 1”.

 

In the next instance (Sheet 2) I looked to illustrate the time of day at which the most accidents occurred. This was an exercise in plotting some of the simple statistics surrounding the accident numbers, so I felt that an area graph would be useful for representing the information. In a bit of a gut-feeling judgement call, I chose the area graph over a line graph in order to give some weight to the numbers. It seemed odd to represent the number of people injured throughout the day as a fluctuating line, similar to stock fluctuations.

I wound up excluding the second sheet from my dashboard because I discovered that the same information could be plotted along with the geographic data in the set. Sheet 5 illustrates the accident frequency by time of day, separated into columns by “neighborhood” (zip code in the data). This provided an interesting opportunity to see where accidents were happening most often, and what times of day were better or worse depending on the area. While I think that the circle plot is successful in representing the frequency by time, I worry that the graph is a bit cluttered and the neighborhoods are difficult to distinguish from one another. Perhaps small multiples would have been better suited to the data, instead of trying to represent it all in one graphic.
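The small-multiples idea amounts to building one 24-value series per zip code and drawing each series in its own panel. A rough Python sketch of that grouping step follows. The sample pairs and the "H:MM" time format are assumptions, and `hourly_counts_by_zip` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (zip_code, time) pairs; the "H:MM" 24-hour format is an assumption.
accidents = [
    ("10301", "8:15"), ("10301", "8:40"), ("10301", "17:05"),
    ("10314", "17:30"), ("10314", "23:50"),
]


def hourly_counts_by_zip(accidents):
    """Return {zip_code: [accident count for each hour 0..23]}."""
    counts = defaultdict(lambda: [0] * 24)
    for zip_code, time_text in accidents:
        hour = int(time_text.split(":")[0])
        counts[zip_code][hour] += 1
    return dict(counts)


panels = hourly_counts_by_zip(accidents)
for zip_code, series in sorted(panels.items()):
    # Each series would become one small panel sharing the same hour axis.
    busiest = max(range(24), key=lambda h: series[h])
    print(zip_code, "busiest hour:", busiest)
```

Because every panel shares the same 24-hour axis, neighborhoods can be compared at a glance without overlapping marks.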

Sheet 2: Accident Frequency by Time of Day

image02

Representing the time of day the accidents happened.

 

Sheet 5: Accident Frequency by Time of Day per Neighborhood

image01

Showing when the accidents occurred the most in each neighborhood.

 

The graphic I was most pleased with was the chronological choropleth map showing accident frequency per zip code over the 3-month period (Sheet 3). Tableau’s area map feature easily mapped the longitude, latitude, and zip code values onto a geographically accurate map, showing the neighborhood information much more clearly than the previously discussed Sheet 5. I felt that this graphic carried the most potential for communicating significant insights from the dataset, though it also unearthed instances of absent data that could have been either missing from the dataset or inconsistently labelled and therefore removed when I filtered the data. This has some unfortunate side effects: the graphic invites the false conclusion that the northwest part of Staten Island (zip code 10303) is completely safe, with no accidents, when in reality the data is missing.
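A simple guard against that false conclusion is to compare the zip codes that appear in the filtered data against a list of zip codes you expect to see, and to mark the absent ones as "no data" rather than zero. The sketch below assumes a list of Staten Island zip codes for illustration (verify it before relying on it). The observed values are made up, and `classify_zips` is a hypothetical helper.

```python
from collections import Counter

# Assumed list of Staten Island zip codes, for illustration; verify before use.
EXPECTED_ZIPS = ["10301", "10302", "10303", "10304", "10305", "10306",
                 "10307", "10308", "10309", "10310", "10312", "10314"]


def classify_zips(observed_zips, expected=EXPECTED_ZIPS):
    """Map each expected zip to its record count, or None when no records exist."""
    counts = Counter(z for z in observed_zips if z)
    return {z: (counts[z] if z in counts else None) for z in expected}


# Hypothetical zip values from the filtered rows, including one blank.
observed = ["10301", "10301", "10314", ""]
result = classify_zips(observed)
no_data = [z for z, count in result.items() if count is None]
print("Shade as 'no data':", ", ".join(no_data))
```

The records alone cannot tell a zip code with no accidents apart from one whose rows were lost, so labeling the area "no data" at least avoids claiming that it is safe.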

Sheet 3: Traffic Accidents per Neighborhood, December 2014-February 2015

image00

Accident frequency by Staten Island neighborhood, December 2014 (animated in link).

 

Ultimately, I felt that the exercise was a useful introduction to Tableau and to data visualization in general, and it definitely highlighted the ease with which conclusions can be falsely represented through the process. It also illustrated the need to be more aware of the intricacies of the data you’re working with, and to make sure it is fully formatted to support your goals. I’m hoping to take what I’ve learned from this lab into future visualizations, toward graphics that are not just visually engaging but also informative, accurate, and responsible.