Water Quality Complaints GIS Visualization

Introduction
Back to water quality complaints in NYC. For my first lab, I used Tableau Public and water quality complaint data from NYC Open Data to analyze different aspects of the data, such as the borough with the most complaints, the most frequent complaint type, and how many complaints were made over the last six years. For this lab I would like to revisit the same dataset, this time using Carto DB and the geospatial data to display the complaints on a map, providing another kind of visualization.

The following are a few visualizations that inspired me before I began my analysis.

 

Figure 1.0: Used as inspiration

Figure 1.0 shows complaint data for Astoria, using clusters to show how many complaints of each type (heat, electricity, and unsanitary conditions) were made. Clusters are useful for giving a general overview of the total within a certain area. The downside is that clusters can sometimes grow large and obstruct the underlying map.

 

Figure 1.1: Used as inspiration

Figure 1.1 shows another set of NYC complaint data. This map also appears to cluster the total number of complaints, without showing the actual counts, to give a quick visual of which complaints are made most often in certain areas. One problem is that the clusters overlap each other, which makes it difficult to see their actual sizes. Zooming in to a smaller area may reduce this effect.

Figure 1.2: Used as inspiration

Lastly, Figure 1.2 shows a completely different visualization. This map uses points, each categorized with an icon that identifies its type. I liked this idea and considered doing the same for each complaint in my map; however, my map has too many points, and even using a color for each complaint type is a bit overwhelming.

Methodology
For this visualization I drew on two data sources: Bytes of the Big Apple, for a shapefile of the borough and census block boundaries, and NYC Open Data, for point data on each water quality complaint. Carto DB was then used to map all of the data.
I created several maps, each giving a different representation of the same data. The first map simply uses Torque to plot each point over time, from January 2010 to November 2016. Each borough was also assigned its own color, based on the borough name column, to make the boroughs distinguishable. The second map is static and categorizes the points, showing each complaint type in a different color; clicking a point displays more information about it. The third map uses intensity to show where complaints are most and least concentrated, on a scale from yellow to red. This map is also static and does not use Torque. Finally, building on the intensity map, I used Torque to create a heat map that shows the “intensity” over time.
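
Torque animates a map by grouping each point’s timestamp into discrete time steps (frames) and drawing one frame after another. As a rough illustration of that idea, and not Torque’s actual implementation, the Python sketch below counts complaints per calendar month. The field name `created_date`, the ISO timestamp format, and the sample rows are assumptions for illustration only; the real NYC Open Data export uses its own column names and formats.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; a real export will have different column names.
complaints = [
    {"created_date": "2010-01-15T08:30:00", "borough": "QUEENS"},
    {"created_date": "2010-01-28T17:05:00", "borough": "BROOKLYN"},
    {"created_date": "2016-11-02T11:20:00", "borough": "MANHATTAN"},
]


def monthly_frames(rows, date_field="created_date"):
    """Count complaints per (year, month), one 'frame' per calendar month."""
    counts = Counter()
    for row in rows:
        ts = datetime.fromisoformat(row[date_field])
        counts[(ts.year, ts.month)] += 1
    # Sort by (year, month) so frames come out in chronological order.
    return dict(sorted(counts.items()))


for (year, month), n in monthly_frames(complaints).items():
    print(f"{year}-{month:02d}: {n} complaint(s)")
```

Unlike Torque, which steps through a fixed number of frames, this sketch only reports months that contain at least one complaint.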

Discussion
The first map I created (Figure 2.0) does not give much information. It is fairly basic: it gives a general idea of where complaints are made, but not what the complaints are. Torque is also limited because it does not let users pause the animation and click points for further information. I tried to bridge this gap by also creating a static map with clickable points that display more information. As Figure 2.1 shows, however, such a map has too many points and too many complaint types to be easily comprehended. The colored boroughs add to the clutter; simply making the boroughs black, or another single color, would help the points stand out.

An intensity map (Figure 2.2) can be used to show where the most complaints are made. However, because my previous analysis looked at which boroughs made the most complaints, I can tell that this map is actually misleading. Here, Manhattan appears the most intense, but according to my previous analysis, Queens has the most complaints, due to its population. Manhattan only appears more intense because its complaints are packed into a smaller area, so the overlapping points make Carto DB render it more intensely, while Queens’ complaints are spread over a much wider area and appear less intense. This is worth keeping in mind when viewing visualized data. The same is true of the heat map I created, in which Manhattan appears the hottest over time (Figure 2.3).
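
The effect described above can be shown with a small worked example. The sketch below uses made-up counts and areas (not values from the dataset, and with placeholder borough names) to show how the borough with the most complaints can still have a lower number of complaints per unit area. Density per area is, roughly, what an intensity map responds to, since its color is driven by how closely points overlap.

```python
def complaints_per_area(counts, areas):
    """Return complaints per unit area for each borough present in both dicts."""
    return {name: counts[name] / areas[name] for name in counts if name in areas}


# Hypothetical, made-up numbers chosen only to illustrate the effect.
counts = {"Borough A (small)": 300, "Borough B (large)": 600}
areas = {"Borough A (small)": 20.0, "Borough B (large)": 100.0}

density = complaints_per_area(counts, areas)
most_complaints = max(counts, key=counts.get)
most_dense = max(density, key=density.get)

print(most_complaints)  # Borough B (large): 600 complaints in total
print(most_dense)       # Borough A (small): 15.0 per unit area vs. 6.0
```

Borough B has twice as many complaints, yet Borough A would look "hotter" on an intensity map because its complaints are concentrated in one fifth of the area.
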
Future Directions
I believe I was looking at too much data at once. In the future, I could focus on a specific borough during a specific time period, to reduce the clutter and give the analysis a clear focus. With larger datasets, it is easy to overwhelm users with too many colors and points, which renders the information nearly useless. A straightforward visualization with a clear point is much more effective.
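
As a minimal sketch of that narrower focus, the hypothetical function below (not part of Carto DB) keeps only the complaints from one borough within a chosen date range before they are mapped. The field names `created_date` and `borough`, the ISO timestamp format, and the sample rows are assumptions; the actual NYC Open Data export uses its own column names.

```python
from datetime import datetime


def filter_complaints(rows, borough, start, end,
                      date_field="created_date", borough_field="borough"):
    """Keep rows from one borough whose timestamp falls in [start, end)."""
    selected = []
    for row in rows:
        ts = datetime.fromisoformat(row[date_field])
        # Case-insensitive borough match; the end date is exclusive.
        if row[borough_field].upper() == borough.upper() and start <= ts < end:
            selected.append(row)
    return selected


# Hypothetical sample rows for illustration only.
sample = [
    {"created_date": "2015-03-02T10:00:00", "borough": "QUEENS"},
    {"created_date": "2015-07-19T14:45:00", "borough": "MANHATTAN"},
    {"created_date": "2016-01-05T09:15:00", "borough": "QUEENS"},
]

queens_2015 = filter_complaints(sample, "Queens",
                                datetime(2015, 1, 1), datetime(2016, 1, 1))
print(len(queens_2015))  # 1
```

Using an exclusive end date avoids counting a complaint filed exactly at midnight on January 1 in two consecutive yearly subsets.
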
Sources
Figure 1.0: http://studentwork.prattsi.org/infoshow/wp-content/uploads/sites/2/2015/04/Cluster-Map1.png
Figure 1.1: http://studentwork.prattsi.org/infoshow/wp-content/uploads/sites/2/2015/10/Tableau_Hwang_Map.gif
Figure 1.2: http://www.berkeleyside.com/wp-content/uploads/2016/07/Screen-Shot-2016-07-05-at-8.55.54-PM-720x588.jpg
Water Quality Complaints: https://data.cityofnewyork.us/Environment/Water-Quality-complaints/qfe3-6dkn/data
Shapefile: http://www1.nyc.gov/site/planning/data-maps/open-data.page