Introduction
The purpose of this project is to create visualizations of water quality complaints in NYC. I chose to research water quality because of the ongoing crisis in Flint, Michigan, where high levels of lead have been contaminating the drinking water supply since early 2014. My first thought was that it would be interesting to see whether the incident in Flint caused an increase in complaints in NYC since then. However, the visualizations I created told a different story altogether.
The dataset I used is “Water Quality Complaints” from NYC Open Data. NYC Open Data collects this information from the Department of Environmental Protection, which compiles it from 311 calls made in each of the five boroughs. This public data dates back to 2010 and is updated daily. When I downloaded it, it contained a total of 7,562 complaints.
Methodology
The tools I used for these visualizations were NYC Open Data, to get the dataset I needed; Microsoft Excel, to remove unnecessary data and clean up the dataset; and Tableau Public 9.3, to build the visualizations. I could have used OpenRefine, but the cleaning tasks were simple enough to do in Excel, since I only needed to delete empty columns and columns I knew I would not use in my visualizations.
First, I downloaded the dataset from NYC Open Data in CSV format. Next, I opened it in Excel to evaluate what needed to be cleaned and how much work that would take. This is when I noticed that many columns had no values in them. This is probably due to the template used to collect the data, or because those columns are reserved for rare situations. Nevertheless, I deleted those columns so they would not show up as zero values in Tableau. I also removed filled columns that were redundant or not needed for my visualizations, simply to streamline the dataset.
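For readers who would rather script this step than do it by hand, the empty-column cleanup described above could be sketched in Python roughly as follows. This is only an illustration, not what was done for this project (the cleaning here was done in Excel). The column names and rows in the sample are made up for the example and are not taken from the real file.

```python
import csv
import io


def drop_empty_columns(rows):
    """Return (kept_column_names, cleaned_rows) with all-blank columns removed."""
    if not rows:
        return [], []
    fieldnames = list(rows[0].keys())
    # Keep a column only if at least one row has a non-blank value in it.
    kept = [
        name for name in fieldnames
        if any((row.get(name) or "").strip() for row in rows)
    ]
    cleaned = [{name: row[name] for name in kept} for row in rows]
    return kept, cleaned


# Made-up sample standing in for the downloaded CSV (hypothetical columns).
SAMPLE_CSV = """Created Date,Descriptor,Borough,Landmark
01/05/2010,Taste and Odor,BROOKLYN,
02/11/2010,Cloudy Water,QUEENS,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept, cleaned = drop_empty_columns(rows)
print(kept)
```

With the real file, the `io.StringIO(SAMPLE_CSV)` part would be replaced by an opened file handle for the downloaded CSV.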
Discussion
The first visualization I envisioned would have looked like the one below (see Figure 1).
I thought that a line graph displaying the frequency of complaints over the years would be the most visually organic way to reveal any patterns showing whether complaints increased after April 2014, when the incident in Flint, Michigan occurred. However, when I created it, it was difficult to see where each month fell on the line, so I decided that a bar graph would better represent the months over the span of six years. The graph that I created (see Figure 3) is admittedly convoluted, with many colors representing the months. When I merged the months so that only the years were represented, as in Figure 2, which I also used for inspiration, it became impossible to see the trends within each year. I therefore kept the longer graph, because it was interesting to see what it told me about complaints after the Flint incident began.
According to my visualization, there is a general positive trend from 2010 to 2016. However, the trend from 2013 to 2014 is negative. This surprised me, as I had expected an immediate increase in complaints starting in 2014. There is still a positive trend from 2014 to 2016, but it is much more gradual than I would have expected. I do not believe it is significant enough to say that the incident in Flint, Michigan has led people in NYC to make more complaints about water quality.
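The year-to-year comparison above can also be checked numerically. The following is a minimal sketch, assuming each complaint has a creation date stored as text in month/day/year form (an assumption about the file layout). The dates below are invented purely to show the mechanics and say nothing about the real counts.

```python
from collections import Counter
from datetime import datetime


def complaints_per_year(date_strings, fmt="%m/%d/%Y"):
    """Count complaints per calendar year from date strings."""
    years = Counter(datetime.strptime(s, fmt).year for s in date_strings)
    return dict(sorted(years.items()))


def year_over_year_change(per_year):
    """Difference in complaint counts between each year and the previous one listed."""
    years = sorted(per_year)
    return {
        curr: per_year[curr] - per_year[prev]
        for prev, curr in zip(years, years[1:])
    }


# Invented dates for illustration only, not real complaint records.
sample_dates = [
    "03/14/2013", "07/02/2013", "11/20/2013",
    "04/25/2014",
    "01/09/2015", "06/30/2015",
]

per_year = complaints_per_year(sample_dates)
changes = year_over_year_change(per_year)
print(per_year)
print(changes)
```

A negative value in `changes` marks a year with fewer complaints than the one before it, which is the kind of dip the bar graph shows between 2013 and 2014.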
The second visualization that I created is a heat map. Originally, the idea was to show the number of complaints by type of complaint, as in this visualization I found online (see Figure 4).
That visualization breaks down the different types of complaints by borough and shows the frequency of each. This bar graph, however, is similar to the one I created above, in that it has too many different colors and is very long. This is why converting it into a heat map made the visualization much simpler and more visually striking. For my visualization (see Figure 5), I used the boroughs and the types of complaints made in each. I used percentages within each borough, instead of record counts, because my focus is on how complaints are distributed within each borough. I also removed a descriptor named “BWSO Referral to Water Quality (For DEP Internal Use Only)” because it did not tell me anything and appeared only once in the data. Finally, I cleaned up the wording of the descriptors because they included unnecessary text such as “(Q5)”, which probably refers to the part of the survey where the type of complaint is recorded.
This heat map taught me that in each of the boroughs, the most complaints were about taste and odor, shown in the deepest blue on the chart. The fewest complaints were about clear water with organisms inside, thankfully.
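The percentages behind a heat map like this can be reproduced outside of Tableau. Below is a rough sketch, assuming each record is a (borough, descriptor) pair and that the descriptor codes look like “(Q5)” at the end of the text. The regular expression, the function names, and the sample records are all assumptions made for illustration, and the sample is not real data.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: a trailing code such as "(Q5)" at the end of a descriptor.
CODE_SUFFIX = re.compile(r"\s*\(Q[A-Z]?\d+\)\s*$")


def clean_descriptor(text):
    """Strip a trailing survey code like '(Q5)' from a descriptor."""
    return CODE_SUFFIX.sub("", text).strip()


def percent_by_borough(records):
    """For each borough, return the share (in %) of each complaint type."""
    counts = defaultdict(Counter)
    for borough, descriptor in records:
        counts[borough][clean_descriptor(descriptor)] += 1
    result = {}
    for borough, type_counts in counts.items():
        total = sum(type_counts.values())
        result[borough] = {
            kind: round(100 * n / total, 1) for kind, n in type_counts.items()
        }
    return result


# Invented records for illustration only.
sample = [
    ("BROOKLYN", "Taste and Odor (Q1)"),
    ("BROOKLYN", "Taste and Odor (Q1)"),
    ("BROOKLYN", "Taste and Odor (Q1)"),
    ("BROOKLYN", "Cloudy Water (Q5)"),
    ("QUEENS", "Taste and Odor (Q1)"),
    ("QUEENS", "Cloudy Water (Q5)"),
]

shares = percent_by_borough(sample)
print(shares)
```

Because the percentages are computed within each borough, every borough's row sums to 100, which keeps a borough with few complaints comparable to one with many.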
Future Studies
In the future, I would love to transpose these visualizations onto actual maps. It would be interesting to see what spatial visualizations would reveal with this dataset. Also, if I could get my hands on data from before 2010, I think it would be interesting to follow the trends from further back to see whether complaints have increased, and perhaps to match them with other datasets of actual water samples from NYC. Other aspects of the dataset that I did not explore at all are the times the complaints were received and the times they were resolved. These could be used to make visualizations of the speed with which the city responds to complaints.
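As a starting point for the response-time idea, here is a small sketch of how resolution times might be computed. It assumes each complaint has a received timestamp and a resolved timestamp in month/day/year hour:minute form, which is an assumption about the file layout. The timestamps below are invented for the example.

```python
from datetime import datetime
from statistics import median


def resolution_hours(received, resolved, fmt="%m/%d/%Y %H:%M"):
    """Hours elapsed between when a complaint was received and resolved."""
    delta = datetime.strptime(resolved, fmt) - datetime.strptime(received, fmt)
    return delta.total_seconds() / 3600


# Invented (received, resolved) pairs for illustration only.
sample = [
    ("01/05/2015 09:00", "01/05/2015 15:00"),
    ("01/06/2015 08:30", "01/07/2015 08:30"),
    ("01/10/2015 12:00", "01/10/2015 14:00"),
]

hours = [resolution_hours(received, resolved) for received, resolved in sample]
median_hours = median(hours)
print(hours, median_hours)
```

The median is used here rather than the mean so that a few complaints that stay open for a very long time do not dominate the summary.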
Sources
Figures 1, 2 and 4: Nguyen, Denis. “How Clean is NYC’s Water?” http://blog.nycdatascience.com/student-works/exploratory-visualization-nycs-water/
Dataset: “Water Quality Complaints.” NYC Open Data. https://data.cityofnewyork.us/Environment/Water-Quality-complaints/qfe3-6dkn