Introduction
Water quality has been a prominent topic in the United States lately. With the ongoing crisis in Flint, Michigan, where drinking water was contaminated with lead, and the protests at Standing Rock over the possible devastation of the reservation’s water source, I thought it would be interesting to look at the data from where I live, New York City. In my previous work, I investigated trends and patterns in the number of water complaints since the beginning of Flint’s water crisis in 2014 to see whether complaints had increased here in NYC. I thought there might have been an increase in awareness, or even paranoia, since then. The data did not show any such increase; however, further exploration produced the complaint rates for each borough and showed that Taste/Odor was the most frequent complaint in all five boroughs.
With the help of user testing, I have updated those findings with the most recent data and made slight adjustments. I have also extended my research by looking at a completely new aspect of the same data. The main focus of this project is to create visualizations that give insight into how long the city takes to resolve complaints in each borough, and which type of complaint takes the longest to resolve.
Methodology
The data source is NYC’s Water Quality Complaints dataset, obtained through NYC Open Data. The Department of Environmental Protection updates the data daily, and it consists of complaints submitted to 311 by phone or online. The data was downloaded at the end of November 2016 and only contains information up to that point.
The four visualizations in this report were created using Tableau Public. The highlight table (Figure 2.0) is a remake from a previous project and was designed with boroughs in the columns and descriptors (what I have been calling complaint types) in the rows. The data contains many more complaint types, but to focus on what is relevant, complaint types that were null or irrelevant, such as “BWSO Referral Needed,” which appeared only once, were excluded, and all complaint types related to “Taste/Odor” were grouped together. Each cell was then expressed as a percentage of its column (borough) total.
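The filtering, grouping, and column-percentage steps were done in Tableau, but the same logic can be sketched in plain Python. This is a minimal illustration, not the workflow actually used: the sample rows and the descriptor labels other than “BWSO Referral Needed” are hypothetical, and the function names are my own.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows of (borough, descriptor); not real data.
SAMPLE = [
    ("BROOKLYN", "Taste/Odor, Chlorine"),
    ("BROOKLYN", "Taste/Odor, Chemical"),
    ("BROOKLYN", "Taste/Odor, Chlorine"),
    ("BROOKLYN", "Cloudy Water"),
    ("BROOKLYN", None),
    ("QUEENS", "Taste/Odor, Chemical"),
    ("QUEENS", "Cloudy Water"),
    ("QUEENS", "BWSO Referral Needed"),
]

# Descriptors treated as irrelevant and dropped before counting.
EXCLUDED = {"BWSO Referral Needed"}


def normalize(descriptor):
    """Return the grouped complaint type, or None if the row is dropped."""
    if descriptor is None or descriptor in EXCLUDED:
        return None
    if descriptor.startswith("Taste/Odor"):
        return "Taste/Odor"  # merge every Taste/Odor variant into one type
    return descriptor


def column_percentages(rows):
    """Share of each complaint type within its borough, in percent."""
    counts = defaultdict(Counter)
    for borough, descriptor in rows:
        group = normalize(descriptor)
        if group is not None:
            counts[borough][group] += 1
    result = {}
    for borough, by_type in counts.items():
        total = sum(by_type.values())
        result[borough] = {t: 100.0 * n / total for t, n in by_type.items()}
    return result
```

Calling `column_percentages(SAMPLE)` returns one dictionary per borough whose values sum to 100, which mirrors how each column of the highlight table is read.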
With this visualization, two users were asked to perform a basic usability test consisting of a few think-aloud tasks: “Which complaint was made the most in each borough?”, “Which complaint was made the least?”, and “If there is anything you could do to improve or add to this, what would it be?”. The feedback was largely positive, with little change requested, except that one person stated, “It is obvious which one is the most complained about however because there are so many percentages on the table it would be interesting to see how that looks compared to one another, maybe by bars.” I took this into consideration and created a treemap to accompany the highlight table.
For the next visualizations (Figures 2.2 & 2.3), a calculated field was created from the difference between the closed date and the created date of each complaint. This gives the length of time each complaint took to be resolved. The new calculated field was used in two separate visualizations. The first is a spread across all the boroughs, with a dot for each complaint. The dots are sized according to the sum of the duration and colored by the same measure. Quicker resolutions are small and light blue or light pink, while longer durations are larger and darker red.
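The calculated field itself was built in Tableau. As a rough equivalent, the duration can be computed as a simple date difference. The sketch below assumes dates are stored as month/day/year strings (the actual export format may differ) and uses a hypothetical helper name; the two example rows use the Staten Island dates reported in the results.

```python
from datetime import datetime

# Assumed date format; adjust to match the downloaded file.
DATE_FORMAT = "%m/%d/%Y"


def resolution_days(created, closed):
    """Days between created and closed dates, or None if no closed date is logged."""
    if not closed:
        return None
    start = datetime.strptime(created, DATE_FORMAT)
    end = datetime.strptime(closed, DATE_FORMAT)
    return (end - start).days


# The two Staten Island complaints discussed in the results.
chemical_case = resolution_days("01/01/2010", "02/17/2016")  # 2238 days
chlorine_case = resolution_days("10/21/2010", "02/17/2016")  # 1945 days
```

Returning `None` for a missing closed date keeps unresolved complaints out of the duration totals instead of counting them as zero.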
Two more users were asked to perform a basic usability test on this visualization with the following think-aloud tasks: “Find the longest time it took the city to resolve a complaint.”, “When was that complaint made and how long did it take to be closed?”, and “What would make this visualization better?”. Responses during the think-aloud portion consisted mostly of confusion, both because there is a lot of data to work with and because users could not tell where the years began or ended. Both users also answered the last question by suggesting that they be able to see what those complaints were. Taking all of this into consideration, grid lines were added to show where years begin, details were added to each point so that the complaint type and ZIP code appear on hover, and a new visualization emerged that focuses on the complaint types and how long each one took.
This new visualization uses the same calculated field but instead compares the complaint types by how long they took. For this one, I chose to use the ungrouped complaints, which will be explained in further detail later. Each borough was given a separate color to show that the boroughs are separate groups.
Discussion
First, I’d like to talk about the recreation of the highlight table (Figure 2.0). Not much changed from my previous report besides updating the data and changing the color to a more aqua blue for aesthetic reasons. However, I created a dashboard with the additional treemap (Figure 2.1). The treemap uses the same colors as the highlight table because they share the same values, but with the treemap I am able to represent the number of complaints through size. Catherine Plaisant, in “Information Visualization and the Challenge of Universal Usability,” expresses the importance of usability testing in selecting the best representation of the data. User feedback suggested that such a visualization could help tell the story more quickly. Taste/Odor visually takes up half as much space as all of the other complaints take up together. This could be because there was more than one Taste/Odor category, so grouping them has made the combined category significantly larger; however, I believe it is necessary to group them because they are very similar. It shows that Taste/Odor is the biggest issue here in NYC. Interestingly, this complaint also has the longest or second-longest resolution time (Figure 2.3), depending on how it is visualized. We will get back to that later.
Duration is the main focus of this project (Figure 2.2). I wanted to see how long the city takes to respond. This was challenging, as large data tends to be (Kasik, 2009). There are over 7,000 data points in this study, and showing the duration of each one with barbell graphs, as I originally planned, was not feasible. Instead, the durations were laid out for every point by month and year within each borough. Initially, each point showed its duration as a bar, but with over 7,000 of them, some overlapped and made it difficult to see what was really going on. Turning them into sized circles was more effective: even the largest circles take up less width than the bars did, and the eye still picks them out quickly. Because circle sizes can be hard to compare at a glance, color was added to help viewers recognize differences quickly. I used a six-step diverging color palette from blue (shortest) to red (longest). I chose these two colors because blue is softer, which should intuitively signal a short duration, and red is stronger, which can be associated with problematic cases. The points also move downward as the year progresses and to the right from one year to the next, so that the progression is easier to follow. All of this is meant to make the storytelling as natural and intuitive as I could. Segel (2010) and Few (2009) both discuss the importance of storytelling in visualizations and of using color, shape, and direction to help users understand what is being shown. That said, the amount of data I am trying to show limits how much of the story I can tell without removing too much information.
To expand on this visualization, it was best to create another one to accompany it (Figure 2.3). It was inspired by the user testers who thought it would be interesting to see how long each complaint type took. I simply used bars because, unlike the previous visualization, there was not enough data to overwhelm users, and I gave each borough its own color to show that the boroughs are not connected. Initially I kept the grouped complaints, which combined all the different Taste/Odor complaints and made that category the obvious longest; however, I felt this could be slightly deceiving. When the types are separated, “Clear Water with other Particles” has the longest duration on its own, which we will discuss further in the results.
Results & Conclusions
These visualizations lead to some interesting conclusions. Generally, the city appears to resolve complaints quickly. There are not many cases that took more than a few months. However, two cases in Staten Island from 2010 stand out. The first was a Taste/Odor complaint about chemicals, made on January 1, 2010. It took just over six years to resolve and was closed on February 17, 2016. The second was another Taste/Odor complaint, this time about chlorine. It was made on October 21, 2010 and was also not closed until February 17, 2016. What is interesting about these two cases is that they occurred in two different ZIP codes. Because they share the same closed date, two possibilities come to mind: the closed dates may never have been logged and were only filled in when the omission was found on that date, or the two cases simply happened to be resolved on the same day.
Taste/Odor has taken the longest time to be resolved as a whole, but when the types are separated, clear water with particles in it takes the most time to be resolved. Since Taste/Odor is the most frequent complaint, it makes sense that it could appear to be the longest simply because half or more of the complaints are Taste/Odor.
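The effect of grouping can be illustrated with a small sketch. It assumes the bars compare summed durations, so a category with many complaints accumulates a large total even if each complaint is resolved quickly. The durations below are hypothetical and chosen only to show the effect; they are not taken from the dataset.

```python
from statistics import mean

# Hypothetical (complaint type, days to resolve) pairs; not real data.
SAMPLE = [
    ("Taste/Odor, Chlorine", 4),
    ("Taste/Odor, Chlorine", 6),
    ("Taste/Odor, Chlorine", 8),
    ("Taste/Odor, Chemical", 5),
    ("Taste/Odor, Chemical", 3),
    ("Taste/Odor, Chemical", 2),
    ("Taste/Odor, Chemical", 6),
    ("Clear Water With Particles", 12),
    ("Clear Water With Particles", 10),
]


def summarize(rows, group_taste_odor):
    """Total and mean duration per complaint type, optionally merging Taste/Odor."""
    buckets = {}
    for kind, days in rows:
        if group_taste_odor and kind.startswith("Taste/Odor"):
            kind = "Taste/Odor"
        buckets.setdefault(kind, []).append(days)
    return {k: {"total": sum(v), "mean": mean(v)} for k, v in buckets.items()}


grouped = summarize(SAMPLE, group_taste_odor=True)
separate = summarize(SAMPLE, group_taste_odor=False)
```

In this made-up sample, the grouped Taste/Odor total (34 days) exceeds the clear-water total (22 days), yet each separate Taste/Odor type has a smaller total, and the clear-water complaints have the highest mean in both views. Comparing means rather than sums would be one way to reduce the influence of complaint volume.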
If this data were examined again in the future, I would like to improve the Duration of Resolving chart to make it more user-friendly. For one, I would like the colors to be more distinct and colorblind-friendly, and I would also want less overlap between the points without using too much space. It could even be redone entirely from a different point of view. Narrowing the study to a shorter time span, or to a single location, would make it possible to use different visualizations that are not overwhelming.
Works Cited
Few, Stephen et al. Effectively communicating numbers: selecting the best means and manner of display. Proclarity, 2009.
Kasik, David J. et al. Data transformations and representations for computation and visualization. Information Visualization. Vol. 8 No. 4, pp. 275-285, 2009.
Plaisant, Catherine. Information visualization and the challenge of universal usability. University of Maryland, 2005.
Water Quality Complaints. NYC Open Data. https://data.cityofnewyork.us/Environment/Water-Quality-complaints/qfe3-6dkn/data