NYC Dog Walking Violations


Visualization
Introduction
NYC OpenData has a large number of searchable records about issues around New York City, ranging from noise complaints to the City's street tree inventory. After searching through some of its datasets, I chose the one entitled “Dog Doo.” This dataset documents violations, dating back to 1980, issued to residents who did not pick up after their dogs (presumably during a walk). As a New York City dog owner who has stepped in a few of these violations myself, I was curious to know more. I visualized the data around three questions:
– In which borough do violations occur most frequently?
– At what time of day are violations issued?
– How many violations are there by resident zip code?
Materials
Once I chose my data, I narrowed the range of violation years to 2012–2017. I also cleaned the data using OpenRefine. A violation costs $250, but I noticed that some records read $250,000. I clustered these and changed 400 records to read $250. I also deleted columns that were redundant or not useful; for example, every entry in “violation state name” read “New York.” Once my data was cleaned, I imported it into Tableau Public from Google Sheets.
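The cleaning steps above can be sketched in plain Python. This is not how OpenRefine works internally: OpenRefine's clustering groups similar-looking values, whereas this sketch simply maps the exact value $250,000 back to $250. The column names (`year`, `fine`, `violation_state_name`) are hypothetical stand-ins for the dataset's real headers.

```python
# Sketch of the cleaning steps: filter years, fix the fine typo, drop a
# constant column. Column names are hypothetical, not the dataset's headers.

FINE = 250                                # the standard fine, per the text
TYPO_FINE = 250_000                       # the mistaken value found in some records
DROP_COLUMNS = {"violation_state_name"}   # always "New York", so redundant


def clean(records, first_year=2012, last_year=2017):
    """Return cleaned copies of the records from first_year..last_year inclusive."""
    cleaned = []
    for row in records:
        year = int(row["year"])
        if not first_year <= year <= last_year:
            continue  # outside the study window
        # Copy the row without the redundant columns.
        new_row = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        new_row["year"] = year
        # Parse "$250,000" -> 250000, then map the typo back to the real fine.
        fine = int(str(row["fine"]).replace("$", "").replace(",", ""))
        new_row["fine"] = FINE if fine == TYPO_FINE else fine
        cleaned.append(new_row)
    return cleaned


sample = [
    {"year": "2011", "fine": "$250", "violation_state_name": "New York"},
    {"year": "2014", "fine": "$250,000", "violation_state_name": "New York"},
    {"year": "2016", "fine": "$250", "violation_state_name": "New York"},
]
print(clean(sample))  # [{'year': 2014, 'fine': 250}, {'year': 2016, 'fine': 250}]
```

An exact-match replacement is safer than fuzzy matching here because the typo is a single known value; anything else unexpected is left untouched for manual review.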
Methods
After deciding on a topic and making sure the data was informative enough, I needed inspiration. I remembered seeing interactive maps of dog names in New York City a few months earlier, so I started my search there. I found a bubble chart visualizing dog names in New York City, compiled from dog license registrations. When interacting with the chart, the user can hover over a bubble to see a dog name and how many dogs with that name are registered. The user can also search for a specific name, as I did below for my dog’s name (bubble outlined in red).
I then found this choropleth map visualizing the friendliest neighborhoods for dogs. Neighborhoods were deemed “friendly” based on the number of vets, dog parks, dog sitters, and dog-friendly businesses. This data was compiled from two sources, NYC OpenData and StreetEasy. According to this article, about 425,000 dogs live in New York City.
Lastly, I found a stacked bar graph showing the sizes of dogs in New York City, built from the same data as the choropleth map above. I included this visualization because it is a fun take on a stacked bar graph: dog illustrations beside the percentages show how a small dog compares to a large dog. I am curious how the authors distinguished a small dog from a medium dog. Was it by weight or by breed?
Results
Line Graph
The data shows violations that occurred between 2012 and 2017, 1,101 in total. At a glance, I knew I wanted to see where violations occurred most frequently, so I began my visualization by looking at the five boroughs. The line graph below shows that the Bronx (brown) had the most violations every year, by a wide margin. I double-checked my data for duplicated records and found none. It is unclear why the Bronx has the most violations; a quick Google search shows that it is one of the less populous boroughs, with about 1,455,720 residents. Perhaps data from years before 2012 would show a different pattern. For this line graph, I applied a single-value slider to view each borough separately. The data also shows that Staten Island (green) has the fewest violations, which could be because more of its residents have backyards than residents of the other boroughs.
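The line graph is essentially a count of violations per borough per year. A minimal sketch of that aggregation, plus a check for exact duplicate rows similar in spirit to the duplicate check described above, might look like this. It assumes each record is a dictionary with hypothetical `borough` and `year` keys.

```python
from collections import Counter


def counts_by_borough_year(records):
    """Count violations per (borough, year) pair."""
    return Counter((row["borough"], int(row["year"])) for row in records)


def series_for(counts, borough, years):
    """One line of the graph: yearly counts for a borough, 0 for years with none."""
    return [counts.get((borough, y), 0) for y in years]


def find_duplicates(records):
    """Return rows (as sorted key/value tuples) that appear more than once."""
    seen = Counter(tuple(sorted(row.items())) for row in records)
    return [row for row, n in seen.items() if n > 1]


records = [
    {"borough": "Bronx", "year": 2013},
    {"borough": "Bronx", "year": 2013},
    {"borough": "Staten Island", "year": 2015},
]
counts = counts_by_borough_year(records)
print(series_for(counts, "Bronx", range(2012, 2018)))  # [0, 2, 0, 0, 0, 0]
print(find_duplicates(records))  # [(('borough', 'Bronx'), ('year', 2013))]
```

In real data each violation would carry its own identifier, so exact duplicates would point to rows that were imported twice rather than to two separate violations.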
Choropleth Map
I then wanted to know which zip codes had the most violations. Looking at my Google Sheet, I noticed that data was missing from the “zip code violation” column. I had the street address where each violation took place, but zip codes for those addresses were largely missing. I instead used the “resident zip code” column, which was missing very little data. The resident zip code is where the violator lives, not where the incident took place. Since I assume most people walk their dogs in the neighborhood where they live, this should be sufficient for answering my question. When I created a choropleth map in Tableau, I noticed that some areas in other states were shaded. Because all the violations took place in New York City, I assume the officers issuing the violations used the violator’s driver’s license to gather information, and the license may not reflect the violator’s current residence in the City.
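Choosing between the two zip code columns comes down to how much data each is missing. The sketch below measures the share of empty values in a column and then counts violations per zip code, separating zips that fall outside a supplied set of city zip codes (like the out-of-state areas that appeared on the map). The column names and the `city_zips` set are hypothetical; a real analysis would load a full list of New York City zip codes.

```python
from collections import Counter


def missing_share(records, column):
    """Fraction of records whose `column` is empty, None, or absent."""
    if not records:
        return 0.0
    missing = sum(1 for row in records if not str(row.get(column) or "").strip())
    return missing / len(records)


def counts_by_zip(records, column, city_zips):
    """Count violations per zip in `column`, split into in-city and outside."""
    in_city, outside = Counter(), Counter()
    for row in records:
        zip_code = str(row.get(column) or "").strip()[:5]  # keep the 5-digit part
        if not zip_code:
            continue  # no zip recorded for this violation
        if zip_code in city_zips:
            in_city[zip_code] += 1
        else:
            outside[zip_code] += 1
    return in_city, outside


records = [
    {"zip_code_violation": "", "resident_zip_code": "10451"},
    {"zip_code_violation": "10452", "resident_zip_code": "10452"},
    {"zip_code_violation": None, "resident_zip_code": "07030"},
    {"zip_code_violation": "", "resident_zip_code": "10451"},
]
city_zips = {"10451", "10452"}  # hypothetical sample, not a full NYC list

print(missing_share(records, "zip_code_violation"))  # 0.75
in_city, outside = counts_by_zip(records, "resident_zip_code", city_zips)
print(outside)  # Counter({'07030': 1})
```

Keeping the out-of-city counts separate, rather than silently dropping them, makes it easy to report how many violators' zip codes fell outside the City.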
Highlight Table
Lastly, I wanted to know at what time of day violations were issued most often. My dataset recorded the exact time each violation was issued, so I looked at the data by hour of the day and by borough. The table shows that most violations are issued in the morning, between 7:00 and 10:00 a.m. This makes sense, since most dogs are probably out in the morning for their first walk of the day.
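The highlight table is a grid of counts by borough and hour of day. A minimal sketch, assuming each record has hypothetical `borough` and `time` fields and that times are stored as 24-hour `HH:MM` strings (pass a different `time_format` if they are not):

```python
from collections import defaultdict
from datetime import datetime


def hour_table(records, time_format="%H:%M"):
    """Return {borough: [count for each hour 0..23]} for a highlight table."""
    table = defaultdict(lambda: [0] * 24)
    for row in records:
        hour = datetime.strptime(row["time"], time_format).hour
        table[row["borough"]][hour] += 1
    return dict(table)


def share_in_window(table, start_hour, end_hour):
    """Fraction of all violations issued in [start_hour, end_hour)."""
    total = sum(sum(hours) for hours in table.values())
    in_window = sum(sum(hours[start_hour:end_hour]) for hours in table.values())
    return in_window / total if total else 0.0


records = [
    {"borough": "Bronx", "time": "07:15"},
    {"borough": "Bronx", "time": "09:59"},
    {"borough": "Queens", "time": "08:30"},
    {"borough": "Queens", "time": "18:05"},
]
table = hour_table(records)
print(share_in_window(table, 7, 10))  # 0.75
```

`share_in_window(table, 7, 10)` gives the fraction of violations issued from 7:00 up to 10:00 a.m., which is the morning window described above.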
Overall Assessment
My full presentation can be viewed here. Tableau became more intuitive the more I practiced with the platform, and the tutorial videos were a big help. I did have trouble fitting my data into the dashboard, which I found slightly frustrating because of the dashboard size constraints. However, I am happy with the way it looks. I also wish the brown I chose looked more brown than orange.
When I chose this dataset, I also did not realize that the Bronx would have so many more violations than the other boroughs. The high volume was difficult to work with because some charts did not display well. I would have loved to use a bubble chart, but because the Bronx had such a high count and the other boroughs had similar yearly counts to one another, the bubbles ended up looking like a pie chart (making size differences hard to see).
Lastly, I realize I do not have an overall chart showing the total number of violations by borough. That is something I would include next time. I would also have liked to bring in other datasets showing where most dogs live in the city.