For this visualization, I started by noticing the letter grades posted in restaurant windows. Using the NYC open data site, I drafted
initial lines of inquiry, including “Which parts of Manhattan had the most restaurants with ‘B’ or lower?”, “Which
restaurants in Hell’s Kitchen had the best grades?”, “How many grade ‘A’ restaurants are there in the city?”, and “How
many grade ‘A’ restaurants are there in ZIP 10036?” That last question is the one I focused on and discuss below.
As described above, my initial questions were exploratory in nature. I did not have a specific theory in mind to prove or
disprove.
Working with restaurant data, I was able to pull data for all boroughs, but narrowed it to Manhattan. I decided to focus
on the ZIP code in which I live (10036), and see what the grades were for the restaurants there. Looking at the ZIP-specific data, I
was concerned that it might be too narrow. After filtering the data for 10036, there was more than enough (over 1,800 restaurants), so
the minimum data requirements were exceeded. The associated visualization is linked under Resources (Restaurant Grades and Inspection Dates, ZIP 10036).
Additionally, I looked at restaurant grades and inspection dates for all Manhattan ZIP codes, primarily to see what the
visualization would look like, as well as to obtain an overall picture of grades by ZIP. Though not the main focus of this lab, it is
included as an example of Tableau outputs.
Visualization #1 (Figure 1) was what I had initially envisioned when looking for data. It looked like it would be
easily read and understood (aside from the use of red/green, which is a problem for color-blind readers), and is close to what I used for the
lab. The end product looked similar to Figure 1.
(Figure 1, below. Source: https://harvarddatasciencerestaurantinspections.files.wordpress.com/2014/12/health-grades-by-borough.png)
Visualization #2 (Figure 2) was also similar to what I had in mind, as it had restaurant data, though I was not focusing on
violations in my visualization. I wound up using the bar chart format, though my bars were oriented horizontally instead of
vertically. I stayed away from including the violations (detailed in the NYC open data site), as there are over 20 different
infractions, each with its own description. Representing that data in a way people could understand seemed challenging at
first glance. The violation descriptions were very wordy, so presenting them would have altered my design, as well as the focus
of the visualization. A legend would also have been needed to help readers decode the violations and what they meant.
(Figure 2, below. Source: https://www.linkedin.com/pulse/nyc-restaurant-health-grades-visualizing-results-john-bencina)
Visualization #3 (Figure 3) was more of a hypothetical idea, in the vein of “this could also work in a map format, but I’m
not sure how to execute it”. It presents a great deal of data that may take someone a bit of time to process, as it is rather complex,
not only in its colors but also in how it presents the data. As I was looking to create something simpler, the visualization below did
not quite fit the bill.
(Figure 3, below. Source: http://opendatabits.com/nyc-restaurant-inspections-results-open-data/)
To create the visualization, I used Tableau Public 9.3, MS Excel, and the Firefox web browser for conducting data searches. The
data set came from the NYC open data site, focusing on restaurants in particular. While looking for information, I located other
data sets, including one on commuting times and census areas, but it seemed too complex to work with, so it was not used.
Restaurants seemed like a good idea, as most everyone goes out to eat at some point.
Method-wise, I started by looking at NYC open data and other sites to get a feel for what was out there. Some data sets looked good,
but did not pan out for various reasons: not enough data, or uncertainty around what exactly I would do with the data. The
commuting time and census areas data had some issues: different years were measured and census areas changed over time, which
seemed like it would take a fair amount of reconciliation to get something usable.
Looking at restaurant data, I drafted a few exploratory questions (mentioned above) to frame what to look for once I
found what appeared to be a solid data set. Ultimately, my questions were relatively simple, quantitative ones. I used Excel to
download and filter data to show Manhattan (overall) as well as ZIP 10036, to see what was available.
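The Excel filtering step described above can also be sketched in a few lines of Python. This is only an illustration, not what was actually done for the lab: the column names (BORO, ZIPCODE, GRADE) are assumed to match the headers of the CSV export from the NYC open data site, and the three sample rows are made up.

```python
import csv
import io

def filter_rows(rows, boro="MANHATTAN", zipcode=None):
    """Keep rows for one borough, optionally narrowed to a single ZIP code."""
    kept = []
    for row in rows:
        if row.get("BORO", "").strip().upper() != boro:
            continue
        if zipcode is not None and row.get("ZIPCODE", "").strip() != zipcode:
            continue
        kept.append(row)
    return kept

# Made-up sample standing in for the downloaded CSV export.
# Column names are an assumption about the export's headers.
SAMPLE_CSV = """CAMIS,DBA,BORO,ZIPCODE,GRADE
1,EXAMPLE DINER,MANHATTAN,10036,A
2,SAMPLE CAFE,MANHATTAN,10019,B
3,PLACEHOLDER PIZZA,BROOKLYN,11201,A
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
manhattan = filter_rows(rows)                   # Manhattan overall
zip_10036 = filter_rows(rows, zipcode="10036")  # ZIP 10036 only
print(len(manhattan), len(zip_10036))
```

With a real export, the same function would be applied to rows read from the downloaded file instead of the sample string.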
The Department of Health grading schedule details the violations and associated points, as well as the point thresholds
for ‘A’, ‘B’, and ‘C’ grades. Note: I did not find reliable information on the meanings of Grade ‘P’ and Grade ‘Z’, so
restaurants with those grades are not addressed in the visualizations.
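As a rough illustration of how the point thresholds translate into letter grades, the sketch below maps an inspection score to a grade. The cutoffs used (0 to 13 points for ‘A’, 14 to 27 for ‘B’, 28 or more for ‘C’) are my reading of the Department of Health scoring document listed under Resources and should be checked against it; the function name is hypothetical.

```python
def grade_for_score(score):
    """Map an inspection score (total violation points) to a letter grade.

    Cutoffs are assumed from the Department of Health scoring document:
    0-13 -> 'A', 14-27 -> 'B', 28 or more -> 'C'.
    """
    if score < 0:
        raise ValueError("score cannot be negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"

for points in (5, 13, 14, 27, 28):
    print(points, grade_for_score(points))
```

Lower scores are better here, since points are added for each violation found.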
Working with Tableau, I tried different variations on the data to get a feel for what the visualizations would look like.
Several of them were similar, with minor revisions, such as filtering out cells with null data. In terms of volume of data
presented, the most detailed was the bar chart breaking down the grades by ZIP code.
In terms of results, data available as of September 17, 2016, showed 336 restaurants with an ‘A’ grade in 10036. Looking
at data from 2012 to 2015, the number of ‘A’ grade restaurants increased from 9 in 2012 (the first year with grades in the
data I used; the city’s letter-grade program itself began in 2010), to 235 in 2013, up to a high point of 451 in 2015.
Grade ‘B’ restaurants in 10036 showed a different pattern. 2012 had no establishments with ‘B’ grades. 2013 had 56,
2014 had 71, dipping slightly in 2015 to 69, and falling off in 2016 to a total of 38 with a ‘B’ grade.
Grade ‘C’ establishments in 10036 peaked in 2013, with 40. Subsequent years showed a decline: 2014 had 29, 2015
had 27, and 2016 had only 5.
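The yearly counts above came from Tableau, but the same kind of tally can be sketched in Python. This is one possible counting rule, not necessarily the one Tableau applied: it counts each restaurant once per year and grade, skips rows with null or non-A/B/C grades, and assumes the export has CAMIS (restaurant ID), GRADE, and GRADE DATE columns with dates in MM/DD/YYYY form. The sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime

def grades_by_year(rows, grades=("A", "B", "C")):
    """Count distinct restaurants for each (year, grade) pair."""
    seen = defaultdict(set)
    for row in rows:
        grade = row.get("GRADE", "").strip()
        date_text = row.get("GRADE DATE", "").strip()
        if grade not in grades or not date_text:
            continue  # skip null cells and grades such as 'P' or 'Z'
        year = datetime.strptime(date_text, "%m/%d/%Y").year
        seen[(year, grade)].add(row["CAMIS"])
    return {key: len(ids) for key, ids in seen.items()}

# Made-up rows; column names are assumptions about the export.
sample_rows = [
    {"CAMIS": "1", "GRADE": "A", "GRADE DATE": "03/14/2015"},
    {"CAMIS": "1", "GRADE": "A", "GRADE DATE": "09/02/2015"},  # same place, same year
    {"CAMIS": "2", "GRADE": "B", "GRADE DATE": "06/30/2015"},
    {"CAMIS": "3", "GRADE": "A", "GRADE DATE": "01/20/2016"},
    {"CAMIS": "4", "GRADE": "Z", "GRADE DATE": "02/11/2016"},  # excluded grade
    {"CAMIS": "5", "GRADE": "", "GRADE DATE": ""},             # null row
]

counts = grades_by_year(sample_rows)
for (year, grade), n in sorted(counts.items()):
    print(year, grade, n)
```

Counting inspection records instead of distinct restaurants would give larger numbers, so the choice of rule matters when comparing against the figures reported here.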
In ZIP 10036, the decline in ‘A’ grade establishments was rather large: over 100 fewer restaurants from 2015 (451) to 2016 (336). This
could possibly be attributed to establishments closing, or not being inspected in 2016, though further research would be needed
to compare 2015 data to 2016. One thing to consider would be the number of restaurants open in 2015 versus 2016, to
determine if there are any trends in terms of openings or closings. Additionally, 2016 is not yet complete, and the data reflected
is only through September, not a full calendar year.
The number of lower-graded (B and C) restaurants declined after 2014. That decline could possibly be attributed to
restaurants becoming used to the inspection process, or establishments closing for various reasons (redevelopment, rent
increases) that are beyond the scope of this visualization. Without a baseline, and without data beyond what was
used here, making anything other than a semi-educated guess is difficult.
Future research could include looking at what restaurants were open in one year versus another for a specific time
period [September 1, 2015 to August 31, 2016 for example], to get a feel for what is behind the decrease of ‘A’ rated places in
2016, in ZIP 10036. Is it because places have closed? Have they not been re-inspected? Have their grades declined over that
year?
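One way to approach those questions is to compare the set of ‘A’ restaurants in an earlier period against their status in a later one. The sketch below is hypothetical: it assumes each period has already been reduced to a mapping from restaurant ID to its most recent grade, and the IDs and grades shown are made up.

```python
def compare_periods(earlier, later):
    """Classify restaurants graded 'A' in the earlier period by later status.

    Both arguments map a restaurant ID to its grade in that period.
    """
    result = {"still_a": set(), "grade_changed": set(), "missing_later": set()}
    for camis, grade in earlier.items():
        if grade != "A":
            continue  # only follow restaurants that started with an 'A'
        if camis not in later:
            result["missing_later"].add(camis)   # closed, or not re-inspected
        elif later[camis] == "A":
            result["still_a"].add(camis)
        else:
            result["grade_changed"].add(camis)   # grade declined
    return result

# Made-up example data for two periods.
earlier = {"1": "A", "2": "A", "3": "A", "4": "B"}
later = {"1": "A", "2": "B", "5": "A"}

status = compare_periods(earlier, later)
for name, ids in status.items():
    print(name, sorted(ids))
```

Note that the "missing_later" group cannot by itself separate closures from restaurants that simply were not re-inspected; that would take additional data.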
Another area to investigate is the most common violations for a given ZIP or borough. That could be
done with the existing data set, though whether or not one would really want to know is open for debate.
Resources
Links to Visualizations, Data Sources, and Department of Health Inspection Scoring
Visualization #1
https://public.tableau.com/profile/tony.volpe#!/vizhome/Lab2_64/Sheet1
Restaurant Grades and Inspection Dates (ZIP 10036)
Visualization #2
https://public.tableau.com/profile/tony.volpe#!/vizhome/AllRestaurantsByZip/Sheet3
Grades and Inspection Dates for all restaurants in Manhattan
NYC Restaurant Data
https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
Department of Health Restaurant Scoring
http://www1.nyc.gov/assets/doh/downloads/pdf/rii/how-we-score-grade.pdf