I love food. For my first visualization, I wanted to concentrate on New York City restaurants. I initially planned to focus on the top three restaurant categories per borough. However, during the course of the activity, I realized that the visualization tool I was using made this seemingly straightforward visualization more difficult than I had imagined, so my original goal shifted. Recently, I heard a podcast about how the ongoing economic crisis in Venezuela had resulted in widespread immigration to the United States and how, as a consequence, more Venezuelan restaurants were opening and their food was becoming more popular. I decided to use restaurant and population data to examine whether there is a trend between the Hispanic population of a borough and the number of Hispanic restaurants in it.
To start, I needed to find NYC restaurant data. I went to NYC Open Data, searched for “restaurants,” and downloaded a data set called “DOHMH New York City Restaurant Inspection Results.” Since the purpose of this data set was to report inspection results, including every specific violation for each restaurant, I needed to clean it so that it contained only the information I needed.
I opened the sheet in Excel and deleted all the columns that were not necessary for my visualization, including “Inspection Type,” “Record Date,” “Score,” and “Violation Description.” However, I kept the “CAMIS” column, which contains each restaurant’s ID. I did this because each violation is recorded in its own row, so one restaurant can have multiple rows. I did not want a single restaurant to be counted multiple times and skew the results, so I decided to use OpenRefine (formerly Google Refine) to tackle this issue.
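The column cleanup above was done by hand in Excel. For anyone who would rather script it, here is a minimal Python sketch of the same step using only the standard library. The header names in KEEP are assumptions based on the names used in this post (the downloaded file may label them differently, so check them first), and the sample rows are hypothetical.

```python
import csv
import io

# Columns kept for the visualization. These header names are assumed from
# this write-up; the real file may use different (e.g. abbreviated) headers.
KEEP = ["CAMIS", "Doing Business As", "Borough", "Cuisine Description"]


def keep_columns(rows, keep=KEEP):
    """Return new dicts that contain only the columns listed in `keep`."""
    return [{column: row[column] for column in keep} for row in rows]


# Hypothetical sample: one restaurant with two violation rows.
sample = io.StringIO(
    "CAMIS,Doing Business As,Borough,Cuisine Description,Score,Violation Description\n"
    "1,Cafe A,Queens,Peruvian,12,Violation X\n"
    "1,Cafe A,Queens,Peruvian,12,Violation Y\n"
)
rows = keep_columns(csv.DictReader(sample))
print(rows[0])
```

Note that both rows survive this step; removing the duplicate restaurant is a separate step.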
I uploaded the spreadsheet into OpenRefine and grouped the rows by CAMIS. Then I used the “Blank Down” option to blank out repeated CAMIS values. While doing this, I realized that fast-food chains like McDonald’s and Taco Bell had been registered under multiple restaurant types (“American,” “Burgers,” “Chicken,” etc.). I used the Facet tool to group these chains under a single category each: “Burgers” for chains like McDonald’s, “Chicken” for KFC and Popeyes, “Pizza” for pizza chains, and “Tex-Mex” for Taco Bell. Since “Fast Food” was not a category in the inspection results, I decided to respect the existing categories. I also refrained from making dramatic changes, such as moving Rico Pollo from “Chicken” to “Latin” or moving Latin American restaurants out of the “Spanish” label. In New York City, many people (including Latinos/as) use “Spanish,” which refers to a nationality, to describe anyone who is Spanish-speaking. Even though this usage is incorrect, it is very common in New York, and I decided not to make subjective changes, since I had not personally been to many of these restaurants and might introduce mistakes. Because of this, the restaurant categories are not 100% accurate.
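The Facet-based regrouping of chains can also be expressed as a simple name-to-category lookup. This is a hypothetical Python sketch, not what OpenRefine does internally: CHAIN_CATEGORY and recategorize are illustrative names, the mapping covers only the chains named above, and matching on the start of the upper-cased restaurant name is an assumption about how names appear in the data.

```python
# Hypothetical mapping that mirrors the manual Facet choices described above.
CHAIN_CATEGORY = {
    "MCDONALD'S": "Burgers",
    "KFC": "Chicken",
    "POPEYES": "Chicken",
    "TACO BELL": "Tex-Mex",
}


def recategorize(row, mapping=CHAIN_CATEGORY):
    """Return the row with its cuisine replaced if it belongs to a listed chain."""
    name = row["Doing Business As"].strip().upper()
    for chain, category in mapping.items():
        # Assumption: the chain name appears at the start of the business name.
        if name.startswith(chain):
            return {**row, "Cuisine Description": category}
    # Not a listed chain: keep the inspection category unchanged.
    return row


print(recategorize({"Doing Business As": "McDonald's", "Cuisine Description": "American"}))
```

A lookup like this makes the judgment calls explicit and repeatable, but it is only as good as the name matching; variant spellings or apostrophes in the real data would need checking.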
Once I had blanked down the rows and amended some of the most egregious restaurant types, I downloaded the sheet and opened it in Excel. I then filtered the CAMIS column to remove all the rows with a blank CAMIS value, which left only one entry per restaurant. The new spreadsheet contained the following columns: “CAMIS,” “Doing Business As” (name of restaurant), “Borough,” and “Cuisine Description.”
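Blanking down repeated CAMIS values and then deleting the blank rows keeps the first row for each restaurant, as long as a restaurant’s rows are adjacent. A minimal Python sketch of the equivalent deduplication, assuming each row is a dict with a "CAMIS" key; the function name one_row_per_restaurant and the sample rows are hypothetical.

```python
def one_row_per_restaurant(rows):
    """Keep the first row seen for each CAMIS ID and drop later duplicates."""
    seen = set()
    unique = []
    for row in rows:
        camis = row["CAMIS"]
        if camis not in seen:
            seen.add(camis)
            unique.append(row)
    return unique


# Hypothetical rows: restaurant 1 has two violation rows.
rows = [
    {"CAMIS": "1", "Cuisine Description": "Peruvian"},
    {"CAMIS": "1", "Cuisine Description": "Peruvian"},
    {"CAMIS": "2", "Cuisine Description": "Mexican"},
]
print(len(one_row_per_restaurant(rows)))  # 2
```

Unlike Blank Down, tracking seen IDs in a set does not require the rows to be sorted or grouped first.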
Finding NYC’s Hispanic population per borough was a lot easier. On NYC Open Data, I found the data set “Hispanic Population By Selected Subgroups By Borough.” The data set breaks the Hispanic population down by subgroup; however, I wanted total numbers. I opened the set in Excel, created a new column named “Total,” and used the Sum tool to add up the subgroup populations for each borough. I was now ready to upload the data into Tableau.
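The “Total” column can be sketched in Python as follows, assuming (hypothetically) that each row is one borough, the borough name sits in a “Borough” column, and every other column holds a subgroup count stored as text, possibly with thousands separators. The subgroup names and numbers below are placeholders, not values from the actual data set.

```python
def add_total_column(rows, id_column="Borough"):
    """Return copies of `rows` with a 'Total' key summing every non-ID column.

    Assumes all other columns are subgroup counts stored as text,
    possibly with thousands separators such as "1,000".
    """
    with_totals = []
    for row in rows:
        total = sum(
            int(value.replace(",", ""))
            for column, value in row.items()
            if column != id_column
        )
        with_totals.append({**row, "Total": total})
    return with_totals


# Placeholder subgroup names and numbers, not real population figures.
sample = [{"Borough": "Borough A", "Subgroup 1": "1,000", "Subgroup 2": "250"}]
print(add_total_column(sample)[0]["Total"])  # 1250
```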
The visualizations I had collected as models for the exercise centered on my original plan of visualizing the top three restaurant types per borough. Two of the graphs were from the Pew Research Center. Even though they did not address food or restaurants, they did use bar graphs to compare patterns between different populations.
Figure 1 employs a bar graph to compare the percentage of women and men living in multi-generational households.
Fig. 1. In Most Age Groups, Women More Likely Than Men to Live With Multiple Generations of Family
Meanwhile, Figure 2 is a stacked bar graph that compares different categories of household labor by gender.
Fig. 2. Division of Labor in Households with Two Full-Time Working Parents
Both Pew Research Center visualizations used bar graphs in pastel colors to display information about populations. I liked how neatly the data was presented. Since this was my first visualization, I decided not to reinvent the wheel and to follow their straightforward example.
For food-related visualizations, I found a nifty chart the New York Times created for an article by Kevin Quealy and Margot Sanger-Kats titled “Is Sushi ‘Healthy’? What About Granola? Where Americans and Nutritionists Disagree.” I liked how it used a straightforward chart to compare the opinions of nutritionists with those of ordinary Americans. I figured that I could do something similar by combining the Hispanic population per borough data set with the Hispanic restaurants per borough data set. Unfortunately, this idea proved difficult to realize in Tableau, and I had to discard it.
Fig. 3. Foods considered healthier by experts than by the public
I first uploaded my clean “DOHMH New York City Restaurant Inspection Results” sheet, which now included only the “CAMIS,” “Doing Business As,” “Borough,” and “Cuisine Description” columns. I knew that I wanted to make a bar graph based on the Pew Research Center visualizations. After selecting “Borough” for the X-axis and the count of “Cuisine Description” for the Y-axis, I filtered “Cuisine Description” so that each bar displayed only the Hispanic restaurants in the data set. I decided to use a variation of the stacked bar graph so that viewers could see the nationality of the Hispanic restaurants in each borough. I used pastel colors and sorted the results in descending order to make the information easier to digest; I applied this last step to all of the graphs.
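Behind the stacked bar graph is a count of Hispanic restaurants per borough, broken down by cuisine and sorted in descending order. A minimal Python sketch of that aggregation: the HISPANIC_CUISINES set is an assumption assembled from the categories named in the Results section (the filter actually used in Tableau may include others), and the sample rows are made up.

```python
from collections import Counter, defaultdict

# Assumed cuisine labels treated as "Hispanic"; adjust to match the real data.
HISPANIC_CUISINES = {
    "Latin American", "Spanish", "Mexican",
    "Peruvian", "Chinese/Cuban", "Chilean",
}


def hispanic_counts(rows, cuisines=HISPANIC_CUISINES):
    """Return (borough, total, per-cuisine Counter) tuples, largest total first."""
    by_borough = defaultdict(Counter)
    for row in rows:
        cuisine = row["Cuisine Description"]
        if cuisine in cuisines:  # corresponds to the Cuisine Description filter
            by_borough[row["Borough"]][cuisine] += 1
    summary = [(borough, sum(counts.values()), counts) for borough, counts in by_borough.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)  # descending order


# Hypothetical sample rows (one per restaurant).
rows = [
    {"Borough": "Bronx", "Cuisine Description": "Spanish"},
    {"Borough": "Queens", "Cuisine Description": "Peruvian"},
    {"Borough": "Queens", "Cuisine Description": "Mexican"},
    {"Borough": "Bronx", "Cuisine Description": "Pizza"},
]
for borough, total, counts in hispanic_counts(rows):
    print(borough, total, dict(counts))
```

Each Counter corresponds to one stacked bar: its keys are the colored segments and the total is the bar height.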
Fig. 4. Hispanic Restaurants per Borough
The population bar graph was more straightforward. I uploaded the clean “Hispanic Population By Selected Subgroups By Borough” data set and I selected “Borough” for the X-axis and “Total” for the Y-axis.
Fig. 5. Hispanic Population per Borough
Finally, I decided to include a graph displaying the total number of restaurants per borough, so that viewers could compare overall restaurant counts across boroughs, including those with more Hispanic residents. I selected “Borough” for the X-axis and “Cuisine Description” (Sum) for the Y-axis. I used the same colors for the boroughs as in the “Hispanic Population per Borough” graph.
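Because the cleaned sheet has one row per restaurant, the total-restaurants graph reduces to counting rows per borough. A short sketch using Python’s collections.Counter, whose most_common() returns the boroughs already sorted in descending order; the function name and sample rows are hypothetical.

```python
from collections import Counter


def restaurants_per_borough(rows):
    """(borough, restaurant count) pairs, largest first; expects one row per restaurant."""
    return Counter(row["Borough"] for row in rows).most_common()


# Hypothetical rows from an already deduplicated sheet.
rows = [{"Borough": "Manhattan"}, {"Borough": "Bronx"}, {"Borough": "Manhattan"}]
print(restaurants_per_borough(rows))  # [('Manhattan', 2), ('Bronx', 1)]
```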
Fig. 6. Restaurants per Borough
Results
Queens had the most Hispanic restaurants of any borough and tied Manhattan for the most variety. The majority of its Hispanic restaurants were “Latin American,” followed by “Spanish,” “Mexican,” “Peruvian,” “Chinese/Cuban,” and “Chilean.” Manhattan came in second, with a large number of “Mexican” restaurants. Brooklyn came in third, with “Mexican” restaurants also dominating. The Bronx came in fourth, with mostly “Latin American” and “Spanish” restaurants. Staten Island came in a distant last place, with mostly “Mexican” restaurants.
Meanwhile, the Bronx is the borough with the biggest Hispanic population. It is followed by Queens, Brooklyn, and Manhattan. Staten Island is a distant fifth.
Looking at all restaurants, Manhattan has almost twice as many as the other boroughs. It is followed by Brooklyn and Queens. The Bronx is a distant fourth, and Staten Island is an even more distant fifth.
In this context, the number of Hispanic residents does seem to play a role in the number of Hispanic restaurants in a borough. Manhattan has almost 4,000 more restaurants in total than Queens, yet Queens surpasses Manhattan in Hispanic restaurants. In Queens, at least, there seems to be a correlation between the number of Hispanic restaurants and the number of Hispanic residents.
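One way to separate the effect of Manhattan’s sheer size from the Hispanic-restaurant pattern is to compute the share of each borough’s restaurants that are Hispanic. This step was not part of the original analysis; it is a hedged Python sketch with an assumed cuisine set and hypothetical sample rows, not results from the actual data.

```python
from collections import Counter

# Assumed "Hispanic" cuisine labels (the ones named in the Results section).
HISPANIC_CUISINES = {"Latin American", "Spanish", "Mexican", "Peruvian", "Chinese/Cuban", "Chilean"}


def hispanic_share(rows, cuisines=HISPANIC_CUISINES):
    """Fraction of each borough's restaurants whose cuisine is in `cuisines`."""
    totals = Counter(row["Borough"] for row in rows)
    hispanic = Counter(row["Borough"] for row in rows if row["Cuisine Description"] in cuisines)
    # Counter returns 0 for boroughs with no Hispanic restaurants.
    return {borough: hispanic[borough] / total for borough, total in totals.items()}


# Hypothetical rows, one per restaurant.
rows = [
    {"Borough": "Queens", "Cuisine Description": "Peruvian"},
    {"Borough": "Queens", "Cuisine Description": "Pizza"},
    {"Borough": "Manhattan", "Cuisine Description": "Mexican"},
    {"Borough": "Manhattan", "Cuisine Description": "Pizza"},
    {"Borough": "Manhattan", "Cuisine Description": "Pizza"},
    {"Borough": "Manhattan", "Cuisine Description": "American"},
]
print(hispanic_share(rows))  # {'Queens': 0.5, 'Manhattan': 0.25}
```

A share normalizes away the overall size of each borough’s restaurant scene, so a borough with fewer restaurants can still stand out if a large fraction of them are Hispanic.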
However, although the Bronx has the largest Hispanic population of any borough, it ranks fourth in both restaurant graphs. Nevertheless, while the Bronx has about 6,000 fewer restaurants than Manhattan overall (Figure 6), it trails Manhattan by only about 200 Hispanic restaurants (Figure 4).
Manhattan and Brooklyn came in second and third in Hispanic restaurants, but in both boroughs “Mexican” restaurants make up the bulk. I would like to examine these results more closely, since I did not remove establishments like Qdoba, Chipotle, or other chain or kitsch restaurants run by American companies. The Bronx, by contrast, mostly had “Latin American” and “Spanish” restaurants that might be run by the borough’s Puerto Rican, Dominican, and other Latin American residents.
Even though these results are preliminary, they suggest that there is a trend between the Hispanic population of a borough and the number of Hispanic restaurants in it.
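To put a rough number on this trend, the borough rankings reported in the Results section can be plugged into Spearman’s rank correlation, which for untied ranks is 1 − 6Σd² / (n(n² − 1)), where d is the difference between a borough’s two ranks. This is an illustrative check that goes beyond the original analysis, and it uses only the orderings stated above, not the underlying counts.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d_squared = sum((rank_a[key] - rank_b[key]) ** 2 for key in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Rankings as reported in the Results section (1 = highest).
hispanic_restaurant_rank = {"Queens": 1, "Manhattan": 2, "Brooklyn": 3, "Bronx": 4, "Staten Island": 5}
hispanic_population_rank = {"Bronx": 1, "Queens": 2, "Brooklyn": 3, "Manhattan": 4, "Staten Island": 5}

rho = spearman_from_ranks(hispanic_restaurant_rank, hispanic_population_rank)
print(round(rho, 2))  # 0.3
```

With these rankings the sum of squared rank differences is 14, giving ρ = 1 − 84/120 = 0.3, a weak positive association. With only five boroughs, a value this small cannot distinguish a real trend from chance, which is another reason to treat these results as preliminary.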
For future visualizations, I would like to compare the different Hispanic ethnicities in each borough with the types of restaurants. I would also like to further clean the data sheet to remove “Mexican” fast-food chains and to reclassify Latin American establishments listed under the “Chicken” category, like Rico Pollo. The income levels of Hispanic residents across boroughs would be another interesting factor to add to this analysis, since the Bronx is known for its food deserts. As anti-Latin American rhetoric grows in some parts of the US, these types of visualizations could be useful in demonstrating the entrepreneurship and hard work of these individuals.