Introduction
For the Tableau Public Lab, I chose a dataset from NYC Open Data that lists businesses certified by the Minority and Women-owned Business Enterprise (M/WBE) Program, the Locally-based Business Enterprise (LBE) Program, and/or the Emerging Business Enterprise (EBE) Program. The three certification programs assist minority- and women-owned businesses, as well as business owners who have been socially or economically disadvantaged. Through my exploration of the dataset, I hoped to determine:
- The number of businesses owned by each ethnic group (Asian, Black, Hispanic, Non-minority) in each of New York City’s five boroughs, and
- How these numbers varied over time.
Informative Visualizations
To inform my designs, I began by looking for charts showing the representation of different ethnic groups in the business sector. The first chart I found, created by the United States Census Bureau, shows the number of American Indian and Alaska Native-owned businesses in the U.S. broken down by industry, using data from a single year, 2012. For my dataset, geographic location could easily take the place of industry.
Since the dataset I chose includes the year that each business obtained its certification(s), ranging from 1900 to 2016, I next searched for visualizations that include time-oriented data. The second chart, also created by the United States Census Bureau, shows the percentage of businesses in the United States (as of 2014) that had been open for less than 2 years, 2-10 years, and 11 or more years, broken down by gender, veteran status, and ethnicity.
Inspired by these two visualizations, I began looking for the best way to compare ethnic representation across geographic locations over time. This search led me to the third chart, designed by the team at Information is Beautiful using data from “company & press reports”, which compares the percentage of employees in each of six ethnic groups at various well-known tech companies. Users can switch between data for 2014 and 2015. Because toggling hides one year while the other is shown, I concluded that a trellis display, which places each year's graph side by side, would be the most efficient way to display data for more than one year.
Methods
I cleaned the dataset in Excel, deleting all the information I did not plan to use and keeping only the certification type(s), ethnicity, city, state, and date of establishment for each business. After importing the data into Tableau Public, I filtered State to NY only. I placed Date of Establishment on the x-axis and Number of Records on the y-axis, and used color to distinguish Ethnicity. Tableau aggregated Date of Establishment to the year by default, so the chart showed Year of Establishment rather than exact dates. I limited the range to 1980-2016 because the full range (1900-2016) was too wide and made accurate comparisons difficult. I also excluded records with an Ethnicity of N/A and records with a null Year of Establishment.
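The filtering steps above were done by hand in Excel and Tableau. As a rough illustration of the same logic, the Python sketch below applies them to a few made-up records and counts records per year and ethnicity. The field names (`ethnicity`, `state`, `date_established`) and the sample rows are hypothetical, not the dataset's actual headers or contents.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows; field names are assumptions, not the real column headers.
records = [
    {"ethnicity": "Black", "state": "NY", "date_established": date(1998, 5, 1)},
    {"ethnicity": "Asian", "state": "NY", "date_established": date(1975, 3, 9)},
    {"ethnicity": "N/A", "state": "NY", "date_established": date(2001, 1, 1)},
    {"ethnicity": "Hispanic", "state": "NJ", "date_established": date(2005, 7, 4)},
    {"ethnicity": "Hispanic", "state": "NY", "date_established": None},
    {"ethnicity": "Black", "state": "NY", "date_established": date(1998, 11, 20)},
]


def clean_and_count(rows, start_year=1980, end_year=2016):
    """Count records per (year, ethnicity) after applying the Methods filters."""
    counts = Counter()
    for row in rows:
        if row["state"] != "NY":  # keep New York only
            continue
        if row["ethnicity"] == "N/A":  # drop records with no ethnicity
            continue
        established = row["date_established"]
        if established is None:  # drop records with no establishment date
            continue
        year = established.year  # aggregate to the year, like Tableau's default
        if start_year <= year <= end_year:  # keep 1980-2016 only
            counts[(year, row["ethnicity"])] += 1
    return counts


print(clean_and_count(records))  # Counter({(1998, 'Black'): 2})
```

Only the two 1998 Black-owned records survive: the others fall outside the year range, are outside NY, or lack an ethnicity or date.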
Using these parameters, I created a line graph to view the overall shape of the data, as well as a bar graph to better compare the share each ethnicity made up in each year. Ultimately, I decided to move forward with the bar graph, to match the three informative visualizations I chose.
My first visualizations revealed a significant drop in the Number of Records for 2015 and 2016. This made me question whether the data for those years was complete and up to date, so I excluded them going forward.
For the next visualization, I was ready to try small multiples and make comparisons based on geography. I moved Number of Records to the x-axis alongside Year of Establishment, added a new variable, City, to the y-axis, and continued to use color to distinguish Ethnicity. Because the City variable contained many misspellings and capitalization errors, I used the Group feature to consolidate all variants of each borough name (e.g., Queens and QUEENS, Staten Island and Statan Island) into a single authority group. I narrowed the years to 2010-2014 to limit the number of individual graphs.
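Tableau's Group feature was used interactively here; the same idea can be sketched in Python as a lookup from cleaned-up spellings to a single authority name. The `BOROUGH_VARIANTS` table and the `normalize_city` helper are hypothetical, and the dataset's real misspellings may differ from the ones listed.

```python
# Hypothetical lookup of spelling variants; keys are whitespace-collapsed, casefolded.
BOROUGH_VARIANTS = {
    "queens": "Queens",
    "brooklyn": "Brooklyn",
    "bronx": "Bronx",
    "manhattan": "Manhattan",
    "staten island": "Staten Island",
    "statan island": "Staten Island",  # misspelling mentioned in the text
}


def normalize_city(raw):
    """Map a raw City value to its authority borough name, or None if unknown."""
    key = " ".join(raw.split()).casefold()  # collapse spaces and ignore case
    return BOROUGH_VARIANTS.get(key)


print(normalize_city("QUEENS"))  # Queens
print(normalize_city("Statan  Island"))  # Staten Island
```

Folding case before the lookup handles capitalization errors in one step, so the table only needs to list genuine misspellings.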
At this point, the lack of records displayed for Queens made it evident that I had incorrectly assumed “City” was equivalent to “Borough.” Many Queens addresses list a neighborhood, such as Flushing or Jamaica, as the city. To remedy this, I again used the Group feature to consolidate the cities located in Queens into the previously created authority group, “Queens.”
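A minimal sketch of the Queens fix, assuming a hand-maintained set of neighborhood names: any City value that is "Queens" or a known Queens neighborhood is assigned to the Queens group. The `QUEENS_NEIGHBORHOODS` set is hypothetical and deliberately incomplete; a real mapping would need every neighborhood name that appears in the data.

```python
# Hypothetical, incomplete set of Queens neighborhoods used as postal "city" names.
QUEENS_NEIGHBORHOODS = {
    "astoria",
    "bayside",
    "flushing",
    "forest hills",
    "jamaica",
    "long island city",
}


def city_to_borough(city):
    """Return "Queens" for Queens itself or a known Queens neighborhood."""
    key = " ".join(city.split()).casefold()  # normalize spacing and case
    if key == "queens" or key in QUEENS_NEIGHBORHOODS:
        return "Queens"
    return city  # leave other values unchanged for separate grouping


print(city_to_borough("LONG ISLAND CITY"))  # Queens
```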
The resulting visualization allowed comparison of the total number of businesses established each year by owners of each ethnicity in each borough. To better understand the share of businesses owned by each ethnic group in each borough, I converted Number of Records to Percent of Total Number of Records.
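In Tableau, Percent of Total is a table calculation whose denominator depends on its "Compute Using" setting, so the percentages can be relative to the whole table or to a partition of it. The sketch below assumes one possible partition, each borough-year total, which may not match the setting used for the final visualization. The `percent_of_total` helper and the sample counts are hypothetical.

```python
from collections import defaultdict


def percent_of_total(counts):
    """Convert {(borough, year, ethnicity): n} into percentages of each
    (borough, year) total. Assumes a borough-year partition (hypothetical)."""
    totals = defaultdict(int)
    for (borough, year, _ethnicity), n in counts.items():
        totals[(borough, year)] += n
    result = {}
    for (borough, year, ethnicity), n in counts.items():
        total = totals[(borough, year)]
        result[(borough, year, ethnicity)] = 100 * n / total if total else 0.0
    return result


example = {
    ("Bronx", 2012, "Black"): 3,
    ("Bronx", 2012, "Hispanic"): 1,
    ("Queens", 2012, "Asian"): 2,
}
print(percent_of_total(example))
# {('Bronx', 2012, 'Black'): 75.0, ('Bronx', 2012, 'Hispanic'): 25.0, ('Queens', 2012, 'Asian'): 100.0}
```

With this partition, each borough's bars for a given year sum to 100%, which supports comparing ethnic shares within a borough but hides differences in absolute size between boroughs.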
Results & Discussion
Unfortunately, I do not find my final visualization easy to interpret. The percentage of new establishments for each ethnicity fluctuated up and down from year to year in each borough, but this visualization makes it difficult to detect any trends. The most useful graph I created was probably the line graph that simply displayed the total number of records per year, by ethnicity, for all of New York State. This visualization addresses my second question, how the number of businesses owned by each ethnicity varied over time. It shows an upward trend for all ethnicities, with Black-owned businesses clearly growing at a faster rate beginning around the turn of the millennium. The main drawback of this visualization, however, is that it does not address my first question, because it does not show the number of businesses owned by each ethnicity in each borough.
An improved visualization would still use a trellis display; however, each graph would represent a different borough rather than a different year. For each borough, a time series would display the trend for each ethnicity from 1980-2014. This could be done for both Number of Records and Percent of Total Number of Records. Alternatively, each line graph could represent a different ethnicity, with boroughs distinguished by color.
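The proposed borough-by-borough trellis mainly needs the data reshaped into one time series per borough and ethnicity. The sketch below shows that reshaping under the assumption that counts are keyed by (borough, year, ethnicity); years with no records are filled with 0 so every line spans 1980-2014. The `borough_series` helper is hypothetical, and plotting is left out.

```python
from collections import defaultdict


def borough_series(counts, start_year=1980, end_year=2014):
    """Reshape {(borough, year, ethnicity): n} into
    {borough: {ethnicity: [n per year]}}, filling missing years with 0."""
    years = range(start_year, end_year + 1)
    pairs = {(borough, ethnicity) for (borough, _year, ethnicity) in counts}
    series = defaultdict(dict)
    for borough, ethnicity in pairs:
        series[borough][ethnicity] = [
            counts.get((borough, year, ethnicity), 0) for year in years
        ]
    return dict(series)


sample = {("Bronx", 1980, "Black"): 2, ("Bronx", 1982, "Black"): 1}
print(borough_series(sample, 1980, 1982))  # {'Bronx': {'Black': [2, 0, 1]}}
```

Each top-level key would become one panel of the trellis with one line per ethnicity; swapping the roles of borough and ethnicity gives the alternative layout described above.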