The United States has gone through plenty of cultural shifts throughout its history, not to mention the last 30 years alone. In a matter of decades, the U.S. has seen a tech boom, a housing recession, and a global pandemic, all of which had a direct effect on employment. As new advancements and events outside of society’s control continue to take shape, industries have to adjust where needed. By exploring this topic, we can visualize which industries have changed drastically, for better or worse, and which ones have ridden the wave of society’s evolving priorities. To view the dashboard, follow the embed link on the screenshot below.
The motivation I had for exploring this topic was to see what effects COVID-19 had on employment in the U.S. Though it’s common knowledge that the pandemic tanked employment numbers as a whole, I hadn’t seen any data that explored which industries thrived through the pandemic, if there were any at all. I started by looking into open datasets with specific information about employment. After conducting initial research and looking through datasets from the U.S. Bureau of Labor Statistics, I realized that there were too many layers of messy data to read through and clean. Instead, I decided to pare down my scope and look through employment data from the largest market in the U.S.: New York City. After downloading the Seasonally Adjusted Employment CSV file from the NYC Open Data Portal, I was pleased to find data that was messy, but a bit easier to work with. From there, I started cleaning and preparing my data for visualization.
Methods and Process
To get my data into a suitable state for visualization, I uploaded the file to OpenRefine, an open-source data cleaning platform that allows users to transform data types, filter out anomalies, and remove or combine certain values to make working with the data easier and more accurate. Upon exploring the data, I found a few areas that needed adjusting.
The first thing I noticed was that the data spans every month from January 1990 to May 2021 and amounts to over 400,000 rows. As mentioned in the data dictionary on the NYC Open Data portal, there are two different values per month: one representing a former methodology for aggregation and one that was created in response to COVID-19. I created a filter based on this category and was able to cut the data in half. The next step was to transform the columns for month and year, combining them into a single date value, which is necessary to create a cohesive timeline of changes in employment. Once that was done, I converted the column to a date type, removed any unnecessary columns, and exported the dataset to a new CSV.
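For readers who prefer a scripted version of these cleaning steps, here is a minimal Python sketch of the same idea. It is not the OpenRefine workflow I actually used, and the column names (Year, Month, Industry, Series, Employment) and sample values are assumptions made up for illustration rather than copied from the NYC Open Data file.

```python
import csv
import io
from datetime import date

# Hypothetical sample: column names and values are invented for illustration.
SAMPLE_CSV = """Year,Month,Industry,Series,Employment
2020,3,Leisure and Hospitality,old,460.0
2020,3,Leisure and Hospitality,new,455.0
2020,4,Leisure and Hospitality,old,190.0
2020,4,Leisure and Hospitality,new,185.0
"""

def clean_rows(csv_text, keep_series):
    """Keep one series per month and merge Year/Month into a single date."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Filter: drop the rows that belong to the other methodology.
        if row["Series"] != keep_series:
            continue
        cleaned.append({
            # Combine the two columns into one date (first day of the month).
            "Date": date(int(row["Year"]), int(row["Month"]), 1),
            "Industry": row["Industry"],
            "Employment": float(row["Employment"]),
        })
    return cleaned

rows = clean_rows(SAMPLE_CSV, keep_series="new")
```

Filtering on the series column halves the sample, just as the filter did in OpenRefine, and each remaining row carries a single date instead of separate month and year fields.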
With a clean dataset, I began exploring possible ways to visualize the data using Tableau Public, a free visualization tool that accepts various data formats and groups the values into bins based on the dataset’s columns. The bins are then used to define a visualization’s dimensions, allowing users to create a variety of visualizations based on the attributes provided. Because I wanted to track the total change in employment over time, a bar chart representing industries by color seemed like a good choice. I placed my Industries dimension in the rows field and an aggregate attribute value (SUM(Employment)) in the columns field. To separate the data by month, I added a Date dimension to the pages field and selected month and year as the format. The last step was to add a color palette to distinguish the different industries shown on the chart. After a few tweaks to the employment field (adjusting the values to reflect actual numbers rather than thousands) and filtering the timeframe to just 2000-2021, I published my visualization to Tableau Public.
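The aggregation behind that bar chart can be sketched in plain Python. This is only an approximation of what SUM(Employment), the unit adjustment, and the 2000-2021 filter do together; the record layout and the numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (month, industry, employment in thousands).
records = [
    (date(1999, 12, 1), "Financial Activities", 480.0),
    (date(2020, 4, 1), "Financial Activities", 230.0),
    (date(2020, 4, 1), "Financial Activities", 225.5),
    (date(2020, 4, 1), "Leisure and Hospitality", 185.0),
]

def sum_by_month_and_industry(records, first_year=2000, last_year=2021):
    """Sum employment per (month, industry), converted from thousands."""
    totals = defaultdict(float)
    for month, industry, thousands in records:
        # Mirror the timeframe filter applied in the dashboard.
        if first_year <= month.year <= last_year:
            # Convert thousands to actual head counts before summing.
            totals[(month, industry)] += thousands * 1000
    return dict(totals)

totals = sum_by_month_and_industry(records)
```

Each key in `totals` corresponds to one bar on one page of the chart: a single industry in a single month.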
Once my initial draft was sent out for peer review and I was able to review my colleagues’ visualizations, I was inspired to make some changes. The first was to sort the industries from most to fewest employees. I also reassigned the colors to make the chart appear less cluttered.
I was inspired by my peers to also create multiple visualizations with the data, making a dynamic pie chart showing the percentage of the total market each industry makes up over time, as well as a trend graph displaying the total employment numbers in N.Y.C. The final product was a dashboard including all of my visualizations, along with a caption with a brief explanation of how to interact with it.
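The two extra views come down to simple arithmetic: each industry's share of the monthly total for the pie chart, and the monthly total itself for the trend graph. Here is a small sketch with made-up industry names and head counts, not figures from the dataset.

```python
from datetime import date

# Hypothetical totals (actual head counts) keyed by month, then by industry.
totals = {
    date(2020, 2, 1): {"Industry A": 300_000, "Industry B": 100_000},
    date(2020, 4, 1): {"Industry A": 150_000, "Industry B": 100_000},
}

def industry_shares(month_totals):
    """Return each industry's percentage of that month's total employment."""
    grand_total = sum(month_totals.values())
    return {name: 100 * value / grand_total
            for name, value in month_totals.items()}

def total_trend(totals):
    """Return (month, total employment) pairs in chronological order."""
    return [(month, sum(totals[month].values())) for month in sorted(totals)]

shares_feb = industry_shares(totals[date(2020, 2, 1)])
trend = total_trend(totals)
```

In this sample, Industry B's share grows between the two months even though its head count stays flat, which is why the share and the total are worth viewing side by side.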
Results and Reflection
After rearranging my dashboard, I was able to gather insights from my visualization. The results were quite harrowing; some industries, such as Leisure and Hospitality, reached peak employment in early 2020, only to see numbers decrease by more than half in April 2020. Meanwhile, other industries, such as the Public Administration and Financial sectors, remained relatively unscathed, nearing their all-time peak employment numbers in May 2021. When compared to the total trend line for employment numbers through 2021, it’s interesting to see some industries bouncing back while many others are consistently losing employees in response to COVID-19.
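A claim like "decreased by more than half" can be checked by comparing a month against the highest value reached up to that point. The sketch below uses invented numbers for a single industry, not the actual NYC figures, and the function name is my own.

```python
from datetime import date

# Hypothetical monthly head counts for one industry; not the real NYC figures.
series = {
    date(2019, 12, 1): 450_000,
    date(2020, 2, 1): 460_000,
    date(2020, 4, 1): 190_000,
}

def change_from_peak(series, month):
    """Percent change between the highest value up to `month` and `month` itself."""
    peak = max(value for m, value in series.items() if m <= month)
    return 100 * (series[month] - peak) / peak

drop = change_from_peak(series, date(2020, 4, 1))
```

A result below -50 means the industry lost more than half of its peak employment by that month, while a result of 0 means the month itself is the peak.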
Overall, the process of using OpenRefine and Tableau for this project was both seamless and fun to explore. I have used both tools before and was still able to learn a few new functions along the way, which makes me confident in the wide variety of uses for each. There were some limitations, however. Because I am using an M1 MacBook Air with 8GB of RAM, Tableau struggled at times with performance, since there is no native version for M1 processors (instead, it runs using an emulator). This caused frustration at times and added time to my workflow. Additionally, because Tableau Public hosts visualizations online, it has slow load times and struggles with performance, automatically turning off the animations I set in the dashboard. This would probably be fixed by using Tableau Desktop, but it would be nice if Tableau Public could offer the same performance, since it is free to use. In the future, I think it would be great to use Python to combine multiple datasets and compare things such as total GDP or other demographic statistics over the years for deeper insight.