Introduction & Inspiration
The purpose of this project is to visually explore how the Covid-19 pandemic affected unemployment in the United States during its first year, 2020. The U.S. Bureau of Labor Statistics defines unemployed people as those who are jobless, actively seeking work, and available to take a job (How the Government Measures Unemployment, n.d.). The dataset I found mainly includes statistical data, so I chose to focus on 2020 because of the events that unfolded that year. The context surrounding that year makes fluctuations in the unemployment data easy for any viewer to understand. Many people were furloughed or subjected to temporary or permanent layoffs. Through the visualizations, I hope viewers can make their own observations and inferences when analyzing the data.
My work is informed by visualizations that are simple and easy to understand, and I aim to keep mine clean and uncluttered. Furthermore, because I was working with a new tool for this project, my main focus was not only to learn the basics but also to create something visually comprehensible.
Method & Visualization
The data for my visualizations come from a dataset called “Unemployment in America, Per US State” found on Kaggle.com. This dataset consists of relevant population statistics and employment rates per US state since 1976. The primary data source used to create the dataset was the Bureau of Labor Statistics’s official website, as stated by the author. The tools I used for this visualization included OpenRefine, R, and RStudio.
The first step in my process was importing the CSV file into OpenRefine to clean the data. Naively, I believed there was nothing to change in my dataset because it seemed ready to go. However, after I began manipulating the dataset with my next tools, R and RStudio, I realized I needed to make some modifications. Going back to square one slowed the process down but ultimately helped me take the next steps. Once I re-imported the CSV into OpenRefine, I made the following changes: renamed the column headers so that they contained no spaces, because spaces seemed to confuse R when using the tidyverse; found and removed the commas inside the numbers so that the text could be converted to numbers (as pictured below); and used facets to review the rows for repeated values.
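For readers who would rather do the same cleanup in R instead of OpenRefine, the changes described above can be sketched roughly as follows. This is only an illustration under assumptions: the column names and values in `raw` are made up for the example and are not taken from the Kaggle file, and the sketch uses base R so that it runs without extra packages.

```r
# Toy data standing in for the raw CSV. The column names and values are
# hypothetical; they only mimic headers with spaces and numbers with commas.
raw <- data.frame(
  State = c("Alabama", "Alaska"),
  Year = c(2020, 2020),
  Month = c(4, 4),
  `Total Civilian Labor Force` = c("2,200,000", "350,000"),
  check.names = FALSE,       # keep the spaces in the header, as in a raw file
  stringsAsFactors = FALSE
)

clean <- raw

# 1. Remove spaces from the column headers.
names(clean) <- gsub(" ", "", names(clean), fixed = TRUE)

# 2. Strip the thousands separators, then convert the text to numbers.
clean$TotalCivilianLaborForce <- as.numeric(
  gsub(",", "", clean$TotalCivilianLaborForce, fixed = TRUE)
)

# 3. Count fully repeated rows (0 means there are no duplicates).
n_duplicates <- sum(duplicated(clean))

str(clean)
```

Doing the cleanup in a script has the side benefit that the steps are recorded and can be rerun if the CSV is downloaded again.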
Making those changes to the dataset allowed me to get results with the tidyverse package and move on to using ggplot to create the visualization. After many trials with the syntax, I first produced faceted bar charts for all 50 states plus 3 major cities in a combined visualization. The x-axis shows months and the y-axis shows the percentage of the labor force that was unemployed. I thought this was a simple way to see all the states at a glance and to answer questions such as: Which state experienced the highest percentage of its labor force unemployed? Which state experienced the lowest? Which state had the smallest fluctuations in its unemployment percentage throughout the first year of the pandemic? As you can see, some states were affected more than others, which may be due to differences between rural and urban areas or to communities that kept their workforce during the height of the pandemic.
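The three questions above can also be checked numerically rather than by eye. The sketch below is one possible way to do it, not the method used for this project: the states "A", "B", and "C" and all of their values are invented for illustration, and only the column names `State`, `Month`, and `PercLaborUnemp` match the dataset used here.

```r
# Toy monthly unemployment percentages. These values are invented for
# illustration and are NOT the real 2020 figures.
toy <- data.frame(
  State = rep(c("A", "B", "C"), each = 4),
  Month = rep(1:4, times = 3),
  PercLaborUnemp = c(3.5, 3.6, 4.0, 15.0,
                     2.8, 2.9, 3.0, 8.0,
                     8.0, 8.2, 8.5, 9.0)
)

# Per-state peak, low point, and spread (peak minus low) across the months.
peak <- tapply(toy$PercLaborUnemp, toy$State, max)
low <- tapply(toy$PercLaborUnemp, toy$State, min)
spread <- peak - low

highest_state <- names(which.max(peak))      # highest percentage reached
lowest_state <- names(which.min(low))        # lowest percentage reached
steadiest_state <- names(which.min(spread))  # smallest fluctuation

print(data.frame(peak, low, spread))
```

Using the spread between a state's highest and lowest month is only one simple way to measure fluctuation; the standard deviation would be another reasonable choice.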
For the next visualization, I narrowed my scope and selected only 6 states to analyze. I randomly picked states from the 5 regions of the U.S.: the Northeast, Southwest, West, Southeast, and Midwest. As you can see below, I kept the same design but added data labels for a closer look at the percentage of the labor force unemployed in each state. Of the 6 states, California, Nevada, and New York experienced the highest unemployment percentages.
For the next visualization, since there are so many possibilities for comparison, I selected all the states in one region to compare. Once again, I added data labels to the bar charts and faceted them by state. The visualization shows how much unemployment levels differed within this region. California, Hawaii, Nevada, Oregon, and Washington had the largest increases in April, while the rest of the states in the region (Alaska, Montana, Utah, and Wyoming) had the lowest and steadiest unemployment percentages.
Ultimately, I believe the visualizations reinforce the relationship between Covid-19 and unemployment in the United States. They also invite many inferences as to why some states had lower or higher percentages than others.
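To put a number on the April increase described above, one option is to subtract each state's March value from its April value. The sketch below shows the idea under assumptions: it assumes `Month` is stored as a number (3 for March, 4 for April), which may not match the actual file, and the states "A" and "B" and their values are invented for illustration.

```r
# Toy data with invented values; NOT the real 2020 figures.
# Assumption: Month is numeric, with 3 = March and 4 = April.
toy <- data.frame(
  State = rep(c("A", "B"), each = 3),
  Month = rep(c(3, 4, 5), times = 2),
  PercLaborUnemp = c(4.0, 16.0, 14.0,
                     3.0, 5.0, 4.5)
)

march <- toy[toy$Month == 3, c("State", "PercLaborUnemp")]
april <- toy[toy$Month == 4, c("State", "PercLaborUnemp")]

# Join the two months by state, then subtract to get the change
# in percentage points from March to April.
jump <- merge(march, april, by = "State", suffixes = c("_mar", "_apr"))
jump$change <- jump$PercLaborUnemp_apr - jump$PercLaborUnemp_mar

# Sort so the largest increases come first.
jump <- jump[order(-jump$change), ]
print(jump)
```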
Reflection and Future Improvements
The tools used to make this visualization were the trickiest for me so far. R as a language was definitely a jump from what I know about other languages such as Python, and I had trouble connecting certain pieces together. There was a lot of trial and error with the syntax, but I think that, just as with other languages, practice makes perfect. Ultimately, I think my creativity was restricted in this visualization; however, I hope I was able to create charts that make sense. On a positive note, RStudio was pleasant to work with, and I enjoyed having everything I needed in one interface.
In the future, to improve this visualization, I would make other types of charts and add more data points. I also think it would be interesting to look at other countries' employment data and do a comparison.
References
How the Government Measures Unemployment: U.S. Bureau of Labor Statistics. (n.d.). Retrieved March 17, 2023, from https://www.bls.gov/cps/cps_htgm.htm#unemployed
Unemployment in America, Per US State. (2023). Retrieved March 16, 2023, from https://www.kaggle.com/datasets/justin2028/unemployment-in-america-per-us-state
Below is the code used to create the visualizations:
# Install the packages (only needs to be run once)
install.packages(c("tidyverse", "rmarkdown", "flexdashboard"))
library(tidyverse)  # loads ggplot2, dplyr, and readr

# Dataset in the form of a CSV
df <- read_csv("updated4-Unemployment-in-America-Per-US-State.csv")

# Filtered data used to create the visualizations
df_state_setyear_allstates <- filter(df, Year == 2020)
df_state_set6 <- filter(df, State %in% c("New York", "Nevada", "California", "Texas", "Iowa", "Alabama") & Year == 2020)
df_state_set_west_region <- filter(df, State %in% c("California", "Hawaii", "Oregon", "Utah", "Alaska", "Montana", "Nevada", "Washington", "Wyoming") & Year == 2020)

# Visualization 1: all states in 2020, one facet per state
ggplot(df_state_setyear_allstates, aes(x = Month, y = PercLaborUnemp)) +
  geom_col(fill = "#ffa271", position = "dodge") +
  scale_y_continuous() +
  facet_wrap(~ State) +
  labs(x = "Month", y = "% of Labor Force Unemployed",
       title = "The Effect of Covid-19 on Unemployment in Select States, 2020",
       caption = "Source of Data from Kaggle.com") +
  theme_bw() +  # apply the complete theme first so later tweaks are not overwritten
  theme(legend.position = "none",
        text = element_text(size = 12, family = "Helvetica"),
        axis.text.x = element_text(size = rel(0.8), angle = 90))

# Visualization 2: six selected states, with data labels on the bars
ggplot(df_state_set6, aes(x = Month, y = PercLaborUnemp)) +
  geom_col(fill = "#ff717b", position = "dodge") +
  geom_text(aes(label = PercLaborUnemp), hjust = 0.5, vjust = 1.5) +
  scale_y_continuous() +  # PercLaborUnemp is numeric, so use a continuous scale
  facet_wrap(~ State) +
  labs(x = "Month", y = "% of Labor Force Unemployed",
       title = "The Effect of Covid-19 on Unemployment in Select States, 2020",
       caption = "Source of Data from Kaggle.com") +
  theme_bw() +
  theme(legend.position = "none",
        text = element_text(size = 12, family = "Helvetica"))

# Visualization 3: states in the West region, with data labels on the bars
ggplot(df_state_set_west_region, aes(x = Month, y = PercLaborUnemp)) +
  geom_col(fill = "#b6efce", position = "dodge") +
  geom_text(aes(label = PercLaborUnemp), hjust = 0.5, vjust = 1.5) +
  scale_y_continuous() +  # PercLaborUnemp is numeric, so use a continuous scale
  facet_wrap(~ State) +
  labs(x = "Month", y = "% of Labor Force Unemployed",
       title = "The Effect of Covid-19 on Unemployment in the West Region of U.S., 2020",
       caption = "Source of Data from Kaggle.com") +
  theme_bw() +
  theme(legend.position = "none",
        text = element_text(size = 12, family = "Helvetica"))