Visualizing Sexual Harassment Claims in the Workplace 1995 – 2016



Over the past year, multiple high-profile sexual harassment allegations have fueled the rise of movements such as #MeToo and Time’s Up. Because the cases that capture the media’s attention tend to involve Hollywood or business executives, the stories of low-wage workers often get lost in the shuffle. The press fails to shine a light on the average workers affected by sexual harassment in the workplace, so much so that 300 women in film and television came together to organize the Time’s Up Legal Defense Fund. To keep attention on the issue of sexual harassment in the workplace, I created a dashboard in Tableau Public. My goal was to highlight trends in sexual harassment claims in the workforce, show which industries are most affected, and show the relationship between victims’ wages and claims. I aim to educate the general public through simple, easy-to-understand visualizations that tell the story of sexual harassment in the workforce from 1995 to 2016.

Process

The Dataset

To build the dashboard, I first found the dataset through Jeremy Singer-Vine’s weekly newsletter “Data is Plural”. Singer-Vine, Buzzfeed’s Data Editor, sends subscribers weekly emails with interesting datasets. The dataset I used had previously been used by Buzzfeed to create the visualizations for an article titled “We Got Government Data on 20 Years of Workplace Sexual Harassment Claims. These Charts Break it Down.”

Buzzfeed received the data from the US Equal Employment Opportunity Commission and the data they shared was mostly clean when I started working with it. The dataset was made up of two different spreadsheets. The first, composed of data on individual sexual harassment claims made from 1995 to 2016, included information on the victim’s race, gender, and workplace sub-industry. The second spreadsheet included information on individual industries, such as overall size and median salary.

Excel

To have all the information in a single spreadsheet for analysis, I merged the two datasets on their common industry codes. Because the first table only showed each claim’s sub-industry, I used VLOOKUP to join the tables and identify which industry each claim belonged to. This allowed me to compare claim numbers across specific industries and relate them to salaries. One obstacle I ran into while working with the data in Excel was its size. Because the dataset had more than 170,000 entries, Excel kept freezing and crashing, which made even simple formulas difficult to calculate.
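The VLOOKUP step described above amounts to a lookup join on the industry code. As a minimal sketch of the same idea in Python (standard library only), assuming hypothetical column names `industry_code`, `industry_name`, and `median_hourly_wage` that are not taken from the actual spreadsheets:

```python
def join_claims_to_industries(claims, industries, key="industry_code"):
    """Attach industry-level fields to each claim row, like a VLOOKUP on `key`."""
    # Build the lookup table once: industry code -> industry row.
    lookup = {row[key]: row for row in industries}
    joined = []
    for claim in claims:
        # Claims with an unknown code simply keep their own fields.
        industry = lookup.get(claim[key], {})
        # Claim fields win if both rows share a column name.
        joined.append({**industry, **claim})
    return joined


# Tiny made-up rows for illustration only (not real EEOC data).
claims = [
    {"claim_id": 1, "industry_code": "A1"},
    {"claim_id": 2, "industry_code": "B2"},
    {"claim_id": 3, "industry_code": "Z9"},  # no matching industry row
]
industries = [
    {"industry_code": "A1", "industry_name": "Industry A", "median_hourly_wage": 10.0},
    {"industry_code": "B2", "industry_name": "Industry B", "median_hourly_wage": 12.0},
]

merged = join_claims_to_industries(claims, industries)
```

Because the lookup table is built once, each claim is matched with a single dictionary access, so a file with 170,000 rows should not be a problem. In pandas, `DataFrame.merge` with `how="left"` expresses the same kind of join.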

Tableau Public

To create the dashboard and the individual visualizations, I used Tableau Public, “a free software that can allow anyone to connect to a spreadsheet or file and create interactive data visualizations for the web.” I imported the joined spreadsheet into Tableau Public and, within the software, grouped some of the race values into broader categories. Specifically, I placed entries listing multiple races into a “multiracial” group.
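The grouping itself was done in Tableau, but the rule is simple enough to sketch in Python. This sketch assumes, hypothetically, that a claim listing several races stores them in one field separated by semicolons; the real encoding in the dataset may differ, and the “Unknown” label for empty fields is also an assumption.

```python
def group_race(raw_value, separator=";"):
    """Collapse a raw race field into a single reporting category."""
    # Split on the (assumed) separator and drop empty pieces.
    labels = [part.strip() for part in raw_value.split(separator) if part.strip()]
    if not labels:
        return "Unknown"      # hypothetical label for missing values
    if len(labels) > 1:
        return "Multiracial"  # more than one race listed
    return labels[0]


# Made-up example values, not taken from the dataset.
examples = ["Black", "White; Asian", "", "Asian"]
grouped = [group_race(value) for value in examples]
```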

Inspiration

Figure 1: Source – Buzzfeed

Before creating the visualizations, I looked for inspiration in work on similar topics and in representations of the variables I wanted to include on the dashboard. I was first inspired by the original Buzzfeed article, which included aesthetically appealing visualizations but did not tell a complete story of sexual harassment claims made in the workforce. The article did not mention trends, or whether claims in general had been increasing, decreasing, or staying relatively the same over time. The missing trend information inspired me to create at least one visualization that shows the change in claims over time (Figure 2).

Figure 2: Charges by Year
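The data behind a chart like Figure 2 is a count of claims per calendar year. A minimal sketch of that aggregation, assuming a hypothetical `date_filed` field holding ISO-formatted dates (`YYYY-MM-DD`):

```python
from collections import Counter


def charges_by_year(claims, date_field="date_filed"):
    """Count claims per calendar year, returned as sorted (year, count) pairs."""
    # The first four characters of an ISO date are the year.
    counts = Counter(int(claim[date_field][:4]) for claim in claims)
    return sorted(counts.items())


# Made-up rows for illustration only.
sample_claims = [
    {"date_filed": "1995-03-14"},
    {"date_filed": "1995-11-02"},
    {"date_filed": "2016-06-30"},
]
yearly = charges_by_year(sample_claims)
```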

Figure 3: Source – Buzzfeed

The Buzzfeed article’s visualization of industry-specific salaries and claims (Figure 3) inspired me to create a similar one. I created a scatter plot that compares each industry’s median hourly salary to the number of claims made in that industry (Figure 4). Unlike Buzzfeed’s visualization, I chose to display the number of claims on the y-axis instead of using marker size to represent differences in reports.

Figure 4: Claims by Industry and Salary

Figure 5: Source – Erin Gallagher

Additionally, a Medium article by Erin Gallagher on visualizing the #MeToo movement reminded me that including aesthetically pleasing and complex visualizations could be compelling, but that it’s also important to show the basics of the story (Figure 5). Thus, I decided to create three different bar graphs: Sexual Harassment Claims by Industry, Percentage of Claims by Gender, and Percentage of Claims by Race (Figure 6). 

Figure 6: Dashboard Bar Graphs
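The two percentage bar graphs rest on the same calculation: each category’s share of all claims. A minimal sketch, using made-up values rather than the real data:

```python
from collections import Counter


def percentage_shares(values):
    """Return each category's share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to divide by
    return {category: 100.0 * n / total for category, n in counts.items()}


# Made-up values for illustration only.
genders = ["Female", "Female", "Female", "Male"]
shares = percentage_shares(genders)
```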

Another early source of inspiration was the book Now You See It, specifically the heat maps in chapter 7. I thought that a heat map showing the change in sexual harassment claims by industry over the years could be a good way to show change in more detail. However, when I added the heat map (Figure 7) to the dashboard, it made the dashboard look cramped and messy, so I decided to leave it out.

Figure 7: Claims by Industry Heat Map
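A heat map like Figure 7 is drawn from a grid of counts with one row per industry and one column per year. A minimal sketch of building that grid, assuming hypothetical `industry` and `year` fields on each claim:

```python
from collections import Counter


def heat_map_grid(claims):
    """Return (industries, years, grid) where grid[i][j] counts claims
    for industries[i] in years[j]."""
    counts = Counter((claim["industry"], claim["year"]) for claim in claims)
    industries = sorted({industry for industry, _ in counts})
    years = sorted({year for _, year in counts})
    # Missing (industry, year) pairs count as zero.
    grid = [[counts[(industry, year)] for year in years] for industry in industries]
    return industries, years, grid


# Made-up rows for illustration only.
sample = [
    {"industry": "Industry A", "year": 1995},
    {"industry": "Industry A", "year": 1995},
    {"industry": "Industry A", "year": 1996},
    {"industry": "Industry B", "year": 1996},
]
row_labels, column_labels, grid = heat_map_grid(sample)
```

Each cell of `grid` would then be mapped to a color intensity by the charting tool.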

I mainly selected simple visualizations to represent the trends in sexual harassment claims since 1995. I decided to keep the default blue rather than change the color, because highlighting differences through color did not seem relevant in this dashboard. For example, I did not want to give each industry a different color because of the overwhelming number of industries, and highlighting racial differences through different colors also seemed unnecessary. Additionally, I chose straightforward visualizations in order to communicate the most important information to the audience, the general public. The chart types I used (bar graphs, a line graph, and a scatter plot) are widely known and communicate information easily.

Obstacles and Future Work

During the process of creating the dashboard, I ran into two main obstacles. The first, which I mentioned previously, was using Excel with a dataset of more than 170,000 rows. Trying to join the two tables kept causing the software to crash, so the next time I work with a large dataset I will try to do the join in pandas, a Python library written for data manipulation. The second obstacle was that I couldn’t figure out how to change the level of time aggregation within a chart. Specifically, I tried to make the “Filled Charges by Year” line graph interactive so that users could click on a year and see that year’s monthly trend (as shown in 2016 Year-to-Date Claims). If I were to expand or amend my dashboard, I would first try to create this interactive line graph. I think that allowing users to view the data at a deeper level could be meaningful and that displaying the current year’s trend in claims could be educational.
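On the data side, the year-to-month drill-down described above is a second aggregation restricted to one selected year. The sketch below shows only that calculation and does not reproduce Tableau’s own drill-down behavior; it again assumes a hypothetical `date_filed` field holding ISO-formatted dates (`YYYY-MM-DD`).

```python
from collections import Counter


def monthly_counts_for_year(claims, year, date_field="date_filed"):
    """Return 12 claim counts (January through December) for one year."""
    prefix = f"{year:04d}-"
    months = Counter(
        int(claim[date_field][5:7])  # characters 5-6 of an ISO date are the month
        for claim in claims
        if claim[date_field].startswith(prefix)
    )
    # Months with no claims are reported as zero.
    return [months[m] for m in range(1, 13)]


# Made-up rows for illustration only.
sample_dates = [
    {"date_filed": "2016-01-05"},
    {"date_filed": "2016-01-20"},
    {"date_filed": "2016-03-09"},
    {"date_filed": "2015-12-31"},
]
monthly_2016 = monthly_counts_for_year(sample_dates, 2016)
```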

Dashboard Link:

https://public.tableau.com/views/sh_claims_dashboard/SexualHarassmentClaimsDashboard1995-2016?:embed=y&:display_count=yes

Sources

Source of data: https://github.com/BuzzFeedNews/2017-12-eeoc-harassment-charges/

https://www.buzzfeed.com/lamvo/eeoc-sexual-harassment-data?utm_term=.nmn3Vn3R3#.nwyzOMz9z
