Introduction
It is no secret that tech has a long-standing diversity and inclusion problem. The homogeneity of the industry is arguably a root cause of many of the larger issues that plague tech. It has implications for justice and fairness, and it also results in devastating flaws in the industry’s own products. Companies know the importance of diversity. In fact, Workforce Management estimates that, collectively, companies are spending billions on diversity and inclusion efforts. These visualizations aim to show the overall trend of diversity in the tech field, both over time and across demographic groups. Employers and employees can gain insights from them, especially by zooming in and comparing companies, which may help readers anticipate where diversity within a company is heading.
Even so, the visualizations are limited: they cannot show many of the other factors and details behind the current diversity situation, such as diversity across different positions and levels in tech, or the views of employees from racial minorities.
Visualizations
A line chart that separates females and males into two columns for easier comparison by gender. Color distinguishes the companies and shows the diversity trend from 2013 – 2018.
For the heat map, I used the same variables and filters as for the line chart. A color gradient encodes the percentage of females, so the map shows clearly how gender diversity changes over time.
This is an overall dashboard of race diversity in big tech companies. I used an area chart here because more variables are involved. Zooming in reveals more detail:
Data
The main dataset I used is Employee Diversity in tech from “Information is Beautiful”. It contains information on gender and race at most of the tech companies and social media sites in the US for the five years from 2014 – 2018. I chose it not only because it separates measures such as race clearly, but also because it prompted me to create and compare different visualizations, each offering a different interpretation and visual experience of the changes in diversity in tech, which matched my goal exactly.
I spent a lot of time cleaning, sorting, editing, and transposing the original dataset in OpenRefine. The original dataset has female and male as separate columns. When I first imported it into Tableau, I was not able to generate a visualization showing how the gender proportion changed over the years, because Tableau could not treat female and male as values of a single variable. To resolve this, I transposed the female and male columns into two new columns called “gender” and “percent”. With this layout Tableau read the data correctly, so I could put the percent of each gender on the Y-axis and adjust the colors for each visualization.
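The reshaping step above can be sketched in a few lines of Python. This is only an illustration of the wide-to-long idea, not the OpenRefine operation itself; the sample rows, the company name, and the function name `wide_to_long` are made up for the example.

```python
import csv
import io

# Hypothetical sample in the wide layout: one column per gender.
# The company name and the figures are invented for illustration.
WIDE_CSV = """company,year,female,male
ExampleCo,2014,30,70
ExampleCo,2015,32,68
"""


def wide_to_long(rows, id_fields=("company", "year"), value_fields=("female", "male")):
    """Turn one row per (company, year) into one row per (company, year, gender)."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            # Copy the identifying columns, then store the old column name
            # as the "gender" value and its cell as the "percent" value.
            record = {key: row[key] for key in id_fields}
            record["gender"] = field
            record["percent"] = float(row[field])
            long_rows.append(record)
    return long_rows


wide_rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

Each wide row becomes two long rows, one per gender, which is the shape a charting tool needs in order to use gender as a single dimension and percent as a single measure. Passing the race columns as `value_fields` would reshape them the same way.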
Process
From the beginning, I wanted to explore how diversity in tech changes over the years, so the variable “Years” is always on the X-axis. When exploring the original dataset, I found that it contains the metrics female, male, white, asian, latino, black, multi, and race change percentage, plus several non-numerical fields such as source and link. First, I did some basic cleaning and saved the data with only the metrics I wanted to work on: female, male, white, asian, latino, black, and multi. Then I dragged the CSV file into Tableau, built a line chart by year, and dragged both female and male onto the Y-axis. However, only dots showed up on the canvas. Tableau could not treat females and males as separate series, and the chart makes no sense without the percentage of each gender. The same thing happened when I tried to drag each race onto the chart. To resolve this, I transposed the gender and race data into separate columns so that each became an individual variable in Tableau. Tableau then read the data correctly and produced a line chart for gender and a bar chart for race.
Next, I explored this chart with a dual axis. One axis is the gender diversity of the tech company, and the other is the percentage of each gender over the years. At this point, however, the gender diversity within each company was not visible. I added the “Company” field to “Color”, which immediately distinguished the companies by color.
I applied the same method to race, hoping to see how race diversity changed over the years in each company. However, the visualization turned out to be a mess, with so many colorful lines tangled together. It failed to meet the goal of clearly showing the changes in race diversity.
I changed the line chart to a bar chart, swapped “Company” for “Race” on color so that color encodes race, and separated the companies into columns for easier comparison. It felt much clearer and more straightforward than the previous line chart. Then a question came to mind: what is my goal in delivering this visualization to readers? I wanted them to have an intuitive view of the changes in race diversity at each company. Hence, I tried an area chart, where the changing colored areas show the trend better.
Meanwhile, I was trying different charts to show gender. If the focus is the detailed gender diversity within each company, putting gender on the X-axis and company and percent on the Y-axis is effective. But I wanted to see the changes over the years across companies, so I moved the gender percent to color and left only “Year” on the X-axis and “Company” on the Y-axis.
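The layout behind this heat map, with one row per company, one column per year, and the female percentage as the cell value, can be sketched as a small pivot. This is an illustrative sketch, not how Tableau builds the view; the records, company names, and the function name `heatmap_grid` are assumptions for the example.

```python
# Hypothetical long-format records; names and values are illustrative only.
records = [
    {"company": "ExampleCo", "year": 2014, "gender": "female", "percent": 30.0},
    {"company": "ExampleCo", "year": 2015, "gender": "female", "percent": 32.0},
    {"company": "SampleSoft", "year": 2014, "gender": "female", "percent": 25.0},
    {"company": "SampleSoft", "year": 2015, "gender": "male", "percent": 74.0},
]


def heatmap_grid(records, gender="female"):
    """Pivot long records into a company-by-year grid of percentages.

    Cells with no matching record are left as None.
    """
    years = sorted({r["year"] for r in records})
    companies = sorted({r["company"] for r in records})
    lookup = {
        (r["company"], r["year"]): r["percent"]
        for r in records
        if r["gender"] == gender
    }
    grid = [[lookup.get((company, year)) for year in years] for company in companies]
    return companies, years, grid


companies, years, grid = heatmap_grid(records)
for company, row in zip(companies, grid):
    print(company, row)
```

A cell that comes back as `None` corresponds to a blank square in the heat map, which makes missing data easy to spot.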
Taking a second look at the chart, I found several columns with null or missing data. I grouped the companies by name and field, which reduced the total number of rows.
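As a related but different technique from the grouping I used in Tableau, one simple way to deal with gaps is to drop every company that has a missing value in any year. The sketch below assumes rows with a `female` field that may be `None`; the data and the function name `drop_incomplete_companies` are hypothetical.

```python
# Hypothetical rows; None marks a missing value in the source data.
rows = [
    {"company": "ExampleCo", "year": 2014, "female": 30.0},
    {"company": "ExampleCo", "year": 2015, "female": 32.0},
    {"company": "SampleSoft", "year": 2014, "female": None},
    {"company": "SampleSoft", "year": 2015, "female": 26.0},
]


def drop_incomplete_companies(rows, value_field="female"):
    """Keep only rows of companies that have no missing value in value_field."""
    incomplete = {r["company"] for r in rows if r[value_field] is None}
    return [r for r in rows if r["company"] not in incomplete]


clean_rows = drop_incomplete_companies(rows)
print(sorted({r["company"] for r in clean_rows}))
```

This trades coverage for consistency: fewer companies remain, but every remaining row of the heat map is complete.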
UX Research
I recruited one of my classmates as the participant for in-person user testing. During the session, the participant was asked to think out loud. The tasks were designed to evaluate the information structure, visual appearance, and interactivity of the charts. The participant was given 5 tasks to complete, each followed by post-task questions:
- Try to find a section on the chart that interests you, and describe the information you find.
  - What did you see at first sight when exploring the map/chart?
  - Why does the information interest you?
- Explore the right side of the chart, and describe the information you find.
  - What do you think of the information?
  - Does the color stand out to you?
- Suppose you are planning to , try to explore the map in the middle.
  - Did you find the information you wanted?
  - Does the coloring make sense to you?
- Compare the two versions of the gender diversity visualization, and describe the information you find.
  - Compared to the other visualization, what do you like and dislike about each one?
  - Is there any information you feel is missing in each visualization?
- Check the chart on the right side, and describe the information you find.
  - Does the color coding make sense to you?
  - Does the information provided meet your expectations?
Findings
The participant looked at the right side of the chart first, which presents the regional information. Overall, the charts provided less information than the participant expected, the visuals were easy to understand, and the interactivity of the map was somewhat hard to notice.
Future Steps
There’s still a lot of work to be done to improve diversity, both in the tech industry and in tech-based roles and professions in other industries. For example, I’d like to explore why diversity in tech is so important and what companies can do to improve it. I should also review and discuss other similar visualizations to see how they can inform my work.
Following my professor’s suggestion, it would help to think about the various levels of analysis. These include high-level historical trends in the industry or across different types of companies, and lower-level detail such as the top 5 companies with the biggest gains or the lowest 5 companies in terms of gender diversity. I could then optimize a different graph for each case and combine them into a dashboard or a larger story.
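The lower-level ranking could be sketched as follows. This is a minimal example under stated assumptions: gain is defined here as the female percentage in the last available year minus that in the first, and the company names, figures, and the function name `rank_by_gain` are hypothetical.

```python
# Hypothetical series: {company: {year: female_percent}}.
sample = {
    "ExampleCo": {2014: 30.0, 2018: 35.0},
    "SampleSoft": {2014: 20.0, 2018: 21.0},
    "DemoWorks": {2014: 40.0, 2018: 38.0},
}


def rank_by_gain(series, top_n=5):
    """Rank companies by the change between their first and last recorded year."""
    gains = []
    for company, by_year in series.items():
        first_year, last_year = min(by_year), max(by_year)
        gains.append((company, by_year[last_year] - by_year[first_year]))
    # Largest gain first.
    gains.sort(key=lambda item: item[1], reverse=True)
    return gains[:top_n]


top_two = rank_by_gain(sample, top_n=2)
print(top_two)
```

Sorting the same list in ascending order, or ranking by the latest percentage instead of the gain, would give the lowest 5 companies.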
A map-based visualization could place the companies geographically and give a visual comparison that shows whether diversity differs between the West Coast and the East Coast.