As an international student, I am most concerned about my chances of being permitted to work and stay in the United States after graduating. I wanted to know how many people compete for H-1B visas each year and what the average success rate is. I was also curious about the backgrounds of the people who were selected and had their H-1B petitions approved. I therefore used several datasets to explore the trends and facts of H-1B visa petitions over the past few years.
METHODS & PROCESS
Pre-Design UX Research
Before starting data collection, I conducted user research to understand the intended audience’s questions and needs. My participants were three international students who also want to stay in the U.S. after graduation but are not very familiar with the H-1B visa petition. I asked what kind of information they would like to know about it. In addition to the details of the application process, they were all curious about the results of recent years’ visa lotteries. They wanted to gauge their chances of staying by looking at past records broken down by education level, industry, company, and location.
Data Collection & Cleaning
From Kaggle, I downloaded the dataset OFLC H-1B Program Data (2011-2018), which contains eight years of H-1B LCA petition data, with more than 4 million records, and includes information about case status, employer, job type, wage, and year filed. I also found a data report from U.S. Citizenship and Immigration Services that provided more detailed information about country, age, education, and other attributes from 2007 to 2017. I used OpenRefine to clean and combine these two datasets into one covering H-1B petitions from 2011 to 2017, with information about case status, employer, occupation, industry, wage, education, age, country, year filed, and receipt volume.
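The combining step was done in OpenRefine, but the underlying idea (keep only the years both sources cover, then attach the yearly report figures to the petition records) can be sketched in plain Python. This is only an illustration under stated assumptions: the field names `year`, `case_status`, and `receipts`, the helper `combine_by_year`, and every value in the toy data are hypothetical placeholders, not the real columns or figures from either source.

```python
from collections import Counter

# Made-up placeholder rows standing in for the Kaggle LCA records.
# Field names and values are hypothetical, not the real dataset.
lca_rows = [
    {"year": 2011, "case_status": "CERTIFIED"},
    {"year": 2011, "case_status": "DENIED"},
    {"year": 2017, "case_status": "CERTIFIED"},
    {"year": 2018, "case_status": "CERTIFIED"},  # outside the USCIS report
]

# Made-up placeholder yearly figures standing in for the USCIS report.
uscis_by_year = {
    2007: {"receipts": 5},   # outside the Kaggle dataset
    2011: {"receipts": 10},
    2017: {"receipts": 30},
}


def combine_by_year(lca_rows, uscis_by_year, first_year=2011, last_year=2017):
    """Count LCA records per year and attach the yearly report figures.

    Only years inside [first_year, last_year] that appear in BOTH
    sources are kept, mirroring the 2011 to 2017 overlap.
    """
    counts = Counter(
        row["year"]
        for row in lca_rows
        if first_year <= row["year"] <= last_year
    )
    combined = []
    for year in sorted(counts):
        if year in uscis_by_year:
            record = {"year": year, "lca_records": counts[year]}
            record.update(uscis_by_year[year])
            combined.append(record)
    return combined


combined = combine_by_year(lca_rows, uscis_by_year)
for record in combined:
    print(record)
```

Restricting the result to years present in both sources is what narrows the combined dataset to the overlapping range, since one source starts earlier and the other ends later.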
Data Analysis & Visualization
I used Tableau Public for data analysis and visualization. With the cleaned dataset uploaded, I tried several ways of filtering the data to determine the best way to encode the values and create the visualizations. The first draft showed the following:
1. The number of H-1B visa petitions increased steadily from 2011, then declined significantly in 2017.
2. Applicants from India and China accounted for most of the H-1B visas.
3. Most H-1B visa recipients were young people aged 25 to 34 with Bachelor’s or Master’s degrees. In 2017, the number of recipients with Master’s degrees surpassed the number with Bachelor’s degrees for the first time since 2011.
4. H-1B visas were mostly given to workers in the computer industry.
After I finished the visualization, I combined several UX methods (observation, think-aloud, and interviews) to get feedback from the same participants as in the pre-design interviews and refine my work. I showed the participants the first draft of the data visualization above and asked them to think aloud so I could record their feelings and thoughts. I also observed their movements and reactions throughout the process. Then I interviewed them about whether the data visualization achieved its goal and whether the design, in terms of color, size, font, legend, and overall arrangement, delivered the message appropriately and successfully. The interview questions included:
- Based on what you have reviewed, can you describe which groups of people made up most of the H-1B visa recipients?
- What do you think about the color use, font, size, legend, and overall arrangement of these data visualizations?
- What questions do you still have about the H-1B visa petition?
From the user tests, I got the following feedback and findings:
- Because the audience might be unfamiliar with the topic, it is necessary to provide the reasons or background story behind the patterns observed in the data visualization.
- It looked as if certain groups of people mentioned in my findings had a better chance than others of winning the H-1B visa lottery. In fact, those groups simply had more applicants, so they were likely to have more winners as well.
- Vague interpretations might mislead the audience’s understanding of the data.
- Arranging all the data into one poster with a clear hierarchy would better deliver the analysis as a complete story.
- The color use is consistent, yet it is somewhat confusing when the same colors represent different meanings in different charts.
- Showing the number of petition receipts and the approval rates would make the information clearer and more interesting.
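The difference between having more winners and having a better chance can be made concrete with a small calculation. The sketch below is a minimal illustration, not the analysis done in Tableau: the helper `summarize_groups`, the group names, and all numbers are made-up placeholders. It computes each group's share of all approvals alongside its approval rate (approved petitions divided by petition receipts).

```python
def summarize_groups(groups):
    """Return each group's share of all approvals and its own approval rate.

    `groups` maps a group name to a dict with hypothetical keys
    'receipts' (petitions received) and 'approved' (petitions approved).
    """
    total_approved = sum(g["approved"] for g in groups.values())
    summary = {}
    for name, g in groups.items():
        summary[name] = {
            # How much of the whole "winner" pool this group makes up.
            "share_of_approvals": g["approved"] / total_approved,
            # How likely a petition from this group was to be approved.
            "approval_rate": g["approved"] / g["receipts"],
        }
    return summary


# Made-up placeholder numbers, chosen only to illustrate the point.
toy_groups = {
    "Group A": {"receipts": 800, "approved": 400},
    "Group B": {"receipts": 200, "approved": 100},
}

summary = summarize_groups(toy_groups)
for name, stats in summary.items():
    print(
        f"{name}: {stats['share_of_approvals']:.0%} of approvals, "
        f"{stats['approval_rate']:.0%} approval rate"
    )
```

In this toy example, Group A holds four times as many approvals as Group B, yet both groups have the same approval rate. A chart of counts alone would suggest an advantage that the rates do not support, which is why showing receipts and approval rates together helps.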
I refined my work according to the feedback. I combined all the data visualizations into one poster, with a paragraph of descriptive text at the top to briefly introduce the H-1B visa and the content of the poster. I also added the number of petition receipts and the approval rates for better understanding. I chose not to put a guiding interpretation beside each chart, both to avoid misleading readers about the data and to invite people to explore the facts by themselves. I didn’t change the color use because I think using different colors for different categories would look messy and unfocused.
It’s always important to step back and review the visualization along the way, because when I am too familiar with the data, I can omit key factors without noticing. When a data visualization is provided to the public, we have to design it so that someone with no background knowledge of the topic can understand the data; otherwise the message will not be delivered effectively. I therefore came to appreciate the importance of user experience research, which can provide useful insights for better design. For example, from the user tests I realized that color, size, font, and arrangement are not the only things that matter: the words we use to describe the content also strongly affect the audience’s understanding of the message.
Overall, I was able to answer the questions I raised at the beginning of the project with the available sources. However, in addition to the high-level messaging about overall trends, it would be more interesting to add some individual, person-level stories to make the information more relatable for the audience. I would also like to transform the static visualization into an interactive piece that lets people explore the trends and facts in the data by themselves. I believe this would give the audience a more engaging experience with the data and make the message more memorable.
- OpenRefine: An open-source tool for data cleanup.
- Tableau Public: Free software for creating interactive data visualizations.