A Look at Migration Surrounding the United States


Visualization

After working with World Bank data on world migration throughout the semester, I wanted to look more closely at migration patterns surrounding the United States specifically. I wanted to visualize which countries and regions American citizens emigrate to, along with the income levels of those destinations. Conversely, I wanted to examine the most frequent places of origin of people entering the United States. Given the number of discussions currently taking place about border control, both in the United States and abroad, I thought it would be interesting to look at migration in both directions in more detail.

I intended this visualization to be used by everyday readers rather than specialists. For instance, it might appear in a political article or blog to illustrate broad migration trends; I did not intend users to need advanced skills or knowledge to use it, nor did I intend it to convey an extremely complex understanding of migration.

Process

To create this project, I used selected data from the World Bank’s Global Bilateral Migration Matrix, first filtering the data to show only rows with the United States as the country of origin. The data were originally disaggregated by decade, from 1960 to 2000; I was interested in cumulative numbers, so I collapsed the data by creating a Pivot Table that summed the number of migrants across the years, resulting in a single case per destination country. I then returned to the original data set, filtered for the United States in the destination column, and created a second Pivot Table to sum U.S. immigrants across years, resulting in a single case per country of origin.
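The same filter-and-sum step could also be scripted instead of done in a spreadsheet. The following is a minimal sketch in Python, not the process I actually used; the row layout, the function name `sum_across_decades`, and every count are hypothetical placeholders rather than World Bank figures.

```python
from collections import defaultdict

# Hypothetical rows shaped as (origin, destination, decade, migrants).
# The counts are made-up placeholders, NOT values from the World Bank matrix.
rows = [
    ("United States", "Canada", 1960, 100),
    ("United States", "Canada", 1970, 150),
    ("United States", "Mexico", 1960, 80),
    ("Mexico", "United States", 1960, 300),
    ("Mexico", "United States", 1970, 400),
    ("Canada", "United States", 1960, 200),
]

COLUMN_INDEX = {"origin": 0, "destination": 1}


def sum_across_decades(rows, fixed_column, fixed_value, group_column):
    """Keep rows where fixed_column equals fixed_value, then sum migrants
    per value of group_column (the pivot-table collapse across decades)."""
    totals = defaultdict(int)
    for row in rows:
        if row[COLUMN_INDEX[fixed_column]] == fixed_value:
            totals[row[COLUMN_INDEX[group_column]]] += row[3]
    return dict(totals)


# One total per destination country for U.S. emigration...
emigration = sum_across_decades(rows, "origin", "United States", "destination")
# ...and one total per country of origin for U.S. immigration.
immigration = sum_across_decades(rows, "destination", "United States", "origin")

print(emigration)
print(immigration)
```

Each resulting dictionary holds one entry per country, which is the shape needed for an edge list with a single weighted edge between the United States and each other country.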

To show the heart of these data, the country of origin and country of destination in each direction, I chose to create a network visualization in Gephi. Since I was working with geographical data, I knew I wanted to create a map, and before starting on the visualization, I downloaded and installed two plugins: GeoLayout, which maps nodes to latitude/longitude, and Map of Countries, which adds nodes and edges in the background to create a world map. I also created a separate nodes table with each country, the latitude and longitude of its centroid, its region, and its income level. This nodes table worked for both the emigration and immigration data sets.
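For readers who want to build a similar nodes table, here is a minimal sketch of one possible layout written out as CSV. Gephi's spreadsheet import looks for an `Id` column in a nodes table; the other column names, the three-letter identifiers, and the rounded centroid coordinates are illustrative assumptions, not the exact table I used.

```python
import csv
import io

# Column names other than "Id" are assumptions chosen for this sketch.
NODE_FIELDS = ["Id", "Label", "Latitude", "Longitude", "Region", "Income"]

# Coordinates are rough, rounded centroids for illustration only.
nodes = [
    {"Id": "USA", "Label": "United States", "Latitude": 37.1, "Longitude": -95.7,
     "Region": "North America", "Income": "High income"},
    {"Id": "CAN", "Label": "Canada", "Latitude": 56.1, "Longitude": -106.3,
     "Region": "North America", "Income": "High income"},
    {"Id": "MEX", "Label": "Mexico", "Latitude": 23.6, "Longitude": -102.6,
     "Region": "Latin America & Caribbean", "Income": "Upper middle income"},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(nodes, NODE_FIELDS)
print(nodes_csv)
```

Because the table describes countries rather than flows, the same file can be paired with either the emigration or the immigration edge list.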

I created the emigration map first; I also built some User Experience research into this step, as I wanted to avoid making similar mistakes on both network maps, and thereby avoid having to recreate both maps several times.

After importing the U.S. emigration data into Gephi, I ran weighted degree statistics on the edge table and sized the nodes based on in-degree. This resulted in the countries with the highest number of American emigrants being largest. I then partitioned the color of the nodes by region, assigned edge color based on target, ran the GeoLayout plugin to place the nodes, and finally ran the Map of Countries plugin to create the background map. As with previous efforts to create a map in Gephi, the background map and nodes did not align perfectly; however, alignment was closest when using the Gall-Peters projection system, and I was able to manually move the most visibly errant nodes to their proper location. These steps resulted in the following map (Fig. 1):


Figure 1: US Emigration, nodes colored by region
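Nodes and a background map can only line up if both are drawn with the same projection. As a rough illustration of what the projection step does, the sketch below applies the standard Gall-Peters (equal-area cylindrical) formulas to a latitude/longitude pair. Whether GeoLayout uses exactly this form, including its scaling, is an assumption on my part, and the function name `gall_peters` is my own.

```python
import math


def gall_peters(lat_deg, lon_deg, radius=1.0):
    """Project latitude/longitude in degrees to Gall-Peters x/y coordinates.

    Standard formulas: x = R * lon / sqrt(2), y = R * sqrt(2) * sin(lat),
    with both angles converted to radians first.
    """
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon / math.sqrt(2)
    y = radius * math.sqrt(2) * math.sin(lat)
    return x, y


# The equator/prime-meridian intersection maps to the origin.
print(gall_peters(0.0, 0.0))
# A rough, illustrative centroid for Canada (not a value from my nodes table).
print(gall_peters(56.1, -106.3))
```

Because the vertical coordinate depends on the sine of the latitude, high-latitude countries are compressed vertically, which is one reason a node can look misplaced if the background map was drawn with a different projection.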

At this stage, I recruited three users to view the map and offer feedback on its attributes. These users were selected to be average in their knowledge of world migration trends (i.e., aware of some discussions and at least vaguely interested in world politics and international relations, but not experts). In order to test the strength of the visualization on its own, I did not include legends, but did insert a title after exporting from Gephi. I asked each user to answer a few questions about the content, namely:

  • Can you tell me what this visualization is about?
  • Can you tell me which destinations (either country or regions) are most popular based on this visualization?
  • Is there anything that would make this content clearer to you?

Of the three participants, two correctly inferred that the map showed details on emigration from the United States. One participant believed it to be showing vacation destinations instead of semi-permanent relocation. All three users correctly selected Canada as the most popular destination, although one indicated that there were “a lot of pink lines,” which made her unsure. Two of the three participants felt confused by the colors: did they indicate countries (and if so, why were two dots the same color?), or something else? When I explained that color represented region, users responded that this seemed redundant, as regions are already represented by the background map, and that the specific colors used and the thickness of the lines made it difficult to isolate destinations.

All three users suggested adding more context or guiding visuals—two suggested a legend for color, and two suggested a subtitle and/or sidebar. The sizing of nodes seemed clear, although the color was not.

This feedback made me rethink the use of color. Since I was also interested in showing the income level of each country, and this would add a new layer of information on top of the geographical data, I chose to assign color to this variable and to create a legend for clarity. This way, seeing many lines of the same color would clearly indicate something: in this case, that many US citizens emigrated to countries of a particular income level. As for the general content of the visualization, I could easily see how the title was ambiguous as to whether the data represented vacation or permanent relocation, so I added a subtitle to indicate the latter.

These tweaks resulted in the following map (Fig. 2), which tells a clearer (and different) story:


Figure 2: US Emigration, nodes colored by income level, with subtitle and legend

After incorporating changes based on the first map showing emigration from the United States, I used the second data set, representing immigration into the United States, to create a network map in Gephi following nearly the same steps. In this case, however, I sized the nodes by weighted out-degree, so that the largest nodes are the countries that account for the largest shares of total immigrants. To remain consistent with the first network, I assigned color to the income variable, using the same colors as with emigration. Just as I reversed the attribute that determined node size, I also reversed the edge coloring, assigning edge color by source on this map. This resulted in the following map (Fig. 3):


Figure 3: US Immigration network map
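The node sizing used in both maps comes down to weighted degree: weighted in-degree for destinations in the emigration map, and weighted out-degree for origins in the immigration map. The sketch below shows the idea in plain Python. The edge weights are made-up placeholders, and the linear scaling between a minimum and maximum size is my assumption about how the ranking behaves, not Gephi's documented implementation.

```python
from collections import defaultdict

# Hypothetical immigration edges as (source, target, weight).
# The weights are placeholders, NOT figures from the World Bank data.
edges = [
    ("Mexico", "United States", 700),
    ("Canada", "United States", 200),
    ("Germany", "United States", 100),
]


def weighted_degrees(edges):
    """Return (in_degree, out_degree) dictionaries of summed edge weights."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target, weight in edges:
        out_degree[source] += weight
        in_degree[target] += weight
    return dict(in_degree), dict(out_degree)


def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each value onto the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    if span == 0:
        # All values are equal, so every node gets the smallest size.
        return {name: min_size for name in values}
    return {
        name: min_size + (value - low) / span * (max_size - min_size)
        for name, value in values.items()
    }


in_degree, out_degree = weighted_degrees(edges)
sizes = scale_sizes(out_degree)  # immigration map: size origins by out-degree
print(out_degree)
print(sizes)
```

For the emigration map, the same functions apply with the edges pointing away from the United States and `in_degree` passed to `scale_sizes` instead.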

 

Findings

Viewing these two network maps side by side reveals some similarities and differences. Most notably, the map depicting emigration from the United States shows overwhelmingly that Americans who move abroad move to other high-income countries. All but one of the visibly sized nodes are colored green (high income); the one large upper-middle-income node, Mexico, is unsurprising given the high number of Mexican-Americans (as shown in Figure 3, depicting immigration to the United States), assuming that many of these immigrants repatriate to Mexico over time.

While the map depicting immigration to the United States also shows mostly high-income countries among the largest nodes, it is more varied in color and therefore in the economic status of the countries immigrants come from. The higher density of edges and increased visibility of nodes also suggest that the United States has more immigrants than emigrants, which is borne out in the data.

Future Directions

These networks could be supplemented by more detailed information, perhaps in bar chart or heat map form, about region-to-region migration, the gender of migrants, or specific countries of destination or origin.