Migration Networks: Gephi Lab Report

For this lab project, I chose to continue my work with world migration data. Exploring an existing network visualization of sorts (http://peoplemov.in/) led me to the Bilateral Migration Matrix from the World Bank, which tabulates migrants across the world by country of origin and country of destination from 1960 to 2000.

The main question I hoped to answer with this data was: what are the most popular destinations (either individual countries or broader regions) for migrants from a given country or region, and how have those destinations changed between 1960 and 2000?

Inspirations

As mentioned above, People Movin was one of the inspirations for my visualization. Though it is more sophisticated than what I was able to achieve, I liked its interactivity and the fact that it can show both where people from a given country move to and the reverse, i.e. where the people who end up in a certain country came from. One criticism of this viz, however, is that its use of color is not clear: I now believe that color is assigned by something like a modularity class, grouping countries with similar migration patterns, but the viz does not explain this. Although it is not presented in map form, it is very user-friendly: the countries are listed alphabetically, and the column format makes the connecting lines easier to read than they would be if they crossed over a map.

A second visualization, a circular diagram, shows some aspects of what I also hoped to achieve, in that it shows migration over time. However, the circular shape does not inherently suit the topic or reveal anything about it. I did take inspiration from its use of color, which groups countries by region (using hue to distinguish subregions) in a logical way.

A third example, an interactive map, has many appealing features. Most notably, it is extremely legible, because users can select one country at a time and view migration flow in one direction (arrivals or departures). The downside, of course, is that this makes overarching patterns difficult to see.

The Data

The data set also recorded the gender of migrants (male, female, and total), and I supplemented it with a few other qualitative dimensions, such as the income level and region of both the origin and destination countries. While I originally intended to integrate all of these qualitative descriptors, they ended up complicating my viz, so I decided to focus solely on the quantitative dimension (number of migrants) and region, leaving out gender and income level.

The original data set was bucketed into decades. To examine the change in migration patterns over time, I split it into six separate files: one for each of the five decades from 1960 to 2000, and one for total cumulative migration.
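The split can be sketched in Python as follows. This is a minimal sketch, assuming the matrix has already been reshaped into rows with hypothetical columns `origin`, `destination`, `decade`, and `migrants`, and assuming the cumulative file is simply the sum of the decade counts for each origin-destination pair. The output uses Gephi's spreadsheet-import edge columns `Source`, `Target`, and `Weight`.

```python
import csv
from collections import defaultdict


def split_by_decade(rows, decades):
    """Group migration rows into one edge list per decade plus a cumulative list.

    rows: iterable of dicts with hypothetical keys 'origin', 'destination',
    'decade', and 'migrants'. Returns {decade: [(origin, destination, count)],
    ..., 'total': [...]}, where 'total' sums each pair's counts across decades.
    """
    groups = {decade: [] for decade in decades}
    cumulative = defaultdict(int)
    for row in rows:
        decade = row["decade"]
        if decade not in groups:
            continue  # ignore decades that are not being plotted
        origin, destination = row["origin"], row["destination"]
        count = int(row["migrants"])
        groups[decade].append((origin, destination, count))
        cumulative[(origin, destination)] += count
    # Assumption: the cumulative network is the plain sum of the decade counts.
    groups["total"] = [(o, d, c) for (o, d), c in sorted(cumulative.items())]
    return groups


def write_edge_tables(groups, prefix="edges"):
    """Write one CSV per group with Gephi's Source/Target/Weight edge columns."""
    paths = []
    for name, edges in groups.items():
        path = f"{prefix}_{name}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["Source", "Target", "Weight"])
            writer.writerows(edges)
        paths.append(path)
    return paths
```

Passing `["1960", "1970", "1980", "1990", "2000"]` as `decades` yields the six groups described above, one edge file per decade plus the cumulative one.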

Process

Creating the visualizations presented a number of challenges, many of which resulted from glitches within or limitations of Gephi itself.

First, because geographic data all but necessitated presenting the networks in map form, two plugins were needed: Map of Countries, which imports nodes and edges to create a background map, and Geolayout, which positions nodes by latitude and longitude. Both plugins were finicky and had to be run in a very specific order so as not to throw off the statistics of the migration data set. They also did not work well together: when the geo-located centroids for each country were laid out, they did not line up accurately with the background map. Fixing this took some experimentation with projection choices, as well as manual repositioning of the most obviously misplaced points.
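The misalignment is easier to understand with a small example. The sketch below compares two common projections, equirectangular and spherical Mercator, for the same latitude and longitude on a unit sphere. Both set x from longitude, but Mercator stretches y at high latitudes, so centroids projected one way and a background map drawn another way agree near the equator and drift apart toward the poles. Which projections the two plugins actually use is not assumed here; the functions are illustrative only.

```python
import math


def equirectangular(lat, lon):
    """Plate carree: x and y are directly proportional to longitude and latitude."""
    return math.radians(lon), math.radians(lat)


def mercator(lat, lon):
    """Spherical Mercator: y = ln(tan(pi/4 + lat/2)), which stretches high latitudes."""
    phi = math.radians(lat)
    return math.radians(lon), math.log(math.tan(math.pi / 4 + phi / 2))


def offset(lat, lon):
    """Vertical gap between the two projections for the same point (unit sphere)."""
    _, y_equirect = equirectangular(lat, lon)
    _, y_mercator = mercator(lat, lon)
    return y_mercator - y_equirect
```

For example, `offset(0, 10)` is effectively zero, while `offset(60, 10)` is roughly 0.27, so a projection mismatch displaces high-latitude countries far more than equatorial ones.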

Since my ultimate objective was to create six individual networks showing the progression of migration patterns over time, it was important to keep colors consistent across visualizations. This was frustrating, because Gephi resets colors every time a ranking or partition is changed. I tried to use the Colors plugin to assign color values to nodes from my nodes table, but Gephi did not read the values properly, so I ended up normalizing the colors manually.
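A fixed lookup table is one way to make colors reproducible across the six networks. The sketch below is a minimal example with hypothetical region names and hex values; it assigns each node the same color regardless of which decade's network it belongs to. The RGB helper is included because formats such as GEXF store node colors as separate red, green, and blue values; whether that route would have avoided the Colors plugin problem is untested here.

```python
# Hypothetical region names and colors: any fixed mapping works, as long as
# every decade's network is colored from the same table.
REGION_COLORS = {
    "Africa": "#e41a1c",
    "Asia": "#377eb8",
    "Europe": "#4daf4a",
    "Latin America": "#984ea3",
    "North America": "#ff7f00",
}
FALLBACK_COLOR = "#999999"  # used for any region missing from the table


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers in 0-255."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_nodes(nodes):
    """Return copies of the node dicts with a 'color' key set from REGION_COLORS."""
    return [
        {**node, "color": REGION_COLORS.get(node["region"], FALLBACK_COLOR)}
        for node in nodes
    ]
```

Because the colors come from a table rather than from a ranking or partition, rebuilding any one network cannot shift the palette.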

After spending a few hours on the first two networks working out the best workflow, I was able to create the rest of the visualizations quickly by following the same steps in order.

First, I imported my nodes table, which was the same across all networks. This file listed the countries and their characteristics, most importantly region and color (although, as mentioned above, the color field did not work). Next, I imported the edge table for the given decade, having first filtered the spreadsheet in Excel to include only origin-destination pairs with 1,000 or more migrants; this reduced clutter in the final visualization and lowered memory usage (and with it the chance of Gephi crashing). I then ran statistics on the data before running any map layout plugins. This order mattered because the migration edges are directed, whereas the edges that make up the background map are undirected and skewed the statistics when they were included. Based on the statistics for the nodes and edges from my data set alone, I colored nodes by region and sized them by In-Degree, so that the countries receiving migrants from the most origin countries (a proxy for the highest immigration) appear largest.
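The filtering and sizing steps can be sketched outside Gephi to make the arithmetic explicit. This is a minimal sketch, not Gephi's implementation: edges are hypothetical `(origin, destination, migrants)` tuples, the threshold of 1,000 matches the Excel filter, and sizes are interpolated linearly between two hypothetical bounds, roughly as a linear size ranking does. In-Degree counts distinct origin countries; a weighted in-degree, which sums incoming migrants, measures migrant volume directly and is included for comparison. Because only the directed migration edges are passed in, no background-map edges can distort the counts.

```python
from collections import Counter, defaultdict

THRESHOLD = 1000  # same cut-off as the Excel filter: keep pairs with >= 1,000 migrants


def filter_edges(edges, threshold=THRESHOLD):
    """Keep directed (origin, destination, migrants) edges at or above the threshold."""
    return [edge for edge in edges if edge[2] >= threshold]


def in_degrees(edges):
    """In-Degree: number of distinct origin countries with an edge into each destination."""
    pairs = {(origin, destination) for origin, destination, _ in edges}
    return Counter(destination for _, destination in pairs)


def weighted_in_degrees(edges):
    """Weighted in-degree: total incoming migrants per destination."""
    totals = defaultdict(int)
    for _, destination, migrants in edges:
        totals[destination] += migrants
    return dict(totals)


def rank_sizes(values, min_size=10.0, max_size=50.0):
    """Map each value linearly onto [min_size, max_size] (hypothetical bounds)."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # all-equal values would otherwise divide by zero
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }
```

Swapping `in_degrees` for `weighted_in_degrees` in the call to `rank_sizes` would size countries by the number of immigrants rather than the number of sending countries.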

Once these attributes were set, I ran the Geolayout plugin to position the nodes by longitude and latitude, and then the Map of Countries plugin to add the background map. Since node size already indicated the level of immigration, I colored each edge by its source node (and therefore by the origin country's region), so that users could see where most immigrants to a given destination country were coming from.
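Coloring an edge by its source node, and asking where a destination's immigrants come from, can both be expressed in a few lines. In this minimal sketch the edge tuples, region names, and color values are hypothetical; the share calculation sums edge weights (migrant counts) by the origin country's region.

```python
from collections import defaultdict


def edge_colors(edges, node_colors):
    """Color each (origin, destination, migrants) edge with its source node's color."""
    return [(origin, destination, node_colors[origin]) for origin, destination, _ in edges]


def origin_region_shares(edges, node_region, destination):
    """Fraction of a destination's incoming migrants contributed by each origin region."""
    totals = defaultdict(int)
    for origin, dest, migrants in edges:
        if dest == destination:
            totals[node_region[origin]] += migrants
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # destination has no incoming edges in this network
    return {region: count / grand_total for region, count in totals.items()}
```

The largest share returned for a destination corresponds to the edge color that should dominate around that node on the map.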

Results and Discussion

The resulting maps are interesting to look at for a fairly broad perspective on migration patterns. Without an interactive feature for now (the Sigma.js plugin was also giving me a fair amount of trouble), it is difficult to see fine-grained, country-to-country detail, especially since I chose to leave off labels, which overcrowded the viz.

However, since the nodes and edges are colored by region, the maps do convey sweeping patterns. For example, in the network below, which shows the cumulative data from 1960 to 2000, the dominant colors indicate strong emigration from Africa, Latin America, and Asia. It is also interesting to notice that within regions, especially within Africa, there is an intra-regional network effect: many edges connect countries in the same region. This is also somewhat true of Latin America and Asia, but certainly not of North America. This can be partially attributed to the fact that North America comprises only two countries, whereas other regions of its size contain many smaller countries. However, it is also interesting to think of these patterns in terms of economic and political advantage, and the ways in which these factors shape migration.
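The intra-regional effect could also be quantified rather than read from the colors alone. The sketch below, using the same hypothetical `(origin, destination, migrants)` edge format and a hypothetical country-to-region lookup, computes for each origin region the fraction of its emigrants (by migrant count) whose destination lies in the same region. Applied to the thresholded edge tables, it would only account for pairs with at least 1,000 migrants.

```python
from collections import defaultdict


def intra_regional_share(edges, node_region):
    """For each origin region, the fraction of its emigrants (by migrant count)
    whose destination lies in the same region."""
    within = defaultdict(int)
    total = defaultdict(int)
    for origin, destination, migrants in edges:
        region = node_region[origin]
        total[region] += migrants
        if node_region[destination] == region:
            within[region] += migrants
    return {region: within[region] / total[region] for region in total if total[region]}
```

Comparing these fractions across the decade networks would show whether the within-region pattern strengthens or fades over time.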

This network effect within the African continent can be seen even more clearly in the network graph from the 1960s, below. This graph also shows much stronger emigration from Europe, which could be attributed to lasting effects of the Second World War.

A closer analysis of each network could be enriched by comparing the data to historical events that might contribute to or explain some of the migration flows.

Information Visualization

Student work at the School of Information, Pratt Institute
