INTRODUCTION
Network analysis originates from mathematician Leonhard Euler’s graph theory, which long predates modern-day computing, and was initially applied to social science research. Since then, we have witnessed a widespread adoption of the field, as advances in computing have allowed for the analysis and visualization of large and complex networks, ranging from social media to politics and trade.
In this lab, I attempt to unveil global trade patterns by observing a network of countries and trade relations. By presenting several types of network visualizations, I strive to create a story based on a rather simple list found on Wikipedia, with an underlying mission to demonstrate the vast capabilities of the modern network visualization tool Gephi.
Inspiration
A strong fascination for geographical information systems (GIS) led to my desire to combine traditional network data with geospatial data. I believe that adding geospatial characteristics to the network data improves the storytelling and communication aspects. Furthermore, I believe that global trade is a suitable subject for this kind of network visualization, as it could reveal some interesting patterns.
My main source of inspiration was flight network visualizations, such as Tuan Doan Nguyen’s project “Catching that flight: Visualizing social network with Network and Basemap”. I believe that Nguyen’s project, pictured above, tells a good story of US air traffic, using node size to represent the number of flight route connections and color-coding to differentiate large and small airports.
Materials
- DATA: List of leading trade partners, Wikipedia article
- DATA: Country coordinates, Kaggle dataset
- SOFTWARE: Gephi
- SOFTWARE: Microsoft Excel
- SOFTWARE: Microsoft PowerPoint
METHOD
Data Preparation
Because the data was scraped from Wikipedia, it required some structuring before Gephi could read it as a network dataset. I created two CSV files using Microsoft Excel: one holding the network nodes, i.e. all unique countries, and one holding the network edges, i.e. the trade connections between the countries. I used the Excel function VLOOKUP to combine the coordinates from the Kaggle dataset with the unique countries in the nodes CSV file. These two files, nodes.csv and edges.csv, were then imported into Gephi.
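The same preparation could also be scripted. The sketch below is only an illustration and not the workflow used in this lab, which was done in Excel. It assumes the scraped list has already been reduced to (country, partner) pairs, and it uses a dictionary lookup in place of VLOOKUP. The function name build_tables, the placeholder countries A, B, and C, their coordinates, and the column names Latitude and Longitude are all hypothetical. The Id, Label, Source, and Target headers follow the column names that Gephi's spreadsheet import looks for.

```python
import csv
import io


def build_tables(trade_pairs, coordinates):
    """Return (nodes_csv, edges_csv) as text, ready to be saved to disk."""
    # Collect every country that appears on either side of a trade pair.
    countries = sorted({name for pair in trade_pairs for name in pair})

    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "Latitude", "Longitude"])
    for name in countries:
        # Dictionary lookup plays the role of VLOOKUP.
        # Countries without coordinates get blank cells.
        lat, lon = coordinates.get(name, ("", ""))
        writer.writerow([name, name, lat, lon])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in trade_pairs:
        writer.writerow([source, target])

    return nodes.getvalue(), edges.getvalue()


# Placeholder data, not taken from the real dataset.
pairs = [("A", "C"), ("B", "C")]
coords = {"A": (10.0, 20.0), "C": (-5.0, 30.0)}

nodes_csv, edges_csv = build_tables(pairs, coords)
print(nodes_csv)
print(edges_csv)
```

Leaving missing coordinates blank, rather than dropping the country, makes it easy to spot names that did not match between the two sources.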
Gephi Work Process
After importing the data, I ran four statistical calculations: degree distribution, network diameter, graph density, and modularity. Running these measures provides insight, but more importantly for this project, it allows for the creation of the desired visualizations.
I created a standard appearance that was maintained throughout the lab. Node size corresponds to degree, i.e. the number of connections a country has with other countries, while node color is based on the modularity grouping, i.e. clusters of countries that are strongly connected to each other. The edge design consisted of curved lines for the first visualization and was later changed to straight lines. Labels were left out of the first two visualizations. The reasoning behind the design variations is explained further on.
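To make three of these measures concrete, here is a minimal Python sketch of degree, graph density, and network diameter. It is not Gephi's implementation. It assumes an undirected graph without parallel edges, and the function names and the small star-shaped sample network are hypothetical. Modularity is left out, since community detection is considerably more involved.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degrees(adjacency):
    """Degree = number of distinct neighbours per node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def density(adjacency):
    """Share of possible undirected edges that actually exist."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))


def diameter(adjacency):
    """Longest shortest path, measured over reachable pairs only."""
    longest = 0
    for start in adjacency:
        # Breadth-first search gives shortest hop counts from start.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(distance.values()))
    return longest


# Hypothetical star network: three countries all trade with a hub H.
sample = build_adjacency([("A", "H"), ("B", "H"), ("C", "H")])
print(degrees(sample)["H"], density(sample), diameter(sample))
```

In a star-shaped network like this sample, the hub has the highest degree and every pair of outer nodes is two steps apart, which resembles the cluster shape described in the results.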
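The mapping from degree to node size can be thought of as a simple interpolation. The sketch below assumes a linear mapping between a minimum and a maximum size. The function name scale_sizes, the size limits, and the sample degrees are hypothetical and are not the values used in this lab.

```python
def scale_sizes(degree_by_node, min_size=10.0, max_size=60.0):
    """Linearly map each node's degree onto the range [min_size, max_size]."""
    low = min(degree_by_node.values())
    high = max(degree_by_node.values())
    span = high - low

    sizes = {}
    for node, degree in degree_by_node.items():
        # If all degrees are equal, every node gets the minimum size.
        fraction = 0.0 if span == 0 else (degree - low) / span
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes


# Hypothetical degrees: H is the hub, A and B are smaller nodes.
print(scale_sizes({"H": 3, "A": 1, "B": 2}))
```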
RESULTS & INTERPRETATION
VISUALIZATION 1: Pre-Geospatial Data
The first visualization, pictured above (fig 1), portrays the network as laid out by the Yifan Hu algorithm. The geographical coordinate data has not yet been incorporated into the model; hence, the behavior of nodes and edges represents that of more traditional network analysis. I chose the Yifan Hu algorithm for this first visualization, as I believe it clearly presents the network patterns. The labels have been left out in order to keep the visualization clean and maintain focus on the actual patterns. Three distinct clusters emerge, each with a larger node at its center. These central nodes could quite easily be assumed to be the leading import partners for many countries, as the edges represent the leading source of exports per country. The identity of the larger nodes could probably be guessed at this stage, but it will become even more evident in the visualizations below.
VISUALIZATION 2: Post-Geospatial Data
The Gephi plugin “Geolayout” allows for geospatial mapping of the network nodes using longitude and latitude information. The visualization above (fig 2) shows the same network as in visualization 1 (fig 1), but with the incorporation of geospatial information. The result: an understandable story begins to take form. Even though labels are still left out in order to maintain focus on node and edge behavior, the identity of the nodes is now more easily guessed. Note that the European Union is included as one of the trade partners in this dataset. The edges have been changed from curved to straight, as I believe this design makes the trade patterns easier to discover.
VISUALIZATION 3: World Map
Another Gephi plugin, called “Map of Countries”, allows the designer to add an image of an actual world map behind the geospatial data (fig 3). This further unveils the story of our patterns, as we get an idea of the actual geographical location of the nodes, even without reading the labels, which have now been added. The map above is presented using the Mercator projection, which I found to be the most accurate projection in terms of node location. However, even though the plugin does a decent job of estimating the geographical location of the nodes, the visualization can be further improved, as the distance error is rather severe for some countries. Furthermore, this plugin does not allow for a dark mode visualization, which I personally prefer for aesthetic reasons.
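For reference, the Mercator projection mentioned above converts latitude and longitude into flat x and y coordinates. The sketch below uses the standard spherical Mercator formula. It is not the plugin's own code, and the function name mercator and the unit radius are assumptions made for illustration.

```python
import math


def mercator(lat_deg, lon_deg, radius=1.0):
    """Project latitude and longitude (in degrees) with spherical Mercator."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    # y grows without bound towards the poles, so keep |lat_deg| below 90.
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# The equator maps to y close to 0; higher latitudes are stretched.
print(mercator(0.0, 0.0))
print(mercator(60.0, 0.0))
```

Because y is stretched more and more towards the poles, a node layout and a background map only line up when both use the same projection and scale, which may explain part of the location error for some countries.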
VISUALIZATION 4: Global Trade Unveiled
The fourth and final visualization (fig 4) shows an improved version of visualization 3 (fig 3). In order to create this visualization, the Geolayout network, as pictured in visualization 2 (fig 2), was exported as an SVG file. This file was placed on top of a world map using a separate application, Microsoft PowerPoint. Adjusting the overlay manually allowed for a more precise geospatial layout. Furthermore, this method allowed me to bring back the black background.
CONCLUSION & REFLECTION
This lab report demonstrated the process of unveiling the story behind a simple Wikipedia table, in this case a list of leading trade partners by country. I believe that this project successfully provides evidence of the power of network analysis and visualization, as it shows how interesting stories can be generated from a rather simple dataset.
Regarding future research, an interesting next step would be to create a network visualization of shipping routes, to investigate whether or not the routes correlate with the patterns unveiled in this lab.