Introduction
In this lab, I visualized data on trade in miscellaneous manufactures of metal among 80 countries in 1994. The main purpose of the visualization was to answer the following research questions:
- What does the network data look like when visualized?
- What are the relationships between trade entities?
- Which trade entities have the most links?
Materials
Dataset
The dataset was retrieved from CASOS Network Analysis Data.
The files contain data on trade in miscellaneous manufactures of metal among 80 countries in 1994. All countries with entries in the paper version of the Commodity Trade Statistics published by the United Nations were included, but for some countries the 1993 data (Austria, Seychelles, Bangladesh, Croatia, and Barbados) or 1995 data (South Africa and Ecuador) were used because they were not available for 1994. Countries which are not sovereign are excluded because additional economic data were not available: Faeroe Islands and Greenland, which belong to Denmark, and Macau (Portugal). Most missing countries are located in central Africa and the Middle East, or belong to the former USSR. The arcs represent imports by one country from another for the class of commodities designated as ‘miscellaneous manufactures of metal’, which represents high technology products or heavy manufacture. The absolute value of imports (in 1,000 US$) is used but imports with values less than 1% of the country’s total imports were omitted.
In addition, several attributes of the countries were coded: their continent, their structural world-system position in 1994, their world system position in 1980 according to a previous analysis by Smith and White – see the reference below – and their Gross Domestic Product per capita in US$ in 1995 (Statistical Yearbook of the United Nations).
Reference:
W. de Nooy, A. Mrvar, & V. Batagelj (2004) Exploratory Social Network Analysis with Pajek. Cambridge: Cambridge University Press, Chapter 2. Retrieved from: https://sites.google.com/site/ucinetsoftware/datasets/worldtradeinmiscellaneousmanufacturesofmetal1994.
Tools
OpenRefine was used to clean the retrieved data and to parse the XML file into two CSV files (a node table and an edge table).
Gephi was used as the main data visualization tool.
Method & Process
I followed a three-step procedure in this lab:
- Retrieve and refine data
- Research design inspirations and references
- Visualize the dataset
Step 1: Retrieve and refine data
The original dataset retrieved from CASOS was in XML form. Since Gephi needed the data as parsed node and edge tables, I imported the original XML file into OpenRefine to convert it into CSV files. In the end, I exported two files: a node table and an edge table.
Before moving on to the next step, I imported those files into Gephi to get an overview of the visualized data. The automatic visualization generated by Gephi looks like Pic 1. As you can see, it is quite hard to read. In addition, it has no labels, so the country names are not visible.
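The conversion itself was done in OpenRefine, but the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`node`, `link`), the attribute names (`id`, `source`, `target`, `value`), and the sample countries and values are hypothetical and may not match the real CASOS file. The column headers (`Id`, `Label`, `Source`, `Target`, `Type`, `Weight`) are the ones Gephi's spreadsheet import recognizes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real CASOS XML may use different tags and attributes.
SAMPLE_XML = """<network>
  <nodes>
    <node id="Germany"/>
    <node id="France"/>
    <node id="Japan"/>
  </nodes>
  <links>
    <link source="Germany" target="France" value="1200"/>
    <link source="Japan" target="Germany" value="800"/>
  </links>
</network>"""


def xml_to_tables(xml_text):
    """Split one XML document into a node table and an edge table."""
    root = ET.fromstring(xml_text)
    # One row per <node>; the id doubles as the label shown in the graph.
    nodes = [{"Id": n.get("id"), "Label": n.get("id")} for n in root.iter("node")]
    # One row per <link>; a missing value falls back to a weight of 1.
    edges = [
        {
            "Source": e.get("source"),
            "Target": e.get("target"),
            "Type": "Directed",
            "Weight": float(e.get("value", "1")),
        }
        for e in root.iter("link")
    ]
    return nodes, edges


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


nodes, edges = xml_to_tables(SAMPLE_XML)
nodes_csv = to_csv(nodes, ["Id", "Label"])
edges_csv = to_csv(edges, ["Source", "Target", "Type", "Weight"])
print(nodes_csv)
print(edges_csv)
```

Writing each string to its own `.csv` file would give the two tables to import into Gephi.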
Step 2: Research on design inspirations
Three network visualizations inspired my approach in the next step:
#dhiha5 Digital Humanities at Deutsches Historisches Institut Paris showed a great example of visualizing incoming and outgoing links, which is extremely useful for my dataset. As you can see in the next step, I used my own way to differentiate in and out edges.
The following map of digital humanities on Twitter used clusters and labels to present key information. While I ended up presenting labels differently, it showed me a good example of how to place labels on a map with many edges.
The last map that inspired me is Martin Grandjean’s example on “how to make a network graph”. The author used Pic 4 to show how node size can represent information. However, what really inspired me was the use of color in this graph. In Pic 4, the color palettes of nodes and edges are consistent, which reduces the cognitive load of reading the graph.
Step 3: Visualize dataset
Based on lessons learned from these works, I edited the shape, color, and text of the pre-generated visualization:
- Shape: I used the ForceAtlas 2 preset layout because I wanted the highest-degree nodes to be the most emphasized.
- Color: I used yellow for the lowest-degree nodes and dark red for the highest-degree nodes.
- Size: I also resized nodes according to degree. The higher a node’s degree, the bigger the node.
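The degree-based styling above can be sketched outside Gephi as follows. This is a minimal illustration, not Gephi's actual implementation: the edge list, the size range (10 to 50), the exact RGB values for yellow and dark red, and the linear interpolation are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical edge list of (source, target) pairs, not the real trade data.
edges = [("A", "B"), ("C", "B"), ("B", "D"), ("A", "D")]

YELLOW = (255, 255, 0)   # assumed color for the lowest-degree nodes
DARK_RED = (139, 0, 0)   # assumed color for the highest-degree nodes


def degrees(edge_list):
    """Total degree (incoming plus outgoing links) for every node."""
    deg = Counter()
    for source, target in edge_list:
        deg[source] += 1
        deg[target] += 1
    return deg


def lerp(low, high, t):
    """Linear interpolation between low and high for t in [0, 1]."""
    return low + (high - low) * t


def style_nodes(deg, min_size=10.0, max_size=50.0):
    """Map each node's degree to a (size, rgb_color) pair."""
    d_min, d_max = min(deg.values()), max(deg.values())
    span = (d_max - d_min) or 1  # avoid dividing by zero when all degrees match
    styles = {}
    for node, d in deg.items():
        t = (d - d_min) / span
        size = lerp(min_size, max_size, t)
        color = tuple(round(lerp(a, b, t)) for a, b in zip(YELLOW, DARK_RED))
        styles[node] = (size, color)
    return styles


deg = degrees(edges)
styles = style_nodes(deg)
for node, d in deg.most_common():
    print(node, d, styles[node])
```

Sorting by degree with `most_common()` also gives a direct way to list the entities with the most links, which is the third research question.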
My final data visualization is below:
Reflection:
While the visualization presents some information and answers part of my research questions, the outcome of this lab has major limitations:
Quality of data:
While the data used in this lab was cleaned, some parts of it are still not understandable to me. There are three nodes called “of”, “reunion”, and “lux”, which I kept in the cleaned dataset for two reasons:
- Those nodes have many connections to other nodes.
- Those nodes might be meaningful to people who are familiar with world trade.
Unable to export PNG from Gephi:
Another issue I faced and failed to solve was image export. If you look at Pic 5 carefully, you may notice that the image resolution is not ideal, because it is a screenshot. A quick search online suggested that I may have run into a Gephi bug that has not been fixed yet.
Overall, the network data visualization lab helped me understand how to use Gephi to explore large datasets and to understand the relationships between nodes and edges. I hope to soon have a chance to work on another network visualization project and address the limitations pointed out above.