Long-distance trade dates back to around 3000 BC, and international trade has been a vital part of the global economy ever since. Today, we live in a modern industrialized world that depends heavily on trade. For this network visualization lab, I analyzed data on the trade of miscellaneous manufactures of metal among 80 countries in 1994 to find out which countries were the most and least active and which countries they traded with.
Inspiration
The Leading Exporters and Their Trade with Each Other is visually very attractive. Countries are listed in a circular layout, each with an inner bar that illustrates the total value of its trade with all countries in 2013. Next to this bar, a second bar shows the total value of trade with the displayed countries, with solid space representing exports and empty space representing imports. Various colors identify the region each country traded with. However, the graphic includes data from only 23 countries. Adding more data would likely create a very dense network in the center and make the visualization less readable.
The Network of World Trade in Goods, 2007 includes more countries. Although it does not state the actual trade values, the size of the circle associated with each country still gives the audience a rough sense of proportion. As in the first illustration, colors correspond to different geographical regions.
Process
Collecting and Cleaning Dataset
The dataset I downloaded from CASOS was an XML file. I used OpenRefine to convert it to a CSV file and to rename and reorder the columns so that Gephi could import and read it properly. Then I checked the node and edge tables in Gephi and noticed some issues. Country names containing a space, such as New Zealand, were mistakenly split into separate nodes (“New” and “Zealand”), so I merged those nodes manually in the node table. I also copied the data from the id column into the label column and added a column with the geographical region of each country, which was an important feature in both of my inspirations. After these steps, I believed my data was ready for analysis.
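The manual merge described above could also be scripted before import. The following is only a minimal sketch, not the procedure I actually used in Gephi: the `MERGE_MAP` table, the `merge_nodes` function, and the sample edges are all hypothetical, and it assumes an unweighted, undirected edge list.

```python
# Hypothetical mapping from split name fragments to the intended country name.
MERGE_MAP = {"New": "New Zealand", "Zealand": "New Zealand"}


def merge_nodes(edges, merge_map):
    """Relabel fragment ids and drop the duplicates and self-loops this creates."""
    merged = set()
    for source, target in edges:
        s = merge_map.get(source, source)
        t = merge_map.get(target, target)
        if s == t:
            continue  # an edge between two fragments of the same country
        merged.add(tuple(sorted((s, t))))  # undirected: (a, b) equals (b, a)
    return merged


# Made-up sample edges, for illustration only.
sample_edges = [
    ("New", "Fiji"),
    ("Zealand", "Fiji"),
    ("New", "Zealand"),
    ("Fiji", "Norway"),
]
result = merge_nodes(sample_edges, MERGE_MAP)
print(sorted(result))
```

A lookup table like this makes every merge decision explicit, so an ambiguous fragment such as Republic can be reviewed before it is mapped to a country.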
Data Analysis
I then ran some of the statistics in Gephi to learn more about the patterns in my data. In general, the nodes are very well connected. The average degree was 21.8, with a minimum of 4 and a maximum of 67. The network diameter, the longest of the shortest paths between any two nodes, was 3. The graph density was 0.316, which I considered quite dense for a network with 80 nodes.
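To make these three statistics concrete, here is a small sketch that computes them for a toy graph in plain Python. It is not how Gephi calculates them internally. It assumes a connected, undirected, unweighted graph, and the function names and toy edges are my own illustration rather than part of the trade dataset.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def density(adj):
    """Share of possible undirected edges that exist: 2m / (n(n-1))."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity over all nodes (assumes a connected graph)."""
    return max(eccentricity(adj, node) for node in adj)


# Toy graph for illustration only, not the trade data.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
adj = build_adjacency(toy_edges)
print(average_degree(adj), round(density(adj), 3), diameter(adj))
```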
Data Visualization
After trying out a few different layouts, I decided to use the Yifan Hu Proportional layout for the final visualization. The most active countries were placed in the center, while those with only 4 or 5 edges were located at the fringe. To emphasize this trend, I scaled the node sizes and label font sizes by degree, so the larger the node, the more connected the country. Then I set the node colors based on the geographical information I had entered earlier in the data table, since I thought it made more sense than coloring by clusters. Last but not least, I changed the curved edges to straight lines because it was an undirected network, with each edge drawn in a mixed color that blends the colors of the two nodes it connects.
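Sizing nodes by degree can be thought of as a linear mapping from the degree range to a size range. The sketch below illustrates that idea under my own assumptions: the `scale_sizes` function, the size limits of 10 and 60, and the middle degree of 25 are hypothetical, and only the minimum and maximum degrees (4 and 67) come from my statistics.

```python
def scale_sizes(degrees, min_size=10.0, max_size=60.0):
    """Map each degree linearly onto the range [min_size, max_size]."""
    lowest = min(degrees.values())
    highest = max(degrees.values())
    if highest == lowest:
        # All nodes have the same degree, so give them the same middle size.
        return {node: (min_size + max_size) / 2 for node in degrees}
    span = highest - lowest
    return {
        node: min_size + (degree - lowest) / span * (max_size - min_size)
        for node, degree in degrees.items()
    }


# 4 and 67 are the reported extremes; 25 is a made-up value in between.
degrees = {"low": 4, "mid": 25, "high": 67}
sizes = scale_sizes(degrees)
print(sizes)
```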
Results and Reflection
From the final visualization, it is clear that Finland, Hungary, Slovenia, and Singapore were the most active countries in the 1994 trade of miscellaneous manufactures of metal, while Fiji, Norway, and Jordan were the least active. Generally speaking, Europe is the dominant region: it accounts for over 30% of the nodes and is comparatively more connected, both internally and to other regions.
There are still some flaws in my dataset. As I mentioned earlier, I manually merged some nodes in the node table, but because the dataset came with no appendix, I might have made some mistakes in that process. For example, I merged the nodes Czech and Republic, but other countries also have the word Republic in their official names, such as the Republic of Korea. In addition, the node table contained the abbreviation Mon., and I do not know which country it refers to.
For further development, I think it would be interesting to add latitude and longitude data to the dataset so that the nodes can be placed at their actual geographical locations using Geo Layout and the interregional and intraregional connections can be displayed in a more readable way. In addition, some information is missing, for example the data from the U.S.; adding it back and seeing how it affects the ranking would also be interesting. Finally, I would like to import the illustration into Adobe Illustrator, since some of the labels are impossible to read because they are too small or overlapping, and without a legend it is difficult for others to understand the illustration.