Introduction
The Center for Computational Analysis of Social and Organizational Systems (CASOS) is a research center at Carnegie Mellon University that aims to “foster multidisciplinary research”, evolve the analysis and communication of network data, and develop tools to further research and applications in these areas. I examined their 1994 world trade data on miscellaneous metal manufacturers for 80 countries, compiled primarily from information published by the United Nations. I chose this dataset because what I find challenging about interpreting network diagrams is that the focus is on the relationship patterns between nodes, not on the location of each node. Since I am trained to look at things in plan, I thought this would be a good exercise in deviating from the map and focusing on network patterns.
Inspiration
Visual Complexity: Social Networks
I felt this was an effective use of color to intuitively illustrate different clusters within this LinkedIn network. Even though I don’t know exactly what the colors represent, I can deduce that they are probably the different industries in which Manuel Lima has connections. It’s easy to see that there is a clear divide between the green and pink clusters, but also that there is some kind of relationship between the two. The relationship between the blue and orange clusters, on the other hand, is a little more complicated. While there are distinct only-blue and only-orange clusters, there is also a significant cluster where they overlap. The graph also tells me that the orange and green clusters overlap only through the blue cluster. There is a lot one can interpret from this network graph given how few labels are included.
Visual Complexity: Business Networks
From my experience with drafting, the power of lineweights quickly becomes clear, and I think the mental map of bolder, thicker, or darker lines translates to importance. This visualization analyzes donors and recipients and links the funding between these two groups within Reducing Emissions from Deforestation and Forest Degradation (REDD). The description of this visualization mentions that donor relationships are coded in blue, while recipients are coded in red. While I think that was a good choice, it is hard to tell the direction of the lines at this scale, which matters because this is a directed network. I think the visualization could be improved by conveying direction more clearly.
Materials
After downloading my dataset from CASOS, I brought it into OpenRefine to convert it from an XML file to a CSV. I also had to drop and rename some columns so that Gephi could read the data properly. Gephi is open-source software for performing network analysis and creating visualizations. It runs various statistical calculations that allow users to discover underlying patterns within the data.
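For readers who prefer a scripted route over OpenRefine, the same XML-to-CSV step can be sketched in Python. This is not the workflow used here, only a minimal illustration. It assumes the XML stores each relationship as a `link` element with `source`, `target`, and `value` attributes; the element and attribute names, the sample XML, and its numbers are all assumptions for illustration and are not taken from the CASOS file. The output uses the `Source`, `Target`, and `Weight` headers that Gephi's spreadsheet importer recognizes for edge tables.

```python
import csv
import io
import xml.etree.ElementTree as ET


def links_to_edge_rows(xml_text, link_tag="link"):
    """Collect (source, target, weight) rows from every link element.

    The tag and attribute names are assumptions about the XML layout.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for link in root.iter(link_tag):
        source = link.get("source")
        target = link.get("target")
        if source is None or target is None:
            continue  # skip links that lack an endpoint
        # Default to a weight of "1" when no value attribute is present.
        rows.append((source, target, link.get("value", "1")))
    return rows


def write_edge_csv(rows, handle):
    """Write rows under the edge-table headers Gephi expects."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


# Made-up sample; the weights are illustrative, not real trade values.
SAMPLE_XML = """<network>
  <link source="Hong Kong" target="New Zealand" value="12"/>
  <link source="New Zealand" target="El Salvador" value="3"/>
</network>"""

edge_rows = links_to_edge_rows(SAMPLE_XML)
buffer = io.StringIO()
write_edge_csv(edge_rows, buffer)
csv_text = buffer.getvalue()
```

Because the `csv` module quotes fields only when needed and never splits on spaces, two-word names such as Hong Kong stay in a single cell.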
Methods & Process
1. Cleaning Data
After importing my data into Gephi and taking a look at the nodes, I noticed that several countries did not import properly: names of countries with two words (e.g. Hong Kong, El Salvador, New Zealand) were read in as separate entities. These were easy to detect, and I felt comfortable merging each of them back into a single entry. However, there were several nodes I felt less sure about merging (e.g. Mon., Moldava., Rep., Of., Lux.). I spent some time researching the sources that CASOS cited, looking for a clean list of the countries included in this study, and I also checked the website where this dataset was originally published, which had an updated table. Unfortunately, I couldn’t find anything that clarified these countries, so I recognize that there are some data flaws in the visualization I produced.
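The merging step can be illustrated with a small Python sketch. It assumes the split fragments appear next to each other in the label list and that a list of trusted two-word names is available. The function name `merge_split_names` and the `KNOWN_NAMES` set are hypothetical helpers, not part of Gephi or OpenRefine. Fragments that cannot be matched, such as Rep. and Of., are deliberately left untouched rather than guessed at.

```python
def merge_split_names(tokens, known_names):
    """Rejoin adjacent fragments that form a known two-word name."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in known_names:
            merged.append(pair)  # two fragments become one label
            i += 2
        else:
            merged.append(tokens[i])  # leave unresolved fragments as-is
            i += 1
    return merged


# Only names that could be identified with confidence are listed here.
KNOWN_NAMES = {"Hong Kong", "El Salvador", "New Zealand"}

labels = ["Hong", "Kong", "El", "Salvador", "Rep.", "Of.", "New", "Zealand"]
cleaned = merge_split_names(labels, KNOWN_NAMES)
```

Restricting merges to a trusted list keeps the cleaning conservative: ambiguous abbreviations remain visible as data flaws instead of being silently attached to the wrong country.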
2. Selecting Layout & Running Statistical Analysis
I ran calculations for average degree, network diameter, graph density, and modularity. These not only populated new values for each node in my node table, but also allowed me to change the appearance of the network graph based on these measures.
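To make three of these measures concrete, here is a minimal Python sketch for a directed graph. It is not Gephi's implementation, and the conventions are assumptions: density is taken as m / (n(n - 1)), average degree as the mean out-degree m / n, and diameter as the longest shortest path among pairs that can reach each other. Gephi's report may use different conventions. Modularity is omitted because community detection is considerably more involved. The toy edge list is made up for illustration.

```python
from collections import deque


def graph_stats(edges):
    """Return density, mean out-degree, and diameter of a directed graph."""
    nodes = sorted({node for edge in edges for node in edge})
    successors = {node: set() for node in nodes}
    for source, target in edges:
        if source != target:  # ignore self-loops
            successors[source].add(target)

    n = len(nodes)
    m = sum(len(targets) for targets in successors.values())
    density = m / (n * (n - 1)) if n > 1 else 0.0
    average_out_degree = m / n if n else 0.0

    # Breadth-first search from every node gives unweighted shortest paths.
    diameter = 0
    for start in nodes:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in successors[current]:
                if neighbor not in distance:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
        diameter = max(diameter, max(distance.values()))

    return {
        "density": density,
        "average_out_degree": average_out_degree,
        "diameter": diameter,
    }


# Toy directed graph, not the trade data.
TOY_EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
stats = graph_stats(TOY_EDGES)
```

In the toy graph, the longest shortest path runs from B to D (B, C, A, D), so the diameter is 3 even though only four nodes are involved.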
3. Adjusting Appearance for Visual Communication & Clarity
Using the graphs shown above, I explored different settings in the Appearance panel, including the size and color of nodes, and adjusted the layout settings for better visual clarity. Additionally, I adjusted settings in the Preview panel for more nuanced rendered effects.
Results
This network graph of world trade among metal manufacturers by country uses the Yifan Hu layout, since the network is on the larger side. I scaled the size of the nodes proportionally to degree and set their colors based on a modularity calculation at a resolution of 1. This reveals three distinct and roughly even clusters, with the largest nodes in each cluster marking the countries with the highest degree. Finally, I scaled the lineweights proportionally to edge weight, resetting the high and low values as well as the opacity, to ensure that the lines would not be too overwhelming.
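The scaling described above amounts to mapping a measure onto a chosen visual range. A minimal Python sketch, assuming a simple linear interpolation between a low and a high value, follows. The `rescale` helper and the sample degrees are hypothetical, and Gephi offers other interpolation curves as well.

```python
def rescale(values, low, high):
    """Linearly map each value onto the range [low, high]."""
    smallest = min(values.values())
    largest = max(values.values())
    if largest == smallest:
        # All values are equal, so use the midpoint of the range.
        return {key: (low + high) / 2 for key in values}
    span = largest - smallest
    return {
        key: low + (value - smallest) / span * (high - low)
        for key, value in values.items()
    }


# Made-up degrees for three nodes, mapped to node sizes from 10 to 50.
degrees = {"A": 2, "B": 6, "C": 10}
node_sizes = rescale(degrees, 10.0, 50.0)
```

The same function could map edge weights onto a narrow range of line thicknesses. Lowering the high value is one way to keep heavy edges from overwhelming the rest of the graph.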
Reflection
I think the network graph does an effective job of showing the countries that have the highest degree and their connections to the rest of the network. The Yifan Hu layout places all of these nodes centrally, with the less connected nodes further out. I also took a look at the Fruchterman Reingold layout, which arranged all the nodes in a circle, mimicking a globe-like layout that I thought was also effective.
Ultimately, I felt that the Yifan Hu layout was better because the irregular distances did a better job of illustrating the connections between nodes and made more sense with coloring by cluster. If I were to do this lab again, I would look into importing latitudes and longitudes for these countries to see how this network would look anchored to a world map. I began doing this initially, but the several entries that I could not resolve during the data cleaning stage would have thrown off this map as well.