Introduction
Because I plan to create network visualizations of popular actresses and the films they appear in together for my final Information Visualization project, I chose a dataset with a similar structure for this Gephi lab. The dataset lists the coappearances of characters in the novel Les Misérables; it was compiled by Donald E. Knuth and is available in a Gephi-ready format (GML). I was primarily interested in locating the strongest relationships in the network and in exploring how Gephi displays this kind of information.
Inspiration
In order to better understand networks, I purposefully sought out examples of three of the visualization types discussed in class: force-directed, arc diagram, and radial diagram. The examples shown below initially intrigued me because of their aesthetic value, but they also clearly convey information about elements and the connections between them.
Though the visualization below is meant to be viewed in Flash, even the still image is striking. The graph displays the friends that you share with one of your Facebook friends. Node color indicates gender, and a dropdown box allows users to select a different friend. The result is a user-friendly visualization that is easy to interpret.
Source: http://mashable.com/2009/08/21/gorgeous-facebook-visualizations/#Ll4ljBSxXEq3
The arc diagram below is the result of a search for “data” in The New York Times articles from the past 30 days, as retrieved by a tool called NYTimes Writes in May 2011. Each node is a tag attached to one or more of the articles retrieved in the search results. The width of each arc indicates how often the tags appear together. Since the size of each node indicates the number of times a tag appears, the order of the nodes also stacks the tags in terms of popularity. The vertical arrangement of the nodes makes the name of each node easy to find and read, which is not always the case when the nodes are arranged horizontally.
Source: https://flowingdata.com/2011/05/24/exploring-nyt-authorship-and-topics/
The following radial diagram displays common food additives and links them to the supermarket aisles (food type) where they are found. The original visualization (the image below is a screenshot) is highly interactive, as any item can be selected to see where an ingredient can be found or what ingredients can be found in a given aisle. Hovering over any node provides information about that ingredient or aisle, and hovering over any edge reveals exact details about the connection (for example, “41% of Drinks contain Citric acid”). Though an initial time investment is required to understand the visualization, it yields a wealth of information.
Source: graphics.wsj.com/food-additives-ingredients/
Methods and Discussion
I used Gephi, an open-source software package of French origin, to create my network visualizations. After importing my GML file into Gephi, I noticed that the network was classified as Directed, which seemed odd for this type of data: if Valjean appears with Fantine, then Fantine also appears with Valjean, so logically the network should be Undirected. To rectify this, I exported the edge table as a CSV file, changed the Type column to Undirected for all edges, and imported it into a new Gephi project. The result had far too many nodes and no labels, so I also exported the node table from the original GML project as a CSV file and imported it into a new project, followed by my edited edge table. This did not fix the problem, so I reverted to the original GML file in order to begin creating visualizations.
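The edge-table edit described above could also be scripted rather than done by hand. The following is a minimal sketch in Python, assuming the exported CSV has Source, Target, and Type columns; the column names and the two sample rows are assumptions for illustration and are not taken from the actual export.

```python
import csv
import io

def make_undirected(csv_text):
    """Return the edge table with every row's Type set to 'Undirected'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Type"] = "Undirected"  # Source and Target are left untouched
        writer.writerow(row)
    return out.getvalue()

# Hypothetical two-row edge table standing in for the real export.
sample = "Source,Target,Type,Weight\n0,1,Directed,1\n2,1,Directed,8\n"
print(make_undirected(sample))
```

The sketch deliberately leaves the Source and Target values alone, since they have to match the Id column of the node table. A mismatch between the two tables is one possible reason for extra, unlabeled nodes appearing on import, though that is only a guess at the cause here.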
I first applied the ForceAtlas 2 layout and let it run on the network for a couple of minutes. Once the network had spread out a bit, I used the Ranking function on the nodes to make each node’s size correspond to its degree. I also experimented with the Ranking function to make each node’s color correspond to its degree, out-degree, and in-degree.
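To make the size ranking concrete, here is a rough sketch of what ranking node size by degree amounts to. It assumes a simple linear mapping from degree to size; the size range (10 to 50) and the four sample edges are illustrative assumptions, not Gephi's settings or rows from the dataset.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    deg = Counter()
    for source, target in edges:
        deg[source] += 1
        deg[target] += 1
    return deg

def rank_sizes(deg, min_size=10.0, max_size=50.0):
    """Linearly map each degree onto the range [min_size, max_size]."""
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid division by zero when all degrees are equal
    return {
        node: min_size + (d - lo) / span * (max_size - min_size)
        for node, d in deg.items()
    }

# Illustrative edges only; not copied from the Les Misérables file.
edges = [
    ("Valjean", "Fantine"),
    ("Valjean", "Cosette"),
    ("Valjean", "Javert"),
    ("Fantine", "Javert"),
]
deg = degrees(edges)
sizes = rank_sizes(deg)
```

In this toy example the node with the highest degree receives the maximum size and the node with the lowest degree receives the minimum, with the others falling in between.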
However, since degree was already indicated by node size, I wanted color to indicate a different node property. In the Data Laboratory tab I added a column for gender and filled in ‘male’ or ‘female’ as appropriate (for some of the minor characters I made educated guesses regarding gender, as I had no time to investigate further). I then used the Partition function on the nodes, with gender as the parameter, and colored the female nodes red and the male nodes blue. The visualization already shows that there are far more male characters than female ones, and that the male nodes are larger.
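The two observations above (how many nodes fall in each group, and how large they are) can also be checked numerically. The sketch below groups nodes by a category and reports the node count and mean degree per group; the node names, degrees, and gender labels are placeholders, not values from the dataset.

```python
from collections import defaultdict

def summarize_by_group(degree, group):
    """Return {group_name: (node_count, mean_degree)}."""
    buckets = defaultdict(list)
    for node, d in degree.items():
        buckets[group[node]].append(d)
    return {g: (len(ds), sum(ds) / len(ds)) for g, ds in buckets.items()}

# Placeholder data for illustration only.
degree = {"A": 5, "B": 3, "C": 2, "D": 4}
gender = {"A": "male", "B": "female", "C": "male", "D": "male"}
summary = summarize_by_group(degree, gender)
```

A summary like this would back up the visual impression with numbers, provided the gender column is filled in for every node.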
I experimented with several other layouts, and finally installed the No Overlap plugin in order to prevent the nodes from overlapping.
After creating this force-directed visualization, I searched in vain for a layout that could create an arc diagram. Since there didn’t appear to be any such layout, I filtered the nodes by in-degree (range: 7-32) to make the number of nodes more manageable (reasoning that many of the less connected nodes were less important characters) and manually moved the nodes into a vertical line from largest to smallest. I could not figure out how to change the edge shape to arcs; this may not be possible because the network was still characterized as Directed. Therefore, I simply hid the edges for the time being. Even without the edges, the diagram shows clearly that even among the major characters, male nodes have much higher degrees than female nodes.
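The filtering and manual arrangement described above could be approximated in code. This sketch keeps nodes whose degree falls within a range and assigns them positions on a vertical line from largest to smallest. The range 7 to 32 comes from the filter above; the spacing value and the sample degrees are assumptions. The function accepts any degree mapping, so in-degree can be passed in as it was in my filter.

```python
def vertical_order(degree, lo=7, hi=32, spacing=40.0):
    """Keep nodes with lo <= degree <= hi and stack them top to bottom.

    Returns {node: (x, y)} with the highest-degree node at the top (y = 0).
    Ties are broken alphabetically so the order is reproducible.
    """
    kept = [node for node, d in degree.items() if lo <= d <= hi]
    kept.sort(key=lambda node: (-degree[node], node))
    return {node: (0.0, -i * spacing) for i, node in enumerate(kept)}

# Placeholder degrees for illustration only.
positions = vertical_order({"A": 32, "B": 7, "C": 3, "D": 15})
```

Here node C is dropped because its degree is below the range, and the remaining nodes are stacked in the order A, D, B.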
While searching for an arc diagram layout, I had found a Circular Layout plugin for radial diagrams. I installed this and used it on my network. I experimented with the ‘Order Nodes by’ property and settled on ordering them by degree.
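For reference, the idea behind a circular layout ordered by degree can be sketched in a few lines. This is not the plugin's actual implementation, only an illustration of the general approach: sort the nodes by degree, then space them evenly around a circle. The radius and the sample degrees are assumptions.

```python
import math

def circular_layout(degree, radius=100.0):
    """Place nodes evenly on a circle, ordered by descending degree."""
    ordered = sorted(degree, key=lambda node: (-degree[node], node))
    if not ordered:
        return {}
    step = 2 * math.pi / len(ordered)  # angle between neighbouring nodes
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }

# Placeholder degrees for illustration only.
layout = circular_layout({"A": 9, "B": 4, "C": 7, "D": 1})
```

With this ordering, the highest-degree node sits at angle zero and the remaining nodes follow counterclockwise in decreasing order of degree.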
Future Directions
The most important next step would be to investigate why the network was classified as Directed, and whether and how it could be properly changed to Undirected. I would also look further into Gephi’s capabilities to find out whether there is a layout for arc diagrams. None of the visualizations makes it easy to identify individual relationships. The radial diagram in particular is rather dense; it would be easier to interpret if it were modeled on the food additives diagram mentioned above. Such a visualization would allow the user to select a character and see only that character’s connections.
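The selection behavior described above comes down to a simple filter on the edge list. The sketch below shows the idea, assuming edges are stored as pairs of names; the sample edges are placeholders rather than rows from the dataset.

```python
def connections_of(edges, character):
    """Return only the edges that touch the selected character."""
    return [(s, t) for s, t in edges if character in (s, t)]

def neighbours_of(edges, character):
    """Return the set of nodes directly connected to the selected character."""
    return {t if s == character else s for s, t in connections_of(edges, character)}

# Placeholder edges for illustration only.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
selected = connections_of(edges, "C")
```

An interactive version would redraw the diagram with only the selected edges and their endpoints highlighted.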
In future I think it would be wiser to work only with my own dataset. I experimented with creating my own network dataset but did not have the time to complete it. Because the Les Misérables dataset was already a GML file, I was unable to examine it beforehand. If I had been working with my own data, or used a dataset that I could preview, I would have been more comfortable manipulating the data and might have added some more node properties in advance.