For my second project in the Data Visualization class, I worked with a social network dataset describing the characters in a famous novel. Although this literary work is well known, I have not yet read it. I was curious about who the central characters of the book were and how well connected they were. A further goal was to identify the characters that were not central to the story, or not as well connected to other characters.
Three Visualizations:
http://www.visualcomplexity.com/vc/project_details.cfm?id=996&index=996&domain=
The first visualization that informed my design was one depicting the daily connectivity of jazz artists, called Linked Jazz. The way various types of people were linked and the seamless layout were very appealing. I liked that, although a lot of information was available on each artist, it was presented in a way that was not overwhelming or confusing. Though this example is much more complex than mine, seeing the depth of what can be done was useful as inspiration.
http://www-personal.umich.edu/~mejn/networks/lesmis.gif
This visualization is from a web page called Gallery of Network Images. It has the most obvious ties to my own data, as one of the examples on this page, entitled Les Miserables, uses the same dataset as I did. I found it helpful to see how the creators of this graph approached the visualization. Their visualization was successful in its clarity, and the colors used were distinct and effective in showing different clusters. After looking at it, I decided I wanted my own visualization to be more streamlined.
http://36trucs.social-computing.com
The last visualization is from 36trucs, and it shows a community of people with “shared personal goals”. I looked at this graph because it was more modest in size (similar to mine). It was straightforward while still conveying the information to a viewer. Its simplicity aided the overall effect, and the message of the project is evident. This is a “less is more” example to remind myself that keeping things pared down does not necessarily mean a graph will be unclear.
Materials Used:
The dataset I used was called “Les Miserables”, sourced from the Gephi GitHub page. Other datasets on that page cover biological networks, infrastructure networks, and more, all of which are available to the public online. The data I chose came from the Social Networks section of the site and is based on the book “Les Miserables”. The site describes the data as a weighted network of characters in the novel Les Miserables.
Gephi was the software used for the visualizations in this lab. Gephi can import tabular data (such as a CSV spreadsheet) as well as graph file formats, and visualizations can then be built from the imported data. In my case the data came as a GML file, which was ready for use in the software.
I referred to the lecture slides, specifically those from 10/25 (of particular note were the example visualizations in slides 6-11).
The class reading that I mainly referenced was Krempel’s “Network Visualization”; the section Visual Layers of Network Attributes was especially useful. It was also helpful to refer to the Visual Complexity site for examples of how more advanced projects were executed.
Methods Used to Create Visualizations:
I downloaded the dataset from the Gephi GitHub page and saved it as a GML file. Once the data was downloaded, I could easily open it in Gephi, where a report lists the number of nodes and edges. After this, the initial graph appears in the Overview tab. In the “Workspace” I began to explore the features that would help build my visualization, including edge weight scale, zoom to size, and showing node labels, to name a few. Then I ran Force Atlas to lay out the graph, increased the repulsion strength, and adjusted the node size.
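To make the effect of the repulsion strength more concrete, here is a minimal sketch of a force-directed layout in plain Python. It is not Gephi's Force Atlas implementation; it is a simplified stand-in that assumes all node pairs repel and linked nodes attract. The function name `force_layout`, its parameters, and the four-node edge list are hypothetical and exist only for illustration.

```python
import math
import random


def force_layout(nodes, edges, repulsion=1.0, attraction=0.1, steps=200, seed=0):
    """Toy force-directed layout: every pair of nodes repels, linked nodes attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, weaker at larger distances.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along each edge, scaled by distance and edge weight.
        for a, b, weight in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * weight * dx
            force[a][1] += attraction * weight * dy
            force[b][0] -= attraction * weight * dx
            force[b][1] -= attraction * weight * dy
        # Move each node along its net force, capping the step for stability.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                step = min(mag, 0.1) / mag
                pos[n][0] += fx * step
                pos[n][1] += fy * step
    return pos


# Hypothetical toy network: (node, node, weight). Not taken from the dataset.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("A", "C", 1), ("A", "D", 2)]
tight = force_layout(nodes, edges, repulsion=1.0)
loose = force_layout(nodes, edges, repulsion=8.0)
```

Comparing `tight` and `loose` shows the same idea as raising the repulsion strength in Gephi: a stronger push between nodes spreads the graph out and makes labels easier to separate.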
The statistics I ran included modularity and degree, and I also attempted clustering. Finally, I used filters. To isolate the better-connected characters I used the In-Degree Range filter with a lower bound of 3, which eliminated the characters with fewer than 3 incoming edges. Since I decided to keep the node labels visible, I did some cleanup work to make sure none of the names overlapped.
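The degree filter described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not what Gephi does internally: the network is treated as undirected, so plain degree stands in for in-degree, and the threshold is applied once using degrees from the full graph. The helper names and the small edge list are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edge list: (node, node, weight). Not taken from the dataset.
edges = [
    ("A", "B", 1), ("A", "C", 1), ("A", "D", 2),
    ("B", "C", 1), ("B", "D", 3), ("D", "E", 1), ("A", "F", 1),
]


def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    deg = defaultdict(int)
    for a, b, _weight in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def filter_by_min_degree(edges, min_degree):
    """Keep nodes with at least min_degree edges, and the edges among them."""
    deg = degrees(edges)
    keep = {n for n, d in deg.items() if d >= min_degree}
    kept_edges = [(a, b, w) for a, b, w in edges if a in keep and b in keep]
    return keep, kept_edges


keep, kept_edges = filter_by_min_degree(edges, min_degree=3)
print(sorted(keep))  # ['A', 'B', 'D']
```

In this toy network, C, E, and F each have fewer than 3 connections, so they drop out along with their edges, leaving only the tightly linked core.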
Results:
It was interesting to see that filtering out the characters with fewer than 3 incoming edges greatly changed the look of my visualization. It reduced the number of people shown, which indicates there were many characters with little connectivity. Those with titles such as “Child 1” were no longer in the graph, which was one of my goals. What remained were the characters who are more central to the story line.
Future Directions:
A possibility for future work would be to take data from another well-known literary work with many characters (for example, The Brothers Karamazov by Dostoyevsky) and, if such data is available, produce a visualization for that work in a manner comparable to the one used here. Displaying the graphs together would tell a story about the connectivity of characters in literature from different sources and cultures.