Introduction
For this Gephi lab, I chose to create a network visualization of character relationships in David Lynch’s Twin Peaks. After extensive searching, I could not find an existing dataset, so I created my own dataset of character relationships, entering 401 lines of data into the edge table. This table featured a source column, a target column, and a type column in which I specified whether a relationship was directed or undirected. My source and target relationships were based on scene co-occurrences. Since one character cannot be in a scene with another character without the inverse also being true, I was able to mark every relationship as undirected. I also created a node table, where each character’s name and ID were listed. I was disappointed that I was unable to find any information on total screen time for each character, as this would have been a good way to assign weight to each edge. As a result, this network is unweighted.
An important note about this dataset: Laura Palmer, the character around whom the entire series revolves, is never shown alive. I assigned her relationships based upon those who were in the presence of her corpse or who interacted with her “ghost”. I built this dataset with the hope that visualizing these relationships might shed some light on the centrality of certain characters and the peripheral nature of others in unexpected and useful ways.
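The edge and node tables described in this introduction could also be generated from per-scene character lists rather than typed by hand. The following is only a minimal sketch: the scene lists are made up for illustration and are not the actual 401-row dataset, the helper names are hypothetical, and the column headers (Id, Label, Source, Target, Type) are an assumption about what the spreadsheet import expects.

```python
import csv
import io
from itertools import combinations

# Hypothetical scene lists for illustration only (not the real dataset).
scenes = [
    ["Dale Cooper", "Harry S. Truman", "Lucy Moran"],
    ["Dale Cooper", "Harry S. Truman"],
    ["Shelly Johnson", "Bobby Briggs"],
]

def build_tables(scenes):
    """Return (nodes, edges) for an undirected, unweighted co-occurrence network."""
    names = sorted({name for scene in scenes for name in scene})
    ids = {name: i for i, name in enumerate(names, start=1)}
    pairs = set()
    for scene in scenes:
        # Sorting the de-duplicated cast makes (A, B) and (B, A) the same pair.
        for a, b in combinations(sorted(set(scene)), 2):
            pairs.add((ids[a], ids[b]))
    nodes = [(ids[name], name) for name in names]
    edges = [(a, b, "Undirected") for a, b in sorted(pairs)]
    return nodes, edges

def to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

nodes, edges = build_tables(scenes)
nodes_csv = to_csv(["Id", "Label"], nodes)            # assumed node-table headers
edges_csv = to_csv(["Source", "Target", "Type"], edges)  # assumed edge-table headers
print(nodes_csv)
print(edges_csv)
```

Because each pair is stored only once in a set, a pair that shares several scenes still produces a single unweighted edge, which matches the unweighted network described here.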
Influential Visualizations
The first visualization that influenced my own was this network graph of Star Wars characters.
This network was created using the movie’s screenplay: each node represents a character and each edge represents two characters speaking in the same scene. The thickness of an edge represents the frequency with which two characters co-occur in scenes, and the node size represents how often a given character speaks. I was attracted to the simplicity of this network, though I did find that its effectiveness was diminished slightly by how tightly packed the central nodes are. It is a bit difficult to see which characters connect with whom, but overall I think it is a fairly successful visualization.
The second visualization that I considered was a network graph of the character relationships in Friends.
While I knew that this approach would not work for my data, as I was dealing with far too many characters (close to 80 overall), I found it to be a unique and interesting way of handling relationship visualization among a cast of only six characters. The designer created six individual visualizations, one for each character. The five remaining characters were then listed in descending order of “closeness”, based upon the number of shared plot dynamics. This value seems rather inexact to calculate, and the added numbers and dashed lines that the designer used to compensate for these complex methods of determining shared plot dynamics seemed fairly ineffective. However, I was attracted to the idea of creating unique visualizations for each character and felt that this helped to narrow the viewer’s focus onto each character individually and thus to examine their relationships more closely.
The third and final visualization that I explored was the Graph of Thrones, based upon character relationships in Game of Thrones.
In this network, node size is based upon betweenness centrality and edge thickness is based upon the number of interactions between characters. The coloring of this network is somewhat misleading. At first glance, I assumed that the colors were based upon family relations, as house affiliation plays a large role in the show. However, upon further reading I learned that the designer used the walktrap method to determine communities, or clusters, in this graph. While I am not entirely clear on how this method works, it seems quite complex, and I worry that coloring sections of the network in this way can mislead viewers who expect the colors to reflect house affiliation.
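To make the node-size measure in that graph more concrete: betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes. The following is a minimal sketch of the standard breadth-first approach (Brandes' algorithm) for an unweighted, undirected graph. The four-node graph and its labels are hypothetical, and this is not necessarily how the Graph of Thrones designer computed the values.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for an unweighted, undirected graph.

    adj maps each node to the set of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                      # accumulate dependencies, farthest nodes first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: B is the only link between A, C, and D.
graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
scores = betweenness(graph)
print(scores)
```

In this toy graph every shortest path between A, C, and D passes through B, so B receives the highest score while the other three nodes score zero.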
Materials
As mentioned above, I created my own datasets for this network. This was done with help from the Twin Peaks Wiki page, as well as my own tally of scene co-occurrences based on a recent viewing of the series. I used a simple Google spreadsheet to create my node and edge tables. Since I created the data myself, I did not need to use OpenRefine to clean it, and the networks were made using Gephi.
Methods & Results
After importing my edge and node tables and running statistics on my data, I tested a number of layouts in Gephi. Because my edges were unweighted, the resulting visualizations were arguably a bit weak, and I was unhappy with the output of many of the layouts that I tested. The absence of a weight factor for my edges also meant that I was unable to glean modularity statistics for my data, which made clustering impossible.
I attempted to group characters into their respective family units, in the hope that this would enable clustering. This resulted in the following graph (made with a combination of the Yifan Hu and Noverlap layouts, with node color assigned by the “family” attribute and node size based upon degree). It became immediately apparent that this type of grouping would not work, since some characters do not have an associated family unit. As a result, the characters without families, represented here in magenta, appear to form one single family unit.
I then assigned both node color and node size by degree, which resulted in a gradient. The most connected nodes were filled with the darkest color, and the least connected nodes were filled with the lightest color (in this case, white). While not perfect, I felt that this version was the most successful, and the most accurate, at conveying character relationships. As you can see from both sets of images, characters that have tenuous connections to others are located at the periphery of the network, while the two most central characters, Dale Cooper and Harry S. Truman, are located in the center of the network with the largest and darkest nodes.
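The degree-based sizing and shading described in this section can be sketched outside of Gephi as well. This is only an illustration under stated assumptions: the per-node value is taken to be each character's degree (the number of distinct characters they are connected to), the edge list is made up, and the dark end of the gradient is an arbitrary navy rather than the color used in the actual images.

```python
from collections import defaultdict

# Hypothetical undirected edges for illustration only.
edges = [
    ("Dale Cooper", "Harry S. Truman"),
    ("Dale Cooper", "Lucy Moran"),
    ("Dale Cooper", "Audrey Horne"),
    ("Harry S. Truman", "Lucy Moran"),
]

def node_degrees(edges):
    """Count distinct neighbors per node, ignoring self-loops and duplicate edges."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {name: len(others) for name, others in neighbors.items()}

def gradient_color(degree, max_degree, dark=(8, 48, 107)):
    """Linearly interpolate from white (degree 0) to `dark` (max degree) as a hex string."""
    t = degree / max_degree if max_degree else 0.0
    rgb = [round(255 + (channel - 255) * t) for channel in dark]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

degrees = node_degrees(edges)
max_degree = max(degrees.values())
colors = {name: gradient_color(d, max_degree) for name, d in degrees.items()}
for name in sorted(degrees, key=degrees.get, reverse=True):
    print(name, degrees[name], colors[name])
```

The same degree value drives both size and shade here, which is why the most connected characters end up as both the largest and the darkest nodes.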
Future Directions
I believe the most effective way to improve this visualization would be through enhancement of the dataset that I created. Clustering would have made this network far more successful. I had hoped that my network would look similar to the Game of Thrones network referenced above, where colored clusters helped to elucidate character relationships. Finding either a calculation of total screen time per character, or perhaps the frequency of shared screen time between characters, would allow for the inclusion of edge weight. This would enable the calculation of modularity statistics and the creation of clusters. I was also unable to download the JavaScript plugin, which means that my network is simply a static image file. I believe adding an interactive component to the visualization, where users could click on a node and highlight the relationships that are unique to that character, would be extremely helpful in such a dense network. Overall, I am fairly happy with the results of this network, given that I created the dataset myself, but the inclusion of edge weights and a JavaScript plugin would greatly enhance this visualization.
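One way the edge weights discussed here might be derived, if per-scene character lists were recorded, is to count how many scenes each pair of characters shares. This is a minimal sketch under that assumption: the scene lists are invented for illustration, the helper name is hypothetical, and scene counts stand in for shared screen time, which is a different and finer-grained measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene lists for illustration only.
scenes = [
    ["Dale Cooper", "Harry S. Truman", "Lucy Moran"],
    ["Dale Cooper", "Harry S. Truman"],
    ["Shelly Johnson", "Bobby Briggs"],
]

def weighted_edges(scenes):
    """Return (source, target, type, weight) rows, where weight is the number of shared scenes."""
    counts = Counter()
    for scene in scenes:
        # Sorting the de-duplicated cast gives each pair one canonical ordering.
        for pair in combinations(sorted(set(scene)), 2):
            counts[pair] += 1
    return [(a, b, "Undirected", w) for (a, b), w in sorted(counts.items())]

rows = weighted_edges(scenes)
print("Source,Target,Type,Weight")
for row in rows:
    print(",".join(str(field) for field in row))
```

The extra Weight column is the only difference from an unweighted edge table, so an existing table could be extended this way without changing its node IDs.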