Marvel Superhero Social Network: A Gephi Visualization



Introduction  

Network analysis is an important method for identifying and exploring relations among a group of institutions, individuals, or objects. Rather than simply exposing how a small number of variables react to each other, this form of analysis allows a researcher to examine an entire community of data points, showing power structures (i.e. groupings and hierarchy), information or resource flow (i.e. directionality), and other network characteristics such as density and diameter. This report applies network analysis by visualizing a social network in Gephi.

The data set used for this study’s visualization covers Marvel Comics superheroes and records instances where two or more superheroes appeared together in the same issue. After importing and manipulating the data set in Gephi, I identified several major social clusters of superheroes. A segment of the resulting visualization can be found in the Results & Discussion section later in this report. Several applications of this visualization for marketing and pop culture research are also discussed in that section.
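To make the idea of a co-appearance network concrete, here is a minimal sketch of how such edges could be derived from issue-level data. The issue identifiers and hero lists are made-up toy data, and the function name is hypothetical; this is not how the authors of the data set built their file, only an illustration of the concept.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: issue id -> heroes appearing in that issue.
issues = {
    "issue-1": ["Wolverine", "Iceman", "Cyclops"],
    "issue-2": ["Wolverine", "Iceman"],
    "issue-3": ["Mr. Fantastic", "Human Torch"],
}

def co_appearance_edges(issues):
    """Count, for each unordered pair of heroes, how many issues they share."""
    weights = Counter()
    for heroes in issues.values():
        # sorted(set(...)) drops duplicates and gives every pair one canonical order,
        # so ("Iceman", "Wolverine") and ("Wolverine", "Iceman") are the same edge.
        for a, b in combinations(sorted(set(heroes)), 2):
            weights[(a, b)] += 1
    return weights

edges = co_appearance_edges(issues)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, "weight", weight)
```

Each key is an undirected edge between two heroes, and each count is an edge weight: the number of shared issues in the toy data.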

 

Materials

  1. Gephi: Free network visualization software available for download at https://gephi.org/.
  2. GitHub: Source of data used in this report. This site houses Git repositories for shared computer code and other data. The data set used in this report comes from the “Datasets” page of the Gephi repository’s wiki (found at https://github.com/gephi/gephi/wiki/Datasets). The data set is titled “The Marvel Social Network” and is in the form of a GEPHI file. This file was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands; the data was collected by Infochimps and transformed and enhanced by Kai Chang.

 

Methods

Data Collection & Preparation

To begin this visualization project, I collected the Marvel superhero social network data from GitHub. To guide both the creation of my visualization and my interpretation of the network, I gathered three existing network visualizations and interpreted them before starting my own. All three have undirected edges and nodes that depict books or fictional characters, in keeping with the theme of my project.

 

Example Visualization 1

The following network visualization depicts books about U.S. politics published around the 2004 U.S. presidential election and sold on Amazon.com:

(Source: http://leogulus-champ.blogspot.com/2014/12/books-about-us-politics-network.html)

To begin my research, I sought out a simple Gephi network visualization of books that depicted clear clustering. In the above visualization, red nodes signify conservative books, blue nodes signify liberal books, and green nodes signify neutral or nonpartisan books. Edges signify frequent co-purchasing of the two books that they connect, and the size of each node signifies the number of books that are co-purchased with it.

This visualization reveals, unsurprisingly, that conservative books are clustered with other conservative books and liberal books are clustered with other liberal books, with very few outliers. This visualization utilizes color and space to clearly depict two major clusters, which makes it a great foundation to explore more complicated network structures.

 

Example Visualization 2

The visualization below depicts a social network of characters from Episodes I through VI of the Star Wars movies:

(Source: http://evelinag.com/blog/2015/12-15-star-wars-social-network/index.html#.WgQBosanFPa)

I wanted my next visualization to depict a simple social network of characters from a popular series that I am familiar with. In the picture above, nodes represent Star Wars characters while edges represent scenes in which characters speak together. Line thickness represents the number of times two characters speak together and the size of each node represents the number of scenes in which a character appears. It is important to point out here that the designer has separated Anakin and Darth Vader into two separate nodes, noting that “this distinction is important to the story.”

This visualization makes relationships somewhat easy to identify by the distance between any two nodes: see how far apart Darth Vader and Jar Jar are from each other as opposed to how close Jar Jar and Padme are to each other. It is also possible to see a split between characters from the three original movies and the three prequel movies around the nodes for Obi-Wan and R2-D2, characters that appear in both Star Wars movie trilogies. However, the relationships are obscured by the fact that colors are not used to highlight clustering and the edges are simply too short: a gravity adjustment, at the very least, would help to separate the nodes more to enhance a user’s interpretation of clustering.

 

Example Visualization 3

The next visualization depicts a social network of characters from William Shakespeare’s play Antony and Cleopatra:

(Source: http://www.martingrandjean.ch/network-visualization-shakespeare/)

In the above visualization, nodes represent characters and edges represent scenes in which the connected characters appear together. Size and color intensity represent weighted degree (i.e. the sum of the weights of a node’s edges, which equals the number of edges only when every edge has a weight of 1), with a larger size and lighter color signifying a higher weighted degree. This visualization labels clusters more clearly than the preceding one and uses color effectively to highlight important characters. Different colors and a manipulation of gravity would make the clusters even easier to identify, but the visualization works well enough as is. It is important to note here that this visualization is part of a collection of networks that depict characters from several of Shakespeare’s works; each network has its own color, so different colors are not used within a single network. This network is closest to what I wanted to depict in my Gephi visualization, with the addition of different colors and a greater manipulation of gravity.
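Since degree and weighted degree are easy to confuse, the following minimal sketch shows the difference on a tiny undirected network. The character names come from the play, but the edge weights (numbers of shared scenes) are made up for illustration and are not taken from the visualization above.

```python
from collections import defaultdict

# Hypothetical toy edges: (character_a, character_b, made-up number of shared scenes).
edges = [
    ("Antony", "Cleopatra", 5),
    ("Antony", "Caesar", 2),
    ("Cleopatra", "Charmian", 3),
]

def degrees(edges):
    """Return (degree, weighted_degree) dictionaries for an undirected edge list."""
    degree = defaultdict(int)
    weighted = defaultdict(int)
    for a, b, weight in edges:
        for node in (a, b):
            degree[node] += 1         # one more edge touches this node
            weighted[node] += weight  # add that edge's weight
    return dict(degree), dict(weighted)

degree, weighted = degrees(edges)
print("Antony: degree", degree["Antony"], "weighted degree", weighted["Antony"])
```

In this toy data Antony has two edges but a weighted degree of seven, so sizing nodes by one measure or the other can produce noticeably different pictures.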

 

Visualization Creation

Once I completed my analysis of the visualizations explored earlier in this report, I downloaded “The Marvel Social Network” GEPHI file from GitHub. I then imported the file into Gephi and began my visualization process. First, I ran a few statistics, including average degree, network diameter, graph density, and modularity, to have them ready for future reference. Next, I ran the Gephi layout algorithm called Force Atlas 2 in order to expose clusters; I stopped it once I was satisfied with the effect. I then adjusted node size and edge distance proportions and changed node and edge colors, label lengths, and fonts to enhance cluster discoverability (some of this was done on the main screen and some in the preview section). Next, I produced an image of my visualization from the preview screen. I was unable to successfully change the gravity because of the size of the file I was using and the processing power of the system I was running Gephi on, so my final visualization of the whole network is provided along with a zoomed-in picture of an important segment of the network in the Results & Discussion section that follows.

 

Results & Discussion

The picture below depicts the resulting Gephi visualization of the Marvel superhero social network data used for this study:

Below is a zoomed-in segment of the above network, with a more zoomed-in version below it:

 

The resulting visualization above allows a user to see clusters of superheroes within the Marvel Comics landscape. Nodes depict superheroes and edges depict co-occurrences of superheroes within a single Marvel Comics issue. The size of a node signifies the number of nodes attached to it. Different colors help to identify clusters; however, a manipulation of gravity to separate clusters would make the resulting visualization easier to interpret.

The following stats were collected through Gephi for this network:

Average Degree: 34.027

Network Diameter: 7

Graph Density: 0.003

Modularity: 0.49

Average Path Length: 2.889
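As a rough guide to how two of these figures relate, the sketch below computes average degree and graph density for a toy network. It assumes an undirected graph with no self-loops or duplicate edges, where average degree is 2m/n and density is 2m/(n(n-1)) for n nodes and m edges; the function names and the toy numbers are illustrative and are not taken from the Marvel data.

```python
def average_degree(num_nodes, num_edges):
    """Average number of edges per node in an undirected graph: 2m / n."""
    return 2 * num_edges / num_nodes

def density(num_nodes, num_edges):
    """Share of all possible undirected edges that are present: 2m / (n(n-1))."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Hypothetical toy network: 4 nodes joined by 4 edges.
n, m = 4, 4
avg = average_degree(n, m)
dens = density(n, m)

# Under the assumptions above, density is average degree divided by (n - 1).
print(avg, dens, avg / (n - 1))
```

Under the same assumptions, dividing the reported average degree by the reported density suggests a network of roughly ten thousand nodes. This is only approximate, because the density shown above is rounded to three decimal places.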

 

The stats and major clusters found in this visualization are not surprising by themselves, since an avid reader of Marvel Comics would likely have a general idea of the scope of Marvel’s characters and which characters appear together frequently (note how Wolverine and Iceman are together, as members of the X-Men superhero team, and Mr. Fantastic and the Human Torch are together, as members of the Fantastic Four). The average degree makes sense given the sheer number of comics that Marvel has produced, and the other stats reflect the data set’s large size. However, the apparent clusters can reveal some important insights for marketers or pop culture researchers. For marketers, clusters of superheroes could be compared to the co-purchasing behavior and sales figures of particular issues to see if existing clusters should be edited to increase revenues. Visible clusters of heroes could also help film producers better lay out movies when a series of comic-to-film adaptations is in the works. A pop culture researcher may be interested in looking into the diversity of certain clusters and comparing that to the writers and artists that worked on those characters (e.g. their identities, how much Marvel paid the writers, their connections to other writers and artists within Marvel, etc.). A network visualization such as this one might be well-suited for a dashboard that also includes bar graphs that depict sales and demographic data for specific clusters, which is an opportunity for a future study.

 

Moving Forward

Gephi is a wonderful tool for visualizing network data, but it comes with a bit of a learning curve. Data must be cleaned more thoroughly beforehand than for a visualization program such as Tableau, since Gephi does not allow for much data cleanup within the software itself. Gephi’s features also require a user to understand the fundamentals of network theory in order to use them properly, since the software does not provide self-explanatory labels. Then again, network interpretation is not taught as early in primary and secondary education as the interpretation of other graphs, such as bar and line graphs, so the steeper learning curve for Gephi is not necessarily an avoidable burden.

As for my data and resulting visualization, working on a system with greater processing power would have made the manipulation of such a large data set easier. Changing gravity would make the network easier to understand and more useful for those less familiar with Marvel Comics by separating and highlighting clusters. Furthermore, as discussed earlier, future studies that depict sales and demographic data for key clusters of Marvel Comics superheroes in bar and line graphs would lend themselves well to a dashboard.