A Comic Co-Appearance Web of Marvel Comics Characters


Networks, Visualization

Cameron Dudzisz-Pounds

Introduction

One of the highlights of the student work posters on the Information School floors in the Manhattan building is the web of Marvel superheroes. I had initially wanted to use Gephi to create a bipartite graph like the one above, as a complement that would better show which characters had the most comic appearances and which issues had the most characters appearing in them. However, I could not get it to work, so I made a more standard web of character relations instead.

Screenshot of the full network. To zoom in, download below.

Materials & Methods

I used the Marvel Social Network dataset on the Gephi GitHub (direct download here). It was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro of the University of the Balearic Islands, and transformed and collected by Kai Chang and InfoChimps. To create the network graphs, I used the desktop version of Gephi 0.9 and imported the dataset directly from the .zip file. To make the data easier to work with, I made two new columns:

- Name: a copy of the id data (the name of the character, or the run and issue of the comic).
- Type: a copy of the Label column, which indicated whether the row was a character (“hero”) or a comic run and issue (“comic”).

I then ran the degree and modularity statistics (bottom left) to find node degrees and character groupings. The groupings are reflected in the edge colors of the final chart. I tried, to the best of my ability, to give each group a color associated with its characters. The exception is the X-Men, who got hot pink mostly by dint of my running out of colors, and because, as far as I'm aware, they don't have particularly consistent color branding. I also tried to run network diameter and graph density, but both crashed Gephi.

A section of the data after transformation and statistics.
Statistics window, with Average Degree and Modularity already run.
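Both the degree statistic and the graph density that crashed Gephi are simple counts. As a minimal sketch, assume an undirected graph stored as a plain edge list with no duplicate edges or self-loops. The Python below computes each node's degree and the standard undirected density 2E / (N(N - 1)). The hero and comic names are hypothetical toy data, not rows from the real dataset. This is not how Gephi computes its statistics internally.

```python
from collections import Counter

def degrees(edges):
    """Undirected degree: how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def density(num_nodes, num_edges):
    """Undirected density: edges present divided by the N(N-1)/2 possible."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Hypothetical toy hero-comic edges, not rows from the real dataset.
edges = [
    ("SPIDER-MAN", "ASM 1"),
    ("MAY PARKER", "ASM 1"),
    ("SPIDER-MAN", "ASM 2"),
    ("THOR", "AVF 1"),
]
deg = degrees(edges)
print(deg["SPIDER-MAN"])                         # 2
print(round(density(len(deg), len(edges)), 3))   # 0.267
```

Since density only needs the node and edge counts, it can be worked out by hand from the totals Gephi already reports.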

I first attempted to create a bipartite graph by following this tutorial from the social-dynamics blog. However, Gephi would either crash when I tried to run it, or the plugin would refuse to let me define the left and right matrix (and then sometimes crash anyway). I looked elsewhere for other tutorials, such as this question on Stack Overflow, but had no success. I eventually gave up and decided to make a character relation web instead.
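The “left and right” step the plugin asked for amounts to splitting nodes by Type into two columns. The sketch below is a hypothetical hand-rolled version of that idea, not the plugin's algorithm:

- Heroes go in a left column and comics in a right one.
- Each column is sorted by degree, so the characters with the most appearances and the issues with the most characters rise to the top.

The function name, spacing parameters, and toy data are all assumptions for illustration.

```python
from collections import Counter

def two_column_layout(node_types, edges, width=10.0, gap=1.0):
    """Hypothetical bipartite layout: 'hero' nodes at x=0, 'comic' nodes at
    x=width, each column ordered by degree (highest first, ties by name).
    y grows downward, as in screen coordinates."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    positions = {}
    for node_type, x in (("hero", 0.0), ("comic", width)):
        column = sorted(
            (n for n, t in node_types.items() if t == node_type),
            key=lambda n: (-deg[n], n),
        )
        for rank, node in enumerate(column):
            positions[node] = (x, rank * gap)
    return positions

# Hypothetical toy data mirroring the Name/Type columns.
node_types = {"SPIDER-MAN": "hero", "MAY PARKER": "hero",
              "ASM 1": "comic", "ASM 2": "comic"}
edges = [("SPIDER-MAN", "ASM 1"), ("MAY PARKER", "ASM 1"),
         ("SPIDER-MAN", "ASM 2")]
for node, pos in two_column_layout(node_types, edges).items():
    print(node, pos)
```

Feeding these coordinates to a drawing tool would give a rough bipartite view. A real dataset of this size would still need label filtering to stay readable.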

I used the ForceAtlas2 layout algorithm for the initial clustering. I then ran a pass of Noverlap and Label Adjust to try to clean up some of the overlapping chaos. However, there was a very large number of nodes in the middle: characters and issues that shared and used characters from multiple modularity classes. To attain any legibility while keeping the entire graph on screen, I also had to hide the labels of any character or comic with a degree below 5.

The first iteration after running layout algorithms.
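The label rule above is just a degree threshold. The minimal sketch below returns the nodes that would keep their labels. It assumes the same kind of edge list and a hypothetical `min_degree` parameter set to 5. It only illustrates the rule, not how Gephi applies it.

```python
from collections import Counter

def labelled_nodes(edges, min_degree=5):
    """Return the nodes whose degree is at least `min_degree`, i.e. the ones
    whose labels stay visible; every other label would be hidden."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(n for n, d in deg.items() if d >= min_degree)

# Hypothetical toy data: one hero in five issues, plus a minor character.
edges = [("SPIDER-MAN", f"ASM {i}") for i in range(1, 6)]
edges.append(("MAY PARKER", "ASM 1"))
print(labelled_nodes(edges))  # ['SPIDER-MAN']
```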

Next, I manually adjusted nodes to make room for labels. For whatever reason, yet in a way totally fitting his character, Iron Man/Tony Stark really wanted to cover up every other character in his section. I then exported the graph to PDF.

Results & Interpretation

Some raw statistics can easily be seen from the base chart. For example, Spider-Man is the most connected character by number of edges, with a degree of over 1,600. The network graph also reveals patterns that can't easily be teased out of a simple table, such as how characters cross over with different groups (or don't).

Thor and Spider-Man both frequently appear with other groups, as the edges with combined colors show. However, they also each have a fairly large group of characters that only show up when they do. You aren't going to get May Parker in a comic where Spider-Man doesn't appear. Captain America has many crossovers, but he also very commonly works with the Nicholas Fury & Black Widow group, as shown by the fan of blue-green in his connections.

Thor has a large cast of characters that appear with him very frequently, but rarely or almost never without him.
Heimdall clearly needs to get out more.
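The “only shows up when Thor does” pattern can be put as a number. One possible measure, assumed here rather than taken from the chart: for a character X and a hero H, take the share of X's comics in which H also appears. A share of 1.0 means X never appears without H. The sketch below assumes each character's appearances are available as a set of comic issues. The names and issues are hypothetical toy data.

```python
def dependence(appearances, character, hero):
    """Share of `character`'s comics in which `hero` also appears."""
    own = appearances[character]
    if not own:
        return 0.0
    return len(own & appearances[hero]) / len(own)

def exclusive_cast(appearances, hero, threshold=1.0):
    """Characters whose dependence on `hero` is at least `threshold`."""
    return sorted(c for c in appearances
                  if c != hero and dependence(appearances, c, hero) >= threshold)

# Hypothetical toy data: character -> set of comic issues.
appearances = {
    "THOR": {"T 1", "T 2", "T 3", "AV 1"},
    "HEIMDALL": {"T 1", "T 2"},
    "SIF": {"T 2", "AV 1", "AV 2"},
    "IRON MAN": {"AV 1", "AV 2", "IM 1"},
}
print(exclusive_cast(appearances, "THOR"))                 # ['HEIMDALL']
print(exclusive_cast(appearances, "THOR", threshold=0.5))  # ['HEIMDALL', 'SIF']
```

Lowering the threshold widens the “cast” from strictly exclusive characters to ones who mostly appear alongside the hero.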

The center holds a group I affectionately call the “Marvel Hairball.” These characters frequently cross groupings with only mild biases towards one group or another, but few characters appear only with them. This contrasts with Spider-Man or Captain America, who have a cast of characters found only in “their” comics and also frequently mingle with other groups. Near the very center of this clump is She-Hulk.

Unlike Heimdall, She-Hulk can show up anywhere but has few characters to call her own.
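The difference between She-Hulk and Heimdall can also be scored. One standard option, swapped in here and not computed for this chart, is the participation coefficient:

P = 1 - Σ_s (k_s / k)²

Here k is a node's degree and k_s is how many of its neighbors fall in modularity class s. P near 0 means nearly all connections sit in one group, as with Heimdall. Values closer to 1 mean connections are spread evenly across groups, as in the Hairball. The neighbor lists and class numbers below are hypothetical.

```python
from collections import Counter

def participation(neighbors, group_of, node):
    """Participation coefficient: 1 minus the sum of squared shares of a
    node's links that land in each modularity class."""
    links = neighbors[node]
    if not links:
        return 0.0
    counts = Counter(group_of[n] for n in links)
    k = len(links)
    return 1.0 - sum((c / k) ** 2 for c in counts.values())

# Hypothetical toy data: comic nodes tagged with modularity classes.
group_of = {"C1": 0, "C2": 1, "C3": 2, "C4": 3, "C5": 4, "C6": 4}
neighbors = {"SHE-HULK": ["C1", "C2", "C3", "C4"], "HEIMDALL": ["C5", "C6"]}
print(participation(neighbors, group_of, "SHE-HULK"))  # 0.75
print(participation(neighbors, group_of, "HEIMDALL"))  # 0.0
```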

Reflection

My first major experience with Gephi was insightful, and it was fun to watch the layout algorithms do their thing (when they weren't crashing). However, I am still disappointed that I could not get the bipartite graph I wanted to work. I was curious about which comics had the most characters appearing, but for now I could only find that information in the table*, which is much less visually satisfying at a glance than a network graph.

I could also have found supplementary datasets to help round out this knowledge, such as each character's role (heroes, villains, and supporting characters are not differentiated) and how old they are. Longer-running characters like Captain America and the Fantastic Four are going to have more issues and connections than newer characters. Still, it might be interesting to see how much a character's age, measured by their first appearance, matters. It might also be interesting to see which characters are the most “isolated,” i.e. those appearing in the largest number of books with the smallest number of other characters.
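Both questions raised above can be answered from the same appearance lists the graph was built from. The sketch below uses hypothetical toy data and does two things:

- It inverts character-to-comic appearances to find the issue with the largest cast.
- It scores “isolation” with one possible definition, assumed for illustration: the number of comics a character appears in, divided by the number of distinct characters they ever share a comic with (floored at 1 to avoid dividing by zero).

```python
from collections import defaultdict

def cast_sizes(appearances):
    """Invert character -> comics into comic -> number of characters."""
    cast = defaultdict(set)
    for character, comics in appearances.items():
        for comic in comics:
            cast[comic].add(character)
    return {comic: len(chars) for comic, chars in cast.items()}

def isolation(appearances, character):
    """Comics appeared in, divided by distinct co-stars (at least 1)."""
    comics = appearances[character]
    costars = {other for other, theirs in appearances.items()
               if other != character and comics & theirs}
    return len(comics) / max(1, len(costars))

# Hypothetical toy data: character -> set of comic issues.
appearances = {
    "CAPTAIN AMERICA": {"CA 1", "CA 2", "AV 1"},
    "IRON MAN": {"AV 1"},
    "THOR": {"AV 1"},
    "LONER": {"L 1", "L 2", "L 3"},
}
sizes = cast_sizes(appearances)
print(max(sizes, key=sizes.get))                                   # AV 1
print(max(appearances, key=lambda c: isolation(appearances, c)))   # LONER
```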

With more time and familiarity with the program, I could also have done a better job cleaning up the data display. Characters were not always consistently named in the original data, and parts of their names were often cut off, sometimes to the point of confusion. For instance, I didn't realize until writing this report that “Hawk” is supposed to be Hawkeye/Clint Barton. I should also have checked the graph on different monitors. The blue I used for the Fantastic Four and Captain America groups looked distinct on the monitor I worked on, but it is nearly indistinguishable on others. The result is a large mass of blue in the middle that is difficult to tell apart.
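One way to handle truncated names like “Hawk” is a cleanup pass before import. The pass maps known short forms to full names, then drops any edges that the merge duplicates. The sketch below assumes a hand-built alias table. Its only entry comes from the Hawkeye example above, and the exact label spellings are hypothetical.

```python
# Hypothetical alias table: truncated label (uppercased) -> full name.
ALIASES = {"HAWK": "HAWKEYE/CLINT BARTON"}

def normalize(label, aliases=ALIASES):
    """Trim and collapse whitespace, then swap in a full name if one is known."""
    cleaned = " ".join(label.split())
    return aliases.get(cleaned.upper(), cleaned)

def relabel_edges(edges, aliases=ALIASES):
    """Rename both endpoints and drop duplicate edges created by merging."""
    renamed = [(normalize(u, aliases), normalize(v, aliases)) for u, v in edges]
    return list(dict.fromkeys(renamed))  # keeps first-seen order

edges = [("HAWK", "AV 1"), ("Hawk ", "AV 1"), ("THOR", "AV 1")]
print(relabel_edges(edges))
# [('HAWKEYE/CLINT BARTON', 'AV 1'), ('THOR', 'AV 1')]
```

The alias table still has to be built by hand. The code only makes sure each fix is applied consistently everywhere the label appears.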

*(for the curious, this is Infinity War #3, with a whopping 91 characters.)