Remember Saturday morning cartoons? I do. Before my parents woke up, I would pour a bowl of cereal and tune in to watch the Justice League, Avengers, and X-Men stop the bad guys. Once the Marvel universe jumped to the movie screen, those heroes became even larger than life. They became people I would look up to and root for long after my childhood years.
In the Marvel movies, characters were more likely to interact across teams, and even the minor characters had an impact on the plot. Main heroes, like Iron Man and Captain America, would end up saving the entire world, while smaller characters, like Toad and Venom, would plague the heroes at every turn. After watching each movie, I began to see just how vast the Marvel universe really was, and I wanted to show others just how tightly knit the characters were.
I decided to conduct a network analysis of the Marvel universe to visualize the connections each character had. With that objective in mind, I settled on three research questions to guide my efforts:
- Which groups were more likely to cross paths with others?
- Do the strongest connections between characters remain within their modularity group?
- Which characters act more like a ‘lone wolf’?
To create the visualization I needed for this network analysis, I chose Gephi as my main tool. Gephi is a network visualization tool that can take hundreds of nodes and display the connections between them. It can also help researchers identify natural communities within a network and organize nodes based on community membership and the connections each node has.
I’m a major fan of the subreddit r/dataisbeautiful. While looking for design inspiration, I came across a circular layout of characters from The Office. The visualization showed the main characters as nodes and their interactions as the edges that cross the circle.
Wider edges meant a stronger relationship between two characters. The size of each node reflected the weight that character had in the show, which also roughly represented their time in front of the camera. This visualization was exactly what I wanted to replicate to answer the questions I had about the main characters of the Marvel universe.
The circular layout makes for a more organized and digestible network visualization when comparing connections among individuals and communities. Each node is equidistant from the center, which allows the viewer to easily find nodes to compare. If a ForceAtlas2 or Fruchterman-Reingold layout were used, viewers’ eyes would have to dart across the visualization, making it hard to see connections across communities.
Step 1: Finding the data
Luckily, the Gephi wiki page already had a network file dedicated to the Marvel universe. This dataset was available to download and open up directly in Gephi without any heavy data cleaning.
Step 2: Opening the file & appearance styling
The original, unedited, undirected network file contained over 10,000 nodes and almost 200,000 edges. A file this large was too much for me to work with or for a viewer to understand. My first step toward organizing the data was color coding the communities Gephi identified. I ran the modularity algorithm and then used its results to partition the nodes by color. This showed that there were several distinct communities throughout the Marvel universe and that some crossed paths more than others.
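To make the idea of modularity a bit more concrete, here is a minimal Python sketch of the score that a community partition is judged by. It is not what Gephi runs internally: Gephi searches for a good partition (using the Louvain method), while this sketch only scores a partition that is already given. The function name, the toy edge list, and the node labels are all hypothetical, and the graph is assumed to be undirected and unweighted.

```python
from collections import defaultdict


def modularity(edges, community):
    """Return the modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = 0
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    # Sum of node degrees per community.
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d

    # Q = sum over communities of (internal edge share) - (expected share).
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy data: two tight trios joined by a single bridge edge.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"),
             ("C", "D")]
by_team = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in by_team}

q_team = modularity(toy_edges, by_team)
q_one = modularity(toy_edges, one_group)
print(q_team, q_one)
```

Splitting the toy graph into its two trios scores higher than lumping every node into one group, which is the kind of signal a modularity algorithm uses when it assigns the classes that later become colors.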
Step 3: Node sizing
The Office circular visualization sized the nodes based on the weight of the connections each character had. I wanted to use the same sizing technique with the Marvel universe network. In Gephi, I ran the Average Degree statistic and used the resulting degree values to size the nodes. This revealed the main characters of the Marvel universe, who were also the most recognizable. The largest nodes included Captain America, Thor, Wolverine, and, of course, Iron Man.
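The sizing step boils down to two small operations: count each node's connections, then map those counts onto a range of node sizes. Below is a rough Python sketch of that idea. The function names, the size range of 10 to 60, and the toy edge list are assumptions made for illustration, not values taken from Gephi or from the Marvel dataset.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def scale_sizes(values, min_size=10.0, max_size=60.0):
    """Linearly map each node's value onto the range [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        # Every node has the same value, so give them all a middle size.
        return {node: (min_size + max_size) / 2 for node in values}
    return {
        node: min_size + (value - lo) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Hypothetical toy data: one hub with three neighbors, plus one extra edge.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
sizes = scale_sizes(degree_counts(toy_edges))
print(sizes)
```

To size by the weight of connections rather than their count, the same scaling could be applied to a weighted degree, where each edge adds its weight instead of 1.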
Step 4: Layout
I needed a circular layout to get closer to the visualization I was envisioning. At the time, I did not know which layout would arrange the nodes in a circle. After trying out the ForceAtlas2 and Fruchterman-Reingold layout algorithms, I asked my professor for help. He directed me to download the Circular Layout plugin, which let me set my nodes in the circular design I had hoped for.
Within this layout, I was also able to group the nodes by modularity class and order them by their average connection weight within each group. As a result, a viewer can follow the nodes as they get progressively larger within each community and see which communities have a stronger influence over the Marvel universe.
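The ordering described above can be sketched in a few lines of Python: sort the nodes by community first and by a size metric second, then space them evenly around a circle. This is only an illustration of the idea, not the Circular Layout plugin's code. The function name, the radius, and the toy values are hypothetical, and the metric here is just a generic number per node.

```python
import math


def circular_layout(nodes, community, metric, radius=100.0):
    """Place nodes evenly on a circle, grouped by community, then by metric.

    Returns a dict of node -> (x, y); the dict keeps the circular order.
    """
    ordered = sorted(nodes, key=lambda n: (community[n], metric[n], n))
    step = 2 * math.pi / len(ordered)  # equal angle between neighbors
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }


# Hypothetical toy data: two communities with two nodes each.
toy_nodes = ["a", "b", "c", "d"]
toy_community = {"a": 1, "b": 0, "c": 0, "d": 1}
toy_metric = {"a": 5, "b": 9, "c": 2, "d": 3}

positions = circular_layout(toy_nodes, toy_community, toy_metric)
print(list(positions))
```

Because every node sits at the same distance from the center, the only thing that changes from node to node is the angle, which is what makes neighbors within a community easy to compare.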
Step 5: Filtering
To finally isolate the most important nodes, I filtered most of them out based on the number of connections they had. After testing to see how many nodes would fit in the circular layout, I settled on a degree range between 972 and 2189. Now I could see the true inner circle of the Marvel universe and who was at the center of it.
This filtering also helped me identify the main teams that held most of the weight: the X-Men, the Avengers, and the Fantastic Four. I was also able to identify some characters that worked on their own, like Spider-Man, Hulk, and Dr. Strange. Even though they were more like “lone wolves,” those heroes still found themselves at the main table.
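For anyone curious what a degree-range filter does under the hood, here is a minimal Python sketch under simple assumptions: degrees are counted once on the full graph, nodes outside the range are dropped, and an edge survives only if both of its endpoints survive. The function name and the toy data are hypothetical, and the small range used here stands in for the much larger one above.

```python
from collections import Counter


def filter_by_degree(edges, low, high):
    """Keep nodes whose degree lies in [low, high] and the edges between them."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    kept_nodes = {node for node, d in degree.items() if low <= d <= high}
    # An edge is kept only when both endpoints passed the filter.
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


# Hypothetical toy data: a hub, three mid-level nodes, and one fringe node.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
             ("a", "b"), ("b", "c")]
kept_nodes, kept_edges = filter_by_degree(toy_edges, low=2, high=4)
print(sorted(kept_nodes), len(kept_edges))
```

Raising the lower bound trims the fringe first, which is why a high enough threshold leaves only the inner circle.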
Step 6: Finalizing
After filtering the nodes and edges, I opened the circular visualization in Preview to export the display. During the export process, I added the details that would make my visualization similar to the one that inspired me. This involved turning on curved edges, which made each line easier to follow from one node to the next. Finally, I had the product I had hoped for.
Which groups were more likely to cross paths with others?
I started to see the answer when I began coloring the nodes and edges based on modularity. Both the X-Men (purple) and Spider-Man (blue) characters mainly remained in their own communities, while the rest of the communities crossed paths with each other often.
By filtering the nodes down even further, I could see that this was still the case. However, once the filter was applied, the Fantastic Four also turned out to be less likely to cross paths with others. I believe this team was not visible in the larger display because their edges were hidden by the others and because the team is so small: only four members, as the name implies. Those shown as more likely to cross paths were the Avengers and the “lone wolf” characters like Thor and Spider-Man.
Do the strongest connections between characters remain within their modularity group?
The stronger the connection, the thicker the edge line. However, strong connections do not have to stay within one group; they can occur both inside and outside of a group. For example, Captain America and Thor have an extremely strong connection even though they are not identified as belonging to the same community. The same goes for Captain America and Vision.
Other modularity communities keep their strongest connections internal. This can be clearly seen with the Fantastic Four (pink) and X-Men (purple). Based on my own superhero knowledge, these groups have movies that focus more on the group dynamic and development than on the story of an individual character.
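The same question can be checked numerically instead of by eye. The Python sketch below ranks edges by weight and flags whether each of the strongest ones stays inside a single community. The function names, the weights, and the community labels are made up for illustration and are not taken from the Marvel network.

```python
def strongest_edges(weighted_edges, community, top_n=3):
    """Return the top_n heaviest edges as (u, v, weight, same_community)."""
    ranked = sorted(weighted_edges, key=lambda edge: edge[2], reverse=True)[:top_n]
    return [(u, v, w, community[u] == community[v]) for u, v, w in ranked]


def share_within(weighted_edges, community, top_n=3):
    """Fraction of the top_n heaviest edges that stay inside one community."""
    top = strongest_edges(weighted_edges, community, top_n)
    return sum(1 for *_, same in top if same) / len(top)


# Hypothetical toy data: four nodes in two communities.
toy_edges = [("a", "b", 50), ("a", "c", 40), ("c", "d", 30), ("b", "d", 5)]
toy_community = {"a": 0, "b": 0, "c": 1, "d": 1}

top = strongest_edges(toy_edges, toy_community)
share = share_within(toy_edges, toy_community)
print(top, share)
```

A share close to 1 would mean the strongest ties stay inside communities, while a lower share points to strong cross-team pairs like the ones described above.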
Which characters act more like a ‘lone wolf’?
Some nodes are large because of the number of connections they have, yet those connections may not be strong or frequent. Such a node represents a character who interacts with many others from the Marvel universe but doesn’t stick around with any of them for long. If a node is large but has thin, light-colored edges leading into it, that character can be considered a ‘lone wolf’.
Wolverine, Spider-Man, and Hulk exhibit these characteristics. Based on my own knowledge of superhero history, this is an accurate portrayal of these heroes. Each one tends to act on his own in his movies and storylines, without much outside help from major teams like the Avengers or Fantastic Four. What is notable about Wolverine in this network is that even though he is considered part of the X-Men, he doesn’t have strong connections within his team. By contrast, the network shows that all the other X-Men have strong, frequent connections with one another and rarely cross paths with outside characters.
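The ‘lone wolf’ reading above (many connections, but weak ones) can be turned into a simple rule of thumb. The Python sketch below flags nodes with at least a given number of connections whose average edge weight stays under a cutoff. Both thresholds, the function name, and the toy data are hypothetical choices for illustration; they are not values used in the visualization.

```python
from collections import defaultdict


def lone_wolves(weighted_edges, min_degree, max_mean_weight):
    """Return nodes with many connections but a low average edge weight."""
    weights = defaultdict(list)
    for u, v, w in weighted_edges:
        weights[u].append(w)
        weights[v].append(w)

    flagged = []
    for node, node_weights in weights.items():
        mean_weight = sum(node_weights) / len(node_weights)
        if len(node_weights) >= min_degree and mean_weight <= max_mean_weight:
            flagged.append(node)
    return sorted(flagged)


# Hypothetical toy data: "loner" touches everyone lightly, while the
# pairs a-b and c-d share heavy edges with each other.
toy_edges = [("loner", "a", 1), ("loner", "b", 1),
             ("loner", "c", 1), ("loner", "d", 1),
             ("a", "b", 20), ("c", "d", 20)]

wolves = lone_wolves(toy_edges, min_degree=3, max_mean_weight=2)
print(wolves)
```

The cutoffs would need tuning for a real network, since what counts as a "thin" edge depends on how the weights are distributed.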
After years of watching Marvel superheroes dance across different types of screens, I had built up a rough sense of their connections and paths. This visualization confirms some of my assumptions about certain characters while leaving me puzzled about others.
What still puzzles me most is the constant struggle between Iron Man and Captain America. Even though they usually fight side by side in the Avengers movies, there always seems to be tension between them. Does this network reflect that tension by placing their nodes in different communities, shown in different colors? Also, why are other characters who regularly appear in Avengers movies split between the two sides? Do some side with Iron Man and others with Captain America?
This was my first time conducting an analysis of a network as vast as the Marvel universe. It contains a huge number of characters, and it was fun to work out where most teams connected and crossed paths. However, this exercise raises even more questions. Do secondary characters exhibit the same type of connections, or do they become more distant from other groups? Do the main villains throughout the Marvel universe cross paths? Clearly, there is still much more I’d like to learn through this network analysis about my favorite childhood superheroes.