Marvel Hero Social Network – Gephi



Introduction

As for most people, superheroes were a major part of my childhood. I was fascinated by Spider-Man and always dreamt of becoming him when I grew up. I’m pretty sure every child dreams of becoming their favorite superhero one day. Fast-forward to November 2018: I’m still mesmerized by superheroes, and the only difference is that I no longer dream of becoming one.

When I talk about superheroes, the first thing that comes to mind is the Marvel Cinematic Universe. It has been more than a decade since its first movie, Iron Man, came out, and so far 20 movies featuring different superheroes have been released. Since 2008, Marvel has built a fanbase of millions.

As one of those millions, I make a point of keeping up with the latest content related to the Marvel Cinematic Universe. While doing that, I came across a dataset based on the Marvel social network, which records occasions where two or more superheroes appeared together in the same franchise. To visualize and analyze this dataset as a network, I used Gephi.

By doing a network visualization of this dataset, I wanted to map out large clusters of superheroes who appeared in the same franchise.

 

Inspiration/Critique

After I came across this dataset, I wanted to see what kinds of network visualizations already exist for superhero datasets. This helped me make my own visualization better and easier to understand. Some of the examples that I liked were:

  1. Pierre Gutierrez’s Marvel Social Graph Analysis

Source: https://blog.dataiku.com/2015/05/19/marvel-social-graph-analysis

I came across this visualization while going through an article on Dataiku, and it inspired my main idea. I like the way it clearly separates the characters into groups and represents each group in a different color. It looks clean from a design point of view and can be understood easily.

  2. Félix Luginbühl’s Social Network of the Marvel Cinematic Universe

Source: http://felixluginbuhl.com/network/

While I was looking for inspiration, this visualization caught my attention. It shows the characters of the Marvel movies scattered across the canvas, and when you click on a character, it tells you which movies that character appears in. It also offers user controls such as zoom in and zoom out. I really liked the way this visualization was created.

 

Materials

  1. The Marvel Social Network Gephi file – This network of superheroes was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. The data was collected by Infochimps and transformed and enhanced by Kai Chang.
  2. Gephi – Open-source software for network visualization and analysis. It helps data analysts intuitively reveal patterns and trends, highlight outliers, and tell stories with their data. It uses a 3D render engine to display large graphs in real time and to speed up exploration.

 

Method to Create This Visualization

  1. Selecting the Dataset

Selecting the right dataset has always been a nightmare for me. Just as I struggled to find the right dataset in my previous lab (Tableau), this lab was no easier. With so much free data available, I had difficulty narrowing down the topic. After much effort, I came across a dataset related to the Marvel universe. The best part about this dataset was that it was already a Gephi file and didn’t require any cleanup. Now that I had the dataset as a Gephi file, it was time to proceed to the main step.

  2. Data Visualization Using Gephi

To make a network visualization from my dataset, I started exploring Gephi. It was my first interaction with this software, so it took me a while to understand the whole interface. When I started the visualization process, I realized that the dataset was huge and that I needed to cut it down. This is where I applied the ‘Range (Degree)’ filter, which keeps only the nodes whose degree (number of connections) falls within a chosen range. Once the filter was applied, only 0.34% of the total nodes and 0.29% of the total edges were visible. After this, I moved to the layout selection, where I selected ‘ForceAtlas 2’. I then used ‘Expansion’ to increase the spacing between the clusters. I did try out the other layouts but, in my opinion, ForceAtlas 2 + Expansion produced the best output. I could clearly see four different clusters of my data scattered in a triangular shape. It was now time to separate the clusters by assigning them different colors. So, I ran the Modularity statistic and then colored the nodes by Modularity Class.

For formatting, I moved to the ‘Preview’ tab where I changed the font style and font size. I also played with the thickness option under the ‘Edges’ menu.

After all this, my network visualization was ready, and it can be viewed under the ‘Findings’ section of this report.
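To make the ‘Range (Degree)’ step more concrete, here is a minimal sketch in plain Python of what such a filter does conceptually. It is not Gephi’s implementation, only an illustration of the idea: the function name `degree_range_filter` and the toy edge list are made up for this example and are not part of the Marvel dataset or of Gephi.

```python
from collections import Counter

def degree_range_filter(edges, min_degree, max_degree):
    """Keep nodes whose degree lies in [min_degree, max_degree],
    plus the edges whose two endpoints both survive.

    Hypothetical helper for illustration; degrees are counted on the
    full, unfiltered graph, which is treated as undirected.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    kept_nodes = {n for n, d in degree.items() if min_degree <= d <= max_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges

# Toy edge list (invented, NOT the Marvel data): A has degree 3,
# B, C and D have degree 2, and E has degree 1.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

nodes, edges = degree_range_filter(toy_edges, min_degree=2, max_degree=3)
print(sorted(nodes))  # E is dropped because its degree is below the range
print(edges)          # the D-E edge disappears together with E
```

With a high lower bound, as in my settings, only the most connected characters remain, which is why such a small share of the nodes and edges stayed visible.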

 

Findings

The final visualization with the ForceAtlas 2 + Expansion layout showed the data in four clusters. I could clearly identify three of them: ‘Fantastic Four’, ‘X-Men’, and characters affiliated with the ‘Avengers’. I think the Avengers group included characters from both the movies and the comic books.

The filter settings and statistics for my visualization were as follows:

Average Degree – 35.027

Modularity – 0.299

Range (Degree) Settings – From 924 to 2189
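For readers unfamiliar with these two statistics, the sketch below shows how they can be computed by hand for a small unweighted, undirected graph. It uses the standard definitions (average degree as twice the edge count divided by the node count, and modularity as the sum over communities of the internal edge fraction minus the squared degree fraction). It is not how Gephi computes them internally, and I am assuming a resolution of 1. The function names, the toy graph, and the community labels are all invented for illustration.

```python
from collections import defaultdict

def average_degree(edges):
    """Average degree of an undirected graph: 2 * |E| / |N|.
    Only nodes that appear in at least one edge are counted."""
    nodes = {n for edge in edges for n in edge}
    return 2 * len(edges) / len(nodes)

def modularity(edges, community_of):
    """Modularity of a given partition (unweighted, undirected):
    Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
    where m is the total number of edges, L_c the number of edges
    inside c, and d_c the sum of the degrees of the nodes in c."""
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (invented): two triangles joined by a single edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

avg = average_degree(toy_edges)            # 7 edges, 6 nodes
q = modularity(toy_edges, toy_communities)
print(round(avg, 3), round(q, 3))
```

A higher modularity means the partition has more edges inside communities than would be expected by chance, so values nearer zero, like the one reported above, suggest the clusters are heavily interconnected.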

 

Visualization I created using Gephi

 

Reflections

Overall, I enjoyed exploring Gephi, but I think it takes time to get used to. After working on this project, I feel that Gephi restricts the user and does not give as much freedom as Tableau.

Though it is a very powerful tool for making amazing network visualizations, it comes with its drawbacks. The biggest drawback, in my view, is the inability to undo an action. It was difficult for me to experiment with the software without an undo button. To work around this, I had to save a new version of the project after every change I made. As a user, I found this very frustrating. I also think some small features, like zooming in and out, worked against my mental model.

As for my dataset, I feel that it was too big for this software to process comfortably. So the next time I use Gephi to make a visualization, I will make sure that my dataset is not that big. This should help in making better visualizations.

All in all, Gephi has a lot of potential, provided it is designed in the right way.