The Marvel Universe is a fictional universe with a large number of characters that are all connected in some way. It has more than 10,000 characters. This visualization focuses on Phase 1 of the Marvel Universe (2008 – 2012). The goal is to find the most connected characters in this time period.
My inspiration for this graph is the article ‘Shakespearean tragedies visualized through character interactions’. Its visualizations have a clear goal: to identify whether characters are closely connected and whether there is a pattern in the structure.
- Gephi: open-source software used to create and analyze network visualizations
- Microsoft Excel: used to format the data
The dataset used was taken from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php#marvel
To understand the data, I opened it in a spreadsheet. The data was neatly organized and described a directed network (Fig 2).
I removed multiple columns with unnecessary details. The data included the following connections:
- Agent to agent
- Agent to location
- Agent to company
- Company to company
I only wanted agent-to-agent connections, so I removed the rows with other types of connections. Since only agent-to-agent connections remained, I also removed the columns that identified the type of connection. After deleting the columns with data about locations and companies, I ended up with a very simple dataset (Fig 3).
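The same cleanup could also be scripted instead of done by hand. The following is a minimal sketch, not the steps I actually ran: the column names (`source`, `source_type`, `target`, `target_type`) and the sample rows are made up for illustration, since the real export may be laid out differently. The output uses `Source` and `Target` headers, which is what Gephi's spreadsheet import looks for in an edge table.

```python
import csv
import io

# Hypothetical sample of the raw export; the real column names may differ.
RAW = """source,source_type,target,target_type
Tony Stark,agent,Pepper Potts,agent
Tony Stark,agent,Stark Industries,company
Thor,agent,Asgard,location
Thor,agent,Loki,agent
Stark Industries,company,Hammer Industries,company
"""


def agent_edges(rows):
    """Keep only rows where both endpoints are agents; return (source, target) pairs."""
    return [
        (row["source"], row["target"])
        for row in rows
        if row["source_type"] == "agent" and row["target_type"] == "agent"
    ]


def to_gephi_csv(edges):
    """Serialize edges as CSV text with a Source,Target header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return out.getvalue()


edges = agent_edges(csv.DictReader(io.StringIO(RAW)))
print(to_gephi_csv(edges))
```

Filtering on the endpoint types first means the type columns can simply be dropped afterwards, which mirrors the order of the manual steps.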
I imported my data into Gephi. The first view of the network seemed chaotic, and I was not able to interpret anything. I realized I needed to add proper labels and change the layout.
After adjusting the ranking, showing labels, and adding color and a layout, the network looked much better. It was easy to interpret the information and find the most connected characters.
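Ranking nodes by how connected they are comes down to counting edges per character. As a rough sketch of that idea outside Gephi, the snippet below counts total degree (incoming plus outgoing edges) from an edge list. The edges are invented for illustration and are not taken from the dataset, and `total_degree` is a hypothetical helper name.

```python
from collections import Counter

# Made-up edges for illustration only; not taken from the dataset.
edges = [
    ("Tony Stark", "Pepper Potts"),
    ("Tony Stark", "Steve Rogers"),
    ("Steve Rogers", "Tony Stark"),
    ("Loki", "Thor"),
    ("Thor", "Loki"),
    ("Loki", "Tony Stark"),
]


def total_degree(edges):
    """Count incoming plus outgoing edges for every character."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing edge
        degree[target] += 1  # incoming edge
    return degree


# Characters sorted from most to least connected.
ranking = total_degree(edges).most_common()
for name, degree in ranking:
    print(f"{name}: {degree}")
```

Sizing or coloring nodes by this count is what makes the most connected characters stand out in the graph.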
To visually improve the graph, I wanted to make the names of the characters more readable. I tried removing the edges, but that made it hard to understand the connections. I also tried the ‘Tag Cloud’ preset, but it added ambiguity to the network. In the end, I removed the stroke around the nodes and changed the colors to lighter shades.
Results & Interpretations
Some important stats from the visualization are:
- Average Degree: 5.891
- Diameter: 5
- Density: 0.109
- Modularity: 0.536
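Average degree and density are closely related, and a small sketch makes the link concrete. The code below assumes the directed-graph convention, where average degree is edges divided by nodes and density is edges divided by the number of possible ordered pairs. That convention is my assumption about how the numbers were computed, and `directed_stats` is a hypothetical helper. Under it, density equals average degree divided by (nodes - 1), so the two reported values would correspond to a network of roughly 55 characters.

```python
def directed_stats(edges):
    """Return (nodes, edges, average degree, density) for a directed edge list.

    Assumes the directed convention: average degree = m / n and
    density = m / (n * (n - 1)), with duplicate edges counted once.
    """
    nodes = {name for edge in edges for name in edge}
    unique_edges = set(edges)
    n, m = len(nodes), len(unique_edges)
    average_degree = m / n
    density = m / (n * (n - 1))
    return n, m, average_degree, density


# Tiny made-up graph: A->B, B->A, A->C.
n, m, average_degree, density = directed_stats([("A", "B"), ("B", "A"), ("A", "C")])

# Consistency check on the reported stats: density = average degree / (n - 1),
# so n is approximately average degree / density + 1.
estimated_nodes = round(5.891 / 0.109) + 1
print(n, m, average_degree, density, estimated_nodes)
```

A density of about 0.109 means that only around one in nine of the possible connections between characters is actually present.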
The results were close to what I expected. Tony Stark and Steve Rogers are the most influential characters in the universe across all phases, so their prominence here was no surprise. I was not expecting Loki to be one of the most connected characters in Phase 1. Overall, it was a nice and simple network for showing the connections between characters in the universe.
I spent half of the time understanding the tool, so I was not able to work more extensively with the data. By the time I had a good command of the tool, the lab time was over and I didn’t get a chance to do more. The tool can be frustrating, but I believe we can do amazing things with networks. I am happy that I was able to create a nice, meaningful graph, and I achieved my goal of easily identifying the most connected characters.
I enjoyed working with Marvel’s data. In the future, I would spend more time on this graph to make it more readable and visually appealing. Since I also have the data about the locations and companies of the agents, I want to explore the connections between locations and agents. It would be interesting to see how many agents have been to each location.