Introduction
The topic that I chose for this Gephi lab is ‘Marvel Universe – Focusing on the Avengers’.
Marvel Entertainment has been in the business for over 70 years, churning out comic after comic, continuously developing its characters and franchises, and even expanding into films, television shows, and video games. Over the years, Marvel’s universe has become so large that keeping track of everything by conventional means is next to impossible.
I focused on a small part of the universe, ‘The Avengers’, and tried to visualize its network in Gephi in terms of the characters, their locations, and the gadgets and technologies they use.
Materials
The primary software used to create this visualization is Gephi, an open-source tool for network visualization and analysis. It helps data analysts intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It uses a 3D render engine to display large graphs in real time and to speed up exploration.
According to Heymann on GitHub, Gephi is a tool for people who have to explore and understand graphs. Like Photoshop but for graphs, it lets the user interact with the representation and manipulate the structures, shapes and colors to reveal hidden properties. The goal is to help data analysts make hypotheses, intuitively discover patterns, and isolate structural singularities or faults during data sourcing. It is a complementary tool to traditional statistics, as visual thinking with interactive interfaces is now recognized to facilitate reasoning.
The data was sourced from CASOS, a center that addresses complex real-world issues through a combined social-science and computer-science approach, using advanced techniques from network science, text mining and agent-based modeling. The dataset was created by Netanomics staff in June 2016 and is based on early Marvel superhero comics and the Avengers.
Another tool I used for cleaning up and formatting the data was OpenRefine. Formerly Google Refine, it is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
Inspiration
For my visualization, I looked through various networks based on Marvel. They all seemed extremely elaborate, so I mostly focused on how they were compared and visualized. Data Column’s network analysis graph of actors in Marvel Cinematic Universe movies caught my attention. It uses eigenvector centrality, which scores a node highly when it is connected to other highly scored nodes, to show the importance of connections within the network: the higher a node’s centrality, the greater its influence.
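To make the idea concrete, here is a minimal Python sketch of eigenvector centrality computed by power iteration. It is not Data Column’s code: the function name and the three-node example graph are hypothetical, and Gephi’s own Eigenvector Centrality statistic may scale its scores differently.

```python
def eigenvector_centrality(adjacency, iterations=100, tol=1e-9):
    """Eigenvector centrality by power iteration on an undirected, unweighted graph.

    adjacency maps each node to the set of its neighbors. Scores are scaled so
    the most central node gets 1.0.
    """
    nodes = list(adjacency)
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Iterate with (A + I) rather than A: same leading eigenvector, but the
        # shift stops the scores oscillating on bipartite graphs such as stars.
        new = {n: scores[n] + sum(scores[m] for m in adjacency[n]) for n in nodes}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        converged = max(abs(new[n] - scores[n]) for n in nodes) < tol
        scores = new
        if converged:
            break
    return scores


# Hypothetical path graph a - b - c: the middle node should score highest.
print(eigenvector_centrality({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))
```

On the path graph, the middle node ends up at 1.0 and the two ends at about 0.707 (1/√2), because the middle node is the only one touching both others.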
I also looked at larger Marvel universe datasets and came across a Data in Sight hackathon, where a Marvel visualization won the award for most aesthetically pleasing viz.
My main idea was inspired by this particular viz created by Peter Olson (VP of Marvel’s Web and Application Development). It shows which characters are the biggest connectors to other characters: the bigger the name, the better that character is at linking others together in the Marvel Universe Social Network.
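Olson’s viz does not say which measure drives the name size, so the following is only an assumption: one standard way to quantify how well a node links others is betweenness centrality, which counts how many shortest paths between other pairs of nodes pass through it. Below is a minimal Python sketch of Brandes’ algorithm for an unweighted, undirected graph; the function name and example graph are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbors.
    Returns raw (unnormalized) betweenness scores.
    """
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes from s first.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each end, so halve the totals.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical path a - b - c: only b sits between another pair.
print(betweenness_centrality({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))
# → {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

A character who bridges otherwise separate groups gets a high score here even with few direct connections, which matches the "connector" reading of the viz better than a simple connection count would.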
Methods
I started by formatting the data I obtained from CASOS in OpenRefine, since Gephi’s spreadsheet import expects a particular layout: a table with three columns, Source, Target and Type. Each row is one edge, that is, a connection between two nodes. Source and Target hold the two nodes the edge connects (drawn as circles on the network graph), and Type indicates whether the relationship is directed or undirected. My data was a network file, so while reformatting it in OpenRefine I kept only the edges (for a cleaner process) and arranged them into these columns. For this exercise, I entered Undirected for every record and saved the file as a comma-separated values (CSV) file.
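The same Source/Target/Type layout can also be produced with a short script instead of OpenRefine. This is a minimal Python sketch using the standard csv module; the function name and the three edges are hypothetical, for illustration only, and are not taken from the CASOS data.

```python
import csv
import io

# Hypothetical rows for illustration only; not taken from the CASOS dataset.
EDGES = [
    ("Thor", "Mjolnir"),
    ("Iron Man", "Stark Tower"),
    ("Iron Man", "Thor"),
]


def write_gephi_edge_list(edges, out):
    """Write an edge list with the Source, Target and Type columns Gephi reads."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        # Every relationship in this lab was treated as undirected.
        writer.writerow([source, target, "Undirected"])


buffer = io.StringIO()
write_gephi_edge_list(EDGES, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a handle from `open("edges.csv", "w", newline="")` (the file name is hypothetical); `newline=""` is what the csv module documentation recommends so line endings are not doubled.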
After that, I created a new project in Gephi and imported the CSV file. Gephi’s workspace is divided into three tabs, ‘Overview’, ‘Data Laboratory’ and ‘Preview’, and I moved between them as I worked.
For the layout, I experimented with two algorithms:
- Force Atlas 2
- Fruchterman Reingold
Running these layouts separated the nodes and gave the graph its shape, with the lines representing edges appearing and stretching between them. To keep the nodes close enough together, I increased the gravity, and I organized the nodes according to Modularity Class. I also enabled the setting that prevents overlap, which made the viz clearer.
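Both layouts are force-directed: nodes repel each other, edges pull their endpoints together, and a gravity force keeps everything from drifting apart. The minimal Python sketch below illustrates that idea with the classic Fruchterman-Reingold forces (repulsion k²/d, attraction d²/k) plus a linear pull toward the origin. It is an illustration under assumptions, not Gephi’s implementation: the function name, the gravity formula and the default values are hypothetical, and ForceAtlas 2 uses different force laws.

```python
import math
import random


def fruchterman_reingold(adjacency, iterations=200, area=1.0, gravity=0.1, seed=0):
    """Simplified Fruchterman-Reingold layout with an extra gravity pull.

    adjacency maps each node to the set of its neighbors (undirected).
    Returns a dict mapping node -> (x, y).
    """
    rng = random.Random(seed)
    nodes = list(adjacency)
    edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u] if u != v}
    k = math.sqrt(area / len(nodes))  # ideal edge length
    pos = {n: [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for n in nodes}
    start_temp = 0.1  # largest step a node may take in one iteration

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with strength k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: nodes joined by an edge pull together with strength d^2 / k.
        for edge in edges:
            u, v = tuple(edge)
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Gravity (assumed linear here): pull every node back toward the origin.
        for n in nodes:
            disp[n][0] -= gravity * pos[n][0]
            disp[n][1] -= gravity * pos[n][1]
        # Move each node along its net force, capped by a cooling temperature.
        temp = start_temp * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(*disp[n])
            if length > 0:
                scale = min(length, temp) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Raising `gravity` in this sketch pulls loosely connected nodes toward the center, much like raising gravity in Gephi kept my clusters close. The sketch treats nodes as points, whereas Gephi’s overlap prevention also takes node sizes into account.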
For formatting, I chose and sized the fonts, and made the edges straight because straight lines suited my viz better. I also picked an Avengers color palette and adjusted the colors for visibility.
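Visibility was judged by eye here, but one objective check is the WCAG contrast ratio between a label color and its background. The minimal Python sketch below follows the WCAG 2.x formulas for relative luminance and contrast ratio; the function names are hypothetical, and any colors passed in would be your own palette values, not ones from this viz.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio from 1:1 (identical colors) up to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def hex_to_rgb(code):
    """Convert a hex color such as '#FF8000' to an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))
```

WCAG AA asks for at least 4.5:1 for normal-sized text, so a label color that scores below that against the background is a candidate for adjusting.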
Results/Discussion
The visualization can be viewed here.
I chose the Force Atlas 2 layout for the main viz because it represented the data well in three primary clusters and showed the hierarchy of connections between the major characters. I also liked the Fruchterman Reingold layout, so I’m including it as well. It is also organized according to Modularity Class and shows the connection hierarchy, but it isn’t as effective as the first visualization at revealing the major clusters.
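Gephi’s Modularity statistic assigns each node a Modularity Class using the Louvain method and reports a modularity score Q, which compares the fraction of edges inside each class with what a random graph with the same degrees would give. The minimal Python sketch below only evaluates Newman’s Q for a partition you already have; it does not search for communities, and the function name and example graph are hypothetical.

```python
def modularity(adjacency, community):
    """Newman's modularity Q of a partition of an undirected, unweighted graph.

    adjacency maps each node to the set of its neighbors;
    community maps each node to a class label (like Gephi's Modularity Class).
    """
    m = sum(len(nbrs) for nbrs in adjacency.values()) / 2  # number of edges
    inside = {}  # edges with both ends in the same class
    degree = {}  # total degree of each class
    for u, nbrs in adjacency.items():
        c = community[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        for v in nbrs:
            if community[v] == c:
                inside[c] = inside.get(c, 0) + 0.5  # each edge is seen from both ends
    # Q = sum over classes of (inside edges / m) - (class degree / 2m)^2
    return sum(inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by the single edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(modularity(graph, {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}))
```

Splitting the two triangles gives Q = 5/14 ≈ 0.357, while putting every node in one class gives Q = 0; higher values mean the classes separate the graph into denser clusters, which is what made the clusters stand out in the Force Atlas 2 view.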
Further Directions
I really enjoyed working with Gephi overall. It was fun exploring different aspects of it.
The current visualization shows which characters appear together and how strongly characters are connected to locations and gadgets, but it is limited to the Avengers within the Marvel universe. With a more extensive Marvel dataset that includes more clusters, this project could grow larger, and the strongest clusters could be made visible and compared with the weaker ones. I would also experiment more with color formatting and include a color-blind-friendly version.