Introduction
My topic for the network lab is “Marvel hero”. Marvel Entertainment was founded in 1998 and is best known for the comic books published by Marvel Comics. The comics became so popular worldwide that the business expanded into films, clothing, and video games. Over the years, Marvel’s universe has grown so large that keeping track of everything by conventional means is next to impossible.
However, I am not a huge Marvel fan, and I am always confused about how the characters in the movies are related. I therefore decided to visualize the network of Marvel heroes to learn the relationships among them.
Inspiration
I looked at many network visualizations of Marvel heroes online. Of all of them, I found Kai Chang’s work the most inspiring. His graphs were produced for the Data Insight 2011 competition at Adobe in San Francisco using data from Infochimps, and he also created them with Gephi.
The layout and design are incredible, and the color scheme resembles Marvel’s own colors.
Materials
The primary tool I used to visualize the network is Gephi, a great tool for data analysts and scientists keen to explore and understand graphs. It is very powerful for visualizing networks and helps users tease out the relationships among thousands of components. Gephi is like Photoshop, but for graph data: users can interact with the representation and manipulate its structure, shapes, and colors to reveal hidden patterns. Gephi also has a very intuitive interface that makes the tools quick to reach, so even without much training it is relatively easy to figure out how it works. It is a complementary tool to traditional statistics, as visual thinking with interactive interfaces is now recognized to facilitate reasoning.
The dataset comes from the Gephi wiki, which offers many sample datasets in different formats and categories for users to practice with. It includes two datasets about the Marvel hero network; I chose the one that covers not only the heroes in the movies but also those in the original comics.
Result
I started creating this viz by formatting the data I obtained from the Gephi wiki on GitHub. When I imported the data into Gephi, I realized that the column titles were not what I wanted, so I refined the dataset in OpenRefine to clean it up a little. The Label column represents the nodes on the network graph, the second column represents the edges between nodes, and the third column, Type, indicates the direction of each relationship.
To visualize the data, I first went to the “Overview” tab in Gephi and worked on the layout. In terms of layout, I played around with:
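Renaming and cleaning the columns can also be scripted. The sketch below is a minimal Python version of that kind of cleanup, not the steps I actually ran in OpenRefine. It assumes the raw file is an edge list whose two hero columns are named `hero1` and `hero2` (hypothetical headers; the real ones may differ), and it writes the `Source`, `Target`, and `Type` columns that, as far as I know, Gephi’s spreadsheet import recognizes for an edges table. It also trims stray whitespace, skips incomplete rows, and drops duplicate edges.

```python
import csv
import io

# Hypothetical header names for the two hero columns in the raw file.
SOURCE_COL = "hero1"
TARGET_COL = "hero2"


def to_gephi_edges(raw_csv: str, directed: bool = False) -> str:
    """Return an edge table with Source, Target, and Type columns.

    Trims whitespace, skips rows missing an endpoint, and drops duplicate
    edges (for undirected graphs, A-B and B-A count as the same edge).
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["Source", "Target", "Type"], lineterminator="\n"
    )
    writer.writeheader()
    edge_type = "Directed" if directed else "Undirected"
    seen = set()
    for row in reader:
        src = (row.get(SOURCE_COL) or "").strip()
        dst = (row.get(TARGET_COL) or "").strip()
        if not src or not dst:
            continue  # incomplete row
        key = (src, dst) if directed else tuple(sorted((src, dst)))
        if key in seen:
            continue  # duplicate edge
        seen.add(key)
        writer.writerow({"Source": src, "Target": dst, "Type": edge_type})
    return out.getvalue()


# Toy input: a padded name, a reversed duplicate, and a row missing a target.
raw = 'hero1,hero2\n"IRON MAN ", CAPTAIN AMERICA\nCAPTAIN AMERICA,IRON MAN\nTHOR,\n'
print(to_gephi_edges(raw))
# Source,Target,Type
# IRON MAN,CAPTAIN AMERICA,Undirected
```

Treating A-B and B-A as one edge only makes sense for an undirected network; passing `directed=True` keeps both rows.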
Force Atlas
Force Atlas 2
Fruchterman Reingold
By this, the nodes separated and gave the graph its shape, and the lines representing the edges between nodes appeared and stretched. To keep the nodes close enough together, I increased the gravity and grouped the nodes by Modularity Class. I also turned on the Prevent Overlap setting to make the viz clearer.
My first attempt used a black background to help the network stand out, with the Fruchterman Reingold layout. However, I felt a light background would work better, so I switched back to one.
I chose the Force Atlas 2 layout for the main viz because it represented the data well as three primary clusters and showed the hierarchy of connections between the major characters. From the above image, we can clearly see the relationships among heroes in the Marvel universe. Interestingly, in this dataset Iron Man is not in the same group as Spider-Man, so I think the dataset is based on the original version of Spider-Man, before he became one of the Avengers.
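Force-directed layouts like the ones listed above treat nodes as particles that repel each other while edges pull connected nodes together, which is why the graph spreads out and the edges stretch. The sketch below is a simplified, pure-Python Fruchterman Reingold layout with an added gravity term that pulls every node toward the center, loosely mirroring the gravity setting. It is an illustration under my own assumptions (the function name, parameters, linear cooling schedule, and the exact form of the gravity force are mine), not Gephi’s implementation, and it leaves out refinements such as Prevent Overlap.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=200, gravity=0.0, seed=0):
    """Simplified Fruchterman-Reingold layout on a unit-area canvas.

    Adds a hypothetical linear gravity term (strength `gravity`) that pulls
    each node toward the origin. Returns {node: (x, y)}.
    """
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    k = math.sqrt(1.0 / len(nodes))  # ideal distance between nodes
    pos = {v: [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for v in nodes}
    t0 = 0.1  # initial "temperature": the largest step a node may take
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along each edge, magnitude d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Gravity: pull toward the origin, proportional to distance from it.
        for v in nodes:
            disp[v][0] -= gravity * pos[v][0]
            disp[v][1] -= gravity * pos[v][1]
        # Move each node along its net force, capped by the temperature,
        # which cools linearly so the layout settles.
        temp = t0 * (1 - it / iterations)
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Toy graph with placeholder node names.
layout = fruchterman_reingold(
    ["A", "B", "C", "D"], [("A", "B"), ("B", "C")], gravity=1.0
)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Without gravity, nodes with few connections keep drifting away from the rest; a stronger gravity term trades some of that separation for a more compact picture.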
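The Modularity Class comes from Gephi’s Modularity statistic, which searches for a partition of the nodes that maximizes a modularity score Q (as far as I know, Gephi uses the Louvain method for this search). For an undirected, unweighted graph with m edges, Q is the sum over communities c of L_c/m − (d_c/2m)², where L_c is the number of edges inside c and d_c is the total degree of its nodes. A higher Q means more edges fall inside communities than a random graph with the same degrees would predict, which is what makes clusters like these stand out. The sketch below only scores a given partition; it does not perform the Louvain search, and the function name is my own.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a node partition for an undirected, unweighted graph.

    edges: (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (c, d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(toy_edges, split))  # 5/14, about 0.357
```

Putting every node in a single community always gives Q = 0, so a clearly positive score is a sign that the clusters in the picture reflect real structure in the data.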
Reflection
Overall, I like using Gephi as a data visualization tool. The visual results are impressive, and it was fun exploring its different features.
However, a few shortcomings annoyed me while I used the software. Gephi does not have a “go back” option: once I made a mistake, there was no way to step back, so I had to redo everything from the beginning. This is not friendly to new users. I also found that the “Preview” stopped working when I imported the dataset via “Import” rather than via “Import Spreadsheet” under the “Data Laboratory” tab.
All in all, I think Gephi is an excellent tool for network visualization.