Gephi Lab Report



Ratirat Osiri (Rey)

 

SU-LIS-658-01: Gephi Lab Report

 

I prepared a few datasets for the Gephi lab, but unfortunately Gephi couldn’t process most of them, which I assume is because they were too big for its memory. However, I still managed to get one dataset working, and I think it is the most interesting topic to me. The dataset I chose is Marvel heroes and the comics they appear in. It contains two main labels, ‘Hero’ and ‘Comic’. Hero and Comic nodes are connected to each other, and the relationship between them can be read both ways: which heroes appear in a given comic, and which comics a particular character appears in.

 

I’m pretty sure that everyone chose Force Atlas 2 for the main layout, and I used it too. I think the data would be too overwhelming if I kept all of it on the graph, so I filtered at a minimum degree of 90, which removed 90% of the nodes that appeared in the graph; the majority of the nodes in the dataset have a very small degree anyway. Since the main elements of this dataset are just hero and comic, I chose to focus on those two elements by using two simple contrasting colors and using size to show each node’s degree. I personally think that size gives a stronger sense of a larger number than shades of color do.
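The degree filter and degree-based sizing described above can be sketched in plain Python. This is only an illustration of the idea, not how Gephi implements it: the edge list is made-up toy data, the function names and the size range are assumptions, and the threshold is 2 instead of 90 so that the tiny example still has nodes left.

```python
from collections import defaultdict

# Hypothetical toy data: (hero, comic) appearance pairs, not the real dataset.
edges = [
    ("Hulk", "Avengers 1"),
    ("Iron Man", "Avengers 1"),
    ("Captain America", "Avengers 1"),
    ("Hulk", "Hulk 5"),
    ("Iron Man", "Iron Man 2"),
    ("Iron Man", "Avengers 2"),
    ("Captain America", "Avengers 2"),
]


def degrees(edge_list):
    """Count how many edges touch each node."""
    deg = defaultdict(int)
    for hero, comic in edge_list:
        deg[hero] += 1
        deg[comic] += 1
    return dict(deg)


def filter_min_degree(edge_list, min_degree):
    """Keep nodes whose degree in the full graph is >= min_degree,
    plus only the edges that connect two kept nodes."""
    deg = degrees(edge_list)
    kept_nodes = {node for node, d in deg.items() if d >= min_degree}
    kept_edges = [(h, c) for h, c in edge_list
                  if h in kept_nodes and c in kept_nodes]
    return kept_nodes, kept_edges


def node_size(degree, min_deg, max_deg, min_size=10.0, max_size=50.0):
    """Map a degree linearly onto a display size (size range is an assumption)."""
    if max_deg == min_deg:
        return min_size
    t = (degree - min_deg) / (max_deg - min_deg)
    return min_size + t * (max_size - min_size)


kept_nodes, kept_edges = filter_min_degree(edges, min_degree=2)
all_degrees = degrees(edges)
kept_degrees = [all_degrees[n] for n in kept_nodes]
sizes = {n: node_size(all_degrees[n], min(kept_degrees), max(kept_degrees))
         for n in kept_nodes}
```

With the real data the same call would use `min_degree=90`. Low-degree nodes such as one-off issues drop out, and the remaining nodes get larger as their degree grows.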

 

This dataset is really colossal, with all the small details of hero and comic nodes making the graph look like the Milky Way. The comic label in the dataset has tons of series, titles and issues, so I decided to wrap the small scattered comic nodes into groups. For example, one comic series tends to have multiple titles: the Avengers series ranges from The Avengers to New Avengers, Avengers Origin and so on, so I grouped them all together under one ‘The Avengers’ label. I did this for every single series in the dataset by merging the nodes together. I also had to duplicate the label column and apply color using the new column (which I named ‘type’). The reason is that the dataset separates hero and comic in the label column, while the titles/names are stored in the Id column, which I couldn’t edit. Duplicating the column allowed me to edit the name of each node and make the names display properly in the graph. I’m not sure if it is a glitch or a wrong setting that I wasn’t aware of, but the labels came out double-spaced and some letters have a faded color. The final graph and final render are shown in figures 1 and 2. If we open the graph in Gephi and hover over a circle, it highlights the other circles connected to that node. In this case, if I hover over any comic title node, it highlights all the heroes who appeared in that comic, and the other way around: when I hover over a hero node, it highlights all the comics/series that character appeared in, as shown in figures 3 and 4.
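The series merging described above was done by hand in Gephi, but the idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the keyword table, the function names and the sample edges are hypothetical, and only the Avengers grouping comes from the report itself.

```python
# Hypothetical keyword table: any title containing the keyword is folded
# into the series label. Only the Avengers example comes from the report.
SERIES_KEYWORDS = {
    "avengers": "The Avengers",
}


def series_label(title):
    """Return the series label for a comic title, or the title itself
    when no keyword matches."""
    lowered = title.lower()
    for keyword, label in SERIES_KEYWORDS.items():
        if keyword in lowered:
            return label
    return title


def merge_comics(edge_list):
    """Replace each comic with its series label and drop duplicate edges."""
    merged = {(hero, series_label(comic)) for hero, comic in edge_list}
    return sorted(merged)


# Made-up sample edges, not taken from the real dataset.
sample_edges = [
    ("Hulk", "The Avengers"),
    ("Hulk", "New Avengers"),
    ("Iron Man", "Avengers Origin"),
    ("Hulk", "Incredible Hulk"),
]

merged_edges = merge_comics(sample_edges)
```

After merging, a hero who appears in several Avengers titles keeps a single edge to ‘The Avengers’, which is what thins out the scattered comic nodes in the graph.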

 

Figure 1


Figure 2


 

Figure 3

 


 

Figure 4


 

 

I wasn’t sure if I really needed a plugin for my work. I tried some of them, but I eventually decided to use just the simple built-in functions in Gephi. In the Preview settings tab, I adjusted the radius option so that the whole image, with the circles grouped together in one big circular shape, fits. I might have missed a plugin with this feature, but I wish I could ‘decorate’ the nodes. I have a funny idea about decorating each hero node in the character’s signature colors, one for the circle and one for its outline (e.g. Hulk is green/purple, Iron Man is red/yellow). However, I’m not sure whether that would make the graph look messier than it already is.

 

I exchanged my graph with one of my classmates, and I was really surprised to learn that she was working with the same dataset as me. She took a different approach, focusing on one single character (Captain America) and his connections with the comics and other characters, which I found really interesting. I thought about other possible focuses and forms that I could work on while I still had time, but I eventually decided that the main focus of this dataset is the connection between characters and the comic series/titles they appear in, and that is what makes this dataset so fascinating to me.

 

To be honest, I don’t really like working with Gephi that much, due to its lack of more in-depth functions, inconvenient controls and buggy operation, not to mention that we cannot ‘undo’ any change. My classmate and I agree that Tableau is much more enjoyable to work with, even though I understand that comparing the two programs is pretty unfair to Gephi, which is open source, while Tableau is a very expensive program. However, I think Gephi is still a really good program for displaying the connections within a data network, which would be impossible to picture in our heads unless they are visualized.