Star Wars Location Link Visualization
The record of getting to different planets in Star Wars
November 9, 2017
Xiaxin Chen
Dr. Sula
LIS-658-01 Gephi Lab
Introduction
Lab three was mainly about learning Gephi and using it to build node-link visualizations from data of our own choosing. The data set I found was originally published as test data for tools that create node-link maps. It describes the relationships between characters and locations in Star Wars. I got the data from the CASOS website (http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php#starwars). I hope to use this data to understand how Gephi works and to build a beautiful and meaningful visualization, and, for fun, to gain a better understanding of the Star Wars series.
Three Inspiring Visualizations
The first example is shown in Fig.0. It is a node-link map of patent collaboration at Apple. Each circle represents a person who contributed ideas and patents, and the size of the circle is determined by that person's weight: the more a person contributed, the bigger the circle. The opacity of the circle is also determined by the weight, so heavier contributors appear darker. Only Steve Jobs has a special color. I guess the reason is simple: Steve Jobs holds a unique position at Apple, so it is reasonable to single him out with a different color. To my surprise, however, Steve Jobs does not have the biggest circle, which suggests that Apple's achievements really come from many small contributions by many people. Fig.0 is also a good example of how a node-link map can handle a very complex data set. I also assume that the author placed the larger circles closer to the center on purpose, so that viewers can see where, and who, the most important part is. Imagine a viewer trying to get the same information from a list of data; it could take a whole day. On the other hand, because the data set is so complex, this kind of visualization needs a lot of space to display. Fig.0 is hard to read on a smartphone, and even on a PC you need to view it full screen to see the details.
(Fig.0)
Another example is shown in Fig.1. It is a poster with a node-link visualization that examines which linguistic expressions are used most frequently when people talk about aging. I think it is a great example of the value of node-link maps in visualization, and of how shapes and symbols can be used to compress information in a node-link map.
(Fig.1)
As Fig.2 shows, unlike other node-link maps, this visualization uses color-coded symbols to show the data source. The file icon indicates where the data comes from: the red file icon means the linguistic data comes from third-sector documents, the green one means government documents, the blue one represents research documents, and the yellow one represents blog documents. There are also two kinds of circles. The red one, which resembles the Target logo, represents the main keywords people talk about, and the plain gray one represents the other keywords people mention. The size of a circle represents how frequently the word is mentioned. There are only two kinds of edges: those between main keywords and documents, and those between other keywords and documents. Using two completely different icons to separate documents from keywords greatly improves the readability of this visualization, and readers can quickly see that government documents are at the center of the conversation. Only the research documents form an isolated island, which in my opinion means that the public seems to prefer listening to the government, while research sits farther away and forms its own circle.
(Fig.2)
What’s more, this visualization also shows viewers intuitively which keywords and source types sit at the marginal edge of the topic. Overall, it is a great example of how a node-link map can handle complex data.
The third example, shown in Fig.3, looks simpler, but I think it is a good example of how node-link data can form clusters that intuitively present relationships between groups, not only between individual nodes. Fig.3 is the Flavor Network. This visualization shows the network between foods; a link means that the two foods can be cooked together. The author uses color to distinguish the categories of food, such as spices, meat, vegetables, oil, fruits and so on. The size of a circle represents how frequently the food is used in cooking: the more frequent, the bigger. However, the author did not say which cuisine the data refers to, for example whether it is based on Chinese, Mexican, Italian or some other food.
(Fig.3)
Generally speaking, I chose these three as inspiring examples because they show the ability of node-link network maps to deal with complex data and to show the relationships between nodes clearly. What’s more, each has its own strength: the first works for data that needs to emphasize the center of the relationships, the second uses color-coded symbols to create a better visual experience, and the third shows how clusters work in such maps.
Materials and Methods
My data is about Star Wars. The author of the data tried to record the relationships between some of the characters and locations, but did not explain what those relationships are. For example, he links Darth Vader to the Rebel Blockade Runner, which is treated as a location, without explaining why. I guess it may be because Darth Vader was once on the Rebel Blockade Runner. This kind of link appears more than once; it seems that each time a character goes to a location, the author records it and counts it once. This count becomes the weight of the edge in the final visualization.
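The counting described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually used for the lab: the sample rows are made up, and the function names (`aggregate_edges`, `unique_nodes`, `to_edge_csv`) are hypothetical. It assumes each raw row is simply a (source, target) pair, and it writes the result with `Source`, `Target` and `Weight` columns, which is the header layout Gephi's spreadsheet import expects for an edge table.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows (source, target); the real file has around 500 rows
# and its column layout may differ.
rows = [
    ("Sandcrawler", "Tatooine"),
    ("Darth Vader", "Rebel Blockade Runner"),
    ("Sandcrawler", "Tatooine"),
    ("Darth Vader", "Death Star"),
    ("Sandcrawler", "Tatooine"),
]

def aggregate_edges(pairs):
    """Collapse repeated (source, target) rows into one weighted edge each."""
    counts = Counter(pairs)
    return [(source, target, weight) for (source, target), weight in counts.items()]

def unique_nodes(pairs):
    """Return every distinct name that appears as a source or a target."""
    return sorted({name for pair in pairs for name in pair})

def to_edge_csv(edges):
    """Render weighted edges as CSV text with Source, Target, Weight headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()

edges = aggregate_edges(rows)
nodes = unique_nodes(rows)
edge_csv = to_edge_csv(edges)
```

With the sample rows, the three Sandcrawler rows become a single edge of weight 3, which mirrors how a pair repeated 24 times in the real data would end up with weight 24.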
Result
The visualization I built with Gephi is shown in Fig.4.
(Fig.4)
The nodes in this visualization represent characters and locations. The color coding is based on the type of node: orange-pink nodes represent locations and blue nodes represent characters. The size of a node is determined by its degree, so we can see that many characters and locations are connected to the Death Star and Tatooine. The edges are also color-coded: blue edges have a character as their source, orange edges have a location as their source, and gray means the type of the source is unknown. The width of an edge reflects the number of times the relationship was recorded: the more often, the wider.
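To make the sizing rule concrete, here is a rough sketch of how degree could be computed and mapped to a node size. It is not Gephi's own code: Gephi does this through its appearance settings, and the linear scaling, the size range of 10 to 50, and the function names below are assumptions chosen for illustration. Apart from the Sandcrawler to Tatooine weight of 24, the edges and weights are made-up examples.

```python
from collections import defaultdict

# Hypothetical weighted edges (source, target, weight). Only the
# Sandcrawler -> Tatooine weight of 24 comes from the data set.
edges = [
    ("Darth Vader", "Death Star", 5),
    ("Luke Skywalker", "Death Star", 3),
    ("Luke Skywalker", "Tatooine", 4),
    ("Sandcrawler", "Tatooine", 24),
]

def degree(edge_list):
    """Count how many edges touch each node (edge weight is ignored)."""
    counts = defaultdict(int)
    for source, target, _weight in edge_list:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)

def scale_sizes(degrees, min_size=10.0, max_size=50.0):
    """Map degrees linearly onto [min_size, max_size] (assumed range)."""
    low, high = min(degrees.values()), max(degrees.values())
    if high == low:
        # All nodes have the same degree, so give them all the same size.
        return {node: min_size for node in degrees}
    span = max_size - min_size
    return {
        node: min_size + (d - low) * span / (high - low)
        for node, d in degrees.items()
    }

degrees = degree(edges)
sizes = scale_sizes(degrees)
```

Note that degree counts edges, not weights, so the heavy Sandcrawler edge makes that link wider but does not make the Sandcrawler node bigger.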
Overall, the number of nodes is limited. The data has around 500 rows, but because it describes the same nodes again and again, only 24 nodes are left after removing the repeated ones. It is also still unclear how the author counted the relationships. For example, I found that the link from the Sandcrawler (source) to Tatooine (target) has a weight of 24, which means this pair appears 24 times in the data set, but how the author arrived at these records is still unknown.
Future
This lab was a good experience for me in using Gephi to create a node-link network. But I think I should be more careful in picking the data set next time. I have to admit that having only 24 nodes left after removing the repeated rows is rather limiting. I also did not try to create clusters from this data, which is a pity. So, if I have a chance to create a new node-link network with Gephi in the future, I will pick a more complex data set and try to make a visualization with clusters.