Coauthorship in Network Science


Visualization

Figure 1: Final Visualization

Introduction

Originally I wanted to create a social network visualization from my Facebook friends list and show the connections between me and my friends. It would have been interesting to see which clusters formed and to think about why they formed that way. Unfortunately, Facebook no longer gives that data out. My next plan was to show the connections between items bought together on Amazon, but that data did not work well. In the end I settled on another kind of social network visualization: this data shows co-authorship in network science. The data was compiled by M. Newman in May 2006 and connects scientists who have worked together on network theory and experiments. What better network than a network of network scientists?

Methodology

I used the following tools to make the visualization and analysis. I retrieved the co-authorship in network science dataset from the Gephi datasets page on GitHub. The file was already a graph file, so I didn’t need to clean the data in Excel or OpenRefine, and I was able to open it directly in the network visualization program Gephi. The dataset is fairly large: an undirected graph with 1589 nodes and 2742 edges.

In Gephi, I ran the “Force Atlas 2” layout. This began to pull all the connected nodes together into a very large, tight cluster. My next step was to turn on two of its settings, “Prevent Overlap” and “Dissuade Hubs”. These produced a much more spaced-out network, but it still did not look the way I wanted. I then ran a few statistics to make the nodes more visually informative. After running “Average Degree”, I sized the nodes by degree so that nodes with more connections are larger and appear more prominent. I also ran the “Modularity” statistic and used it to color the most connected groups of clusters. This is why some of the major groups are colored, while the smaller ones and the single entities are all left grey.
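To make the degree-based sizing concrete, here is a minimal sketch in plain Python of what that step amounts to. It is not Gephi's own code: the toy edge list, the author names, and the 10 to 50 size range are all assumptions chosen for illustration. Only the final line uses the node and edge counts reported above.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

def node_sizes(deg, min_size=10.0, max_size=50.0):
    """Linearly rescale degrees into a size range (the range is an arbitrary choice)."""
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees are equal
    return {n: min_size + (d - lo) / span * (max_size - min_size)
            for n, d in deg.items()}

# Toy co-authorship edges with made-up author names (not from the dataset).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
deg = degrees(edges)
sizes = node_sizes(deg)

# Average degree of an undirected graph is 2 * edges / nodes.
# Note: isolated nodes never appear in an edge list, so a real dataset
# needs its node count supplied separately, as in the last line.
average_degree = 2 * len(edges) / len(deg)
dataset_average_degree = 2 * 2742 / 1589  # counts reported for this dataset
```

With the reported counts, the last line works out to an average degree of roughly 3.45 co-authors per scientist.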

Finally I ran two more layouts. The first, “Yifan Hu”, centered the most connected clusters and placed the least connected ones in a circle around them. The second, “Noverlap”, kept nodes from overlapping each other. At the end, I simply added labels, which are shown below zoomed into the largest cluster.

Figure 1.2: Labels


Discussion

Figure 2: Used as inspiration

One of the first visualizations I considered as inspiration was the social network shown in Figure 2. However, with 1589 nodes, that design would produce far too long a line of scientists, and with nearly twice as many edges as nodes it would be very difficult to see anything or draw any kind of educated conclusion from it.

Figure 3: Used as inspiration.

Figure 3 is a more likely candidate, and I used it as inspiration. It is a generic network visualization from Wikimedia that shows the major nodes and clusters in the middle, with less significant clusters surrounding them. This example uses about 4 shades of the same color, which makes it difficult to tell the clusters apart, and with so many edges it becomes difficult to follow where they lead. Having differently colored clusters, with each edge taking the color of the node it originates from, can help the eye follow an edge to its destination.

Figure 4: Used as inspiration.

Finally, Linked Jazz (Figure 4) is a great example of social network visualization. However, it is an interactive visualization and goes into much more depth than this project called for: it contains pictures of the artists and shows more details about them when clicked.

Future Direction

With a lot more skill in Gephi and more data, I believe it is possible to take this visualization further. For one, photos of the scientists could be added to the nodes, as in the Linked Jazz example. The labels could overlap less, probably by hiding the labels of smaller nodes until the viewer zooms in. Outside of Gephi, it might be possible to create a more interactive visualization that displays more information when a node is clicked, such as showing only that author’s connections and potentially listing the titles of the works they have co-authored.
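As a rough illustration of the click interaction described above, the lookup behind it could be sketched as follows. Everything here is hypothetical: the paper records, titles, and author names are invented, and the sketch assumes paper titles are available alongside the author lists, which the graph file used for this project may not provide.

```python
from collections import defaultdict

# Hypothetical records: each paper has a title and a list of authors.
papers = [
    {"title": "Paper One", "authors": ["Author A", "Author B"]},
    {"title": "Paper Two", "authors": ["Author A", "Author B", "Author C"]},
    {"title": "Paper Three", "authors": ["Author C", "Author D"]},
]

def build_coauthor_index(papers):
    """Map each author to their co-authors and the titles they share."""
    index = defaultdict(lambda: defaultdict(list))
    for paper in papers:
        for a in paper["authors"]:
            for b in paper["authors"]:
                if a != b:
                    index[a][b].append(paper["title"])
    return index

def connections_of(index, author):
    """Return only the selected author's co-authors with their shared titles."""
    return {co: sorted(titles) for co, titles in index.get(author, {}).items()}

index = build_coauthor_index(papers)
selected = connections_of(index, "Author A")
```

Clicking a node would then amount to calling `connections_of` with that author's name and drawing only the returned co-authors and titles.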

Sources

Figure 2. Social Network Analysis and Visualization in ‘The Papers of Thomas Jefferson’. http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/social-network-analysis-and-visualization-in-the-papers-of-thomas-jefferson.1.html

Figure 3. Social Network Analysis Visualization. https://commons.wikimedia.org/wiki/File:Social_Network_Analysis_Visualization.png

Figure 4. LinkedJazz. https://linkedjazz.org/tools/network-visualization-tool/

Data for Visualization. https://github.com/gephi/gephi/wiki/Datasets