Exploring Gephi

For the second lab, my initial question was how networks of people could be represented visually.

Initially, I thought of using my Facebook network as an example or, as an alternative, my LinkedIn network, to see how people were connected.

In terms of questions, I was looking to see, visually, how people in my ‘networks’ are connected. I ultimately did not go in that direction and instead shifted focus to a different subject: the dataset I used covers word adjacencies in the novel David Copperfield, not business (or friend) connections.

Visualization #1 (Figure 1, below) was what I was initially considering. I looked at my LinkedIn network primarily as a matter of curiosity. The final output somewhat resembled this first pass at visualization, if only in terms of items being linked and clustered.

Figure 1, My LinkedIn Network, as visualized using Socilab (www.socilab.com/home)

Socilab also provided network analytics (Figure 2), showing absolute size, effective network size, and other measures, to highlight areas where more connections could be made or the network could be strengthened.

Figure 2, Socilab LinkedIn network analytics (partial)
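As a rough illustration of what a measure like effective network size captures, the sketch below computes it for a small, made-up ego network. It assumes an undirected, unweighted network and uses the common simplification of Burt's measure (number of contacts minus the average number of ties each contact has to the other contacts). Whether Socilab computes its figure in exactly this way is not something the lab confirmed, and the names in the example are hypothetical.

```python
from itertools import combinations


def effective_size(ego, edges):
    """Effective size of ego's network: n - 2t/n.

    n is the number of ego's direct contacts (alters) and t is the number
    of ties among those contacts. Assumes undirected, unweighted edges.
    """
    # Build an adjacency set for every node in the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    alters = neighbors.get(ego, set()) - {ego}
    n = len(alters)
    if n == 0:
        return 0.0

    # Count ties between pairs of alters (ties to ego are excluded).
    t = sum(1 for a, b in combinations(sorted(alters), 2) if b in neighbors[a])
    return n - 2 * t / n


# Hypothetical ego network: "me" has four contacts, two of whom know each other.
edges = [
    ("me", "ann"),
    ("me", "bob"),
    ("me", "cy"),
    ("me", "dee"),
    ("ann", "bob"),
]

print(effective_size("me", edges))
```

In this toy case the plain count of contacts is four, but the effective size is lower because two of the contacts are already tied to each other, so they are partly redundant.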

Visualization #2 was more along the lines of what I was producing once I started working with Gephi datasets. This graph came from networkrepository.com, a site with multiple datasets for exploration and analysis. It seemed closer to the examples in the readings and in class, as it had more connections and stronger ties among the nodes.

Figure 3, Visualization #2, from www.networkrepository.com/web-pollblogs.php

Figure 4, Visualization #3, from https://flowingdata.com/2010/02/26/news-topics-as-social-network

Visualization #3 was the closest to the final product from the Gephi lab session. Again, though the focus of Figure 4 (Visualization #3) is different from the end result, it provided a representation of what I was looking to do.

To create the final product, I used a dataset from GitHub focusing on word adjacencies in the novel David Copperfield. I settled on that dataset after trying out various datasets from other sources (such as networkrepository.com) that did not work out. One file looked like properly formatted CSV data, but when I tried to import it into Gephi, it turned out to lack the appropriate setup (nodes and edges), so it was not usable for visualization. I had tried locating alternative sources in advance of the lab, but ran into issues such as incompatible file formats and dead links. In the interest of time and efficiency, readily available and importable data worked best for creating a visualization.
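A quick check of a CSV file's header row before importing can catch the kind of problem described above. The sketch below assumes the column conventions of Gephi's spreadsheet import, where an edges table carries Source and Target columns and a nodes table carries an Id column. The function name and sample data are hypothetical, and comparing column names case-insensitively is a simplification made here.

```python
import csv
import io

# Column names assumed from Gephi's spreadsheet import conventions.
EDGE_COLUMNS = {"source", "target"}
NODE_COLUMNS = {"id"}


def classify_gephi_csv(text):
    """Return 'edges', 'nodes', or 'unusable' based on the CSV header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    columns = {name.strip().lower() for name in header}
    if EDGE_COLUMNS <= columns:
        return "edges"
    if NODE_COLUMNS <= columns:
        return "nodes"
    return "unusable"


# Hypothetical samples: an edge list, a node list, and a plain word count.
edge_csv = "Source,Target,Weight\nlittle,old,1\n"
node_csv = "Id,Label\n1,little\n"
other_csv = "word,count\nlittle,10\n"

for sample in (edge_csv, node_csv, other_csv):
    print(classify_gephi_csv(sample))
```

A file like the third sample is valid CSV but says nothing about which items are linked, which is roughly the situation a well-formatted but unusable dataset presents.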

To create the visualization, I imported the GitHub dataset into Gephi. I had plans to look at other datasets but, as mentioned above, they were not usable for this exercise. The set I used seemed like it would present well, and it was not overly large (it had fewer nodes and edges) or overly complex once visualized in Gephi, unlike, for example, the airline dataset. The end result did look different from what was in the program workspace, but adjusting the layout, statistics, and other settings produced a usable product.

Working with the program had its challenges, though. The output would not show up within the program, so I had to save the current version, export it, and then open the result. Had that not happened, I might have been able to adjust certain things, such as color, with a bit more ease.

Using the GitHub dataset, Figure 5 (below) shows that the word “little” sits at the center of the ‘universe’ in the visualization and is by far the largest node. Other words (“other”, “old”, “good”) also appear frequently and are connected to multiple other words throughout the diagram. Though it was not the dataset I intended to use, the end result succeeded in visually presenting the concept of a network.

Figure 5, Output from Gephi lab session (reduced size)
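One way to see why a word might end up as the largest node is to count how many distinct words each word is linked to (its degree). The sketch below does that for a handful of invented adjacency pairs; they are not taken from the actual dataset. It also assumes that node size in the figure reflects degree, which the lab output does not confirm.

```python
def degree_counts(edges):
    """Count the distinct neighbors of each node in an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Invented word pairs for illustration only.
edges = [
    ("little", "girl"),
    ("little", "room"),
    ("little", "old"),
    ("old", "man"),
    ("good", "man"),
    ("girl", "little"),  # a repeated pair does not add a new neighbor
]

degrees = degree_counts(edges)
largest = max(degrees, key=degrees.get)
print(largest, degrees[largest])
```

With these pairs, “little” has the most distinct neighbors, so it would be drawn largest under a size-by-degree setting.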

While I had a workable dataset and a final product, a brief discussion of ideas explored but not used follows.

Facebook did not pan out because of changes made to its application programming interface (API). Previously, users could map their own networks and their friends would appear as well; after the change in 2015, users could show only their own activity. For a user to show friends on a network, those friends would need to allow Facebook to access their connections as well. I did find instructions for building such a visualization, but the process seemed rather time-consuming when other datasets were readily available from other sources.

I was able to plot my LinkedIn network using Socilab (Figure 1). In terms of answering the “what would this look like” question, it was useful, as I had a diagram to see and evaluate. The output had its limits, as it reflected only my connections, not the connections of the people to whom I am linked. From the output, I saw that many of my connections are isolated. Additionally, the clusters of people that appeared corresponded to various parts of my life (work, college, other organizations of which I am or was a part).

As to future directions, relationships between certain words in the novel could be explored and graphically represented. The layout could also be changed to emphasize or de-emphasize certain nodes, among other adjustments.

Having a somewhat better handle on Gephi now, I could explore and visualize more complex datasets. Also, with a somewhat firmer understanding of how to format data, I can create other visualizations more in line with my interests, such as a network of characters from Absolutely Fabulous.

Sources and Figures

Facebook API change, http://www.kdnuggets.com/2015/06/visualize-facebook-network.html

Figure 1, My LinkedIn Network, as visualized using Socilab (www.socilab.com/home)

Figure 2, Socilab LinkedIn network analytics (partial)

Figure 3, Visualization #2, from www.networkrepository.com/web-pollblogs.php

Figure 4, Visualization #3, from https://flowingdata.com/2010/02/26/news-topics-as-social-network

Figure 5, Output from Gephi lab session

Gephi datasets, https://github.com/gephi/gephi/wiki/Datasets

Information Visualization

Student work at the School of Information, Pratt Institute
