

In 2003, a group of scientists published a study on a community of dolphins living in Doubtful Sound, a fjord on the coast of New Zealand. They observed the associations between the dolphins over the course of 7 years and concluded that the particular geographic conditions led to a pattern of relationships within this community not seen in other studied groups. A dataset from their research is publicly available, and I used it for a network analysis. My goal for this analysis and visualization was to represent both the inconsistent and the consistent associations described in the study.

Photo by KARSTEN SCHNEIDER for New Zealand Geographic:


I spent a long time browsing the network visualizations aggregated on Visual Complexity and found many designs that I thought were great. Some favorites under the music category were “Love will tear us apart again” and “Trace Encounters”. I was drawn to the visualizations that resembled mist or space, but I believe those are generated from large datasets with many edges. My dataset was relatively small, so my visualization would not look like these.


I retrieved the network analysis dataset I used for this visualization from a list of datasets provided by CASOS at Carnegie Mellon University. I used OpenRefine to parse the data from the XML file downloaded from CASOS. I used Excel to relabel columns and create 2 separate CSV files, one for nodes and one for edges, for import into Gephi for visualization. I also referred to the journal article written by the scientists who created the dataset to inform my analysis.


There was quite a bit of trial and error throughout this process. I tried a few different datasets that did not work out. Ultimately I selected the network dataset on dolphin associations in Doubtful Sound, New Zealand. As previously mentioned, I downloaded the dataset as an XML file and imported it into OpenRefine to parse it and export it as a CSV file. I then used Excel to prepare 2 separate CSV files to import into Gephi. After a few failed import attempts, I relabeled the columns of nodes.csv as ID and label and was finally able to generate the beginnings of a graph in Gephi.
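The conversion described above could also be scripted. Below is a minimal Python sketch of that idea, not the OpenRefine and Excel route I actually used. The XML layout, the element and attribute names, and the node IDs (d1, d2, d3) are simplified assumptions for illustration; the real CASOS file has its own schema and would need different tag names. The output headers follow what Gephi's spreadsheet import looks for: Id and Label for nodes, Source and Target for edges.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. The real CASOS file uses a richer schema,
# so the tag and attribute names below are assumptions, and the IDs are
# placeholders rather than real dolphins.
SAMPLE_XML = """
<network>
  <nodes>
    <node id="d1"/>
    <node id="d2"/>
    <node id="d3"/>
  </nodes>
  <edges>
    <edge source="d1" target="d2"/>
    <edge source="d1" target="d3"/>
  </edges>
</network>
"""


def xml_to_gephi_csv(xml_text):
    """Return (nodes_csv, edges_csv) as strings ready to save for Gephi."""
    root = ET.fromstring(xml_text)

    # Nodes table: Gephi expects an Id column; Label is what gets displayed.
    nodes_out = io.StringIO()
    node_writer = csv.writer(nodes_out, lineterminator="\n")
    node_writer.writerow(["Id", "Label"])
    for node in root.iter("node"):
        node_id = node.get("id")
        node_writer.writerow([node_id, node_id])

    # Edges table: associations have no direction, so mark them Undirected.
    edges_out = io.StringIO()
    edge_writer = csv.writer(edges_out, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Type"])
    for edge in root.iter("edge"):
        edge_writer.writerow([edge.get("source"), edge.get("target"), "Undirected"])

    return nodes_out.getvalue(), edges_out.getvalue()


nodes_csv, edges_csv = xml_to_gephi_csv(SAMPLE_XML)
print(nodes_csv)
print(edges_csv)
```

Writing the two strings to nodes.csv and edges.csv would give files in the shape I eventually got to by hand.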

The next phase was working in Gephi to refine the visualization. I used the ForceAtlas2 layout with the Prevent Overlap option enabled, because the nodes were initially too tightly clustered to read. I then adjusted the node size, colors, and labels by degree ranking.
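To make "degree ranking" concrete: a node's degree is the number of edges touching it, and ranking by degree maps that number onto a visual range such as node size. The sketch below assumes a simple linear mapping between a minimum and maximum size. The function names, the size range, and the tiny edge list are all made up for illustration and are not taken from Gephi or from the dolphin data.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def scale_sizes(deg, min_size=10.0, max_size=40.0):
    """Linearly map each degree onto [min_size, max_size] (assumed mapping)."""
    lo, hi = min(deg.values()), max(deg.values())
    span = hi - lo
    if span == 0:
        # Every node has the same degree, so give them all the same size.
        return {node: min_size for node in deg}
    return {
        node: min_size + (d - lo) / span * (max_size - min_size)
        for node, d in deg.items()
    }


# Placeholder edge list, not real dolphin associations.
edges = [("d1", "d2"), ("d1", "d3"), ("d1", "d4"), ("d2", "d3")]
deg = degrees(edges)
sizes = scale_sizes(deg)
print(deg)
print(sizes)
```

In this toy example d1 has the most associations and gets the largest size, which is the same effect that made the best-connected dolphins stand out in my graph.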

My goal was to create a visualization that would illustrate the associations and interactions of the dolphins, as described in the article by Lusseau et al. The article states, “Observing the association pattern of individuals allows inference about the social organisation of animal populations (Whitehead 1995). From such studies bottlenose dolphin (Tursiops spp.) communities around the world have been described as fission–fusion societies (Connor et al. 2000). In a fission–fusion society individuals associate in small groups in which composition changes very dynamically several times per day (White 1992).” The article illustrates this further with the chart below:

I wanted my visualization to show the 3 groups of dolphins and the overlap between them, as the chart above clearly illustrates, but it did not really turn out that way:

First iteration of visualization with nodes ranked by degree.

After feedback from my peer review meeting, I tried to better visualize the 3 overlapping groups of dolphins by partitioning the nodes by modularity class:

Nodes partitioned by modularity class.
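For context on what a modularity class measures: modularity scores a partition of the nodes by comparing the share of edges that fall inside each group with the share expected if edges were placed at random while keeping every node's degree. Gephi searches for a partition with a high score; the sketch below only computes the score for a partition that is already given, using the standard formula Q = sum over groups of (edges inside the group / m) minus (sum of degrees in the group / 2m) squared, where m is the total number of edges. The function name and the toy graph (two triangles joined by one edge) are assumptions for illustration, not the dolphin data.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected graph without self-loops."""
    m = len(edges)
    intra = defaultdict(int)       # edges with both endpoints in the same group
    degree_sum = defaultdict(int)  # edge endpoints landing in each group
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            intra[ca] += 1
    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles (a, b, c) and (d, e, f) linked by the edge c-d.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in "abcdef"}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping every node into one group. In the toy graph, c and d are the only nodes with a neighbor in the other group, which is the kind of bridging role I had in mind for the dolphins that link groups.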


I think that the last version of the visualization is going in the right direction. If I had more time, I would probably try brushing to highlight key ‘connector dolphins’. I found this process quite challenging, not least because Gephi has no undo button! I must admit, I’m not very satisfied with my results. If I were to continue with this, I would probably change topics and try working with a larger network analysis dataset (YouTube, Twitter, or Reddit).