Airline connectivity in the United States


Networks
A network graph displaying 234 U.S. airports and the airline flight paths between them.

Introduction

As someone who has travelled quite frequently in the past, I am intrigued by visualizations of air travel and flight patterns. It is interesting how such visualizations can provide a striking bird's-eye view of how humans migrate and move. While exploring networks this week, I came across a dataset in Gephi's list of sample datasets that allowed me to try my hand at these niche visualizations.

The GraphML file came with a set of nodes representing airports as well as a set of edges representing flight paths between those hubs. In working with this network dataset, I was initially concerned with visualizing the most travelled flight paths within the contiguous United States. However, I also wanted to highlight the airports that had the most flight paths, as they are major centers for local air travel within the U.S. I was able to identify these airports by evaluating degree centrality (which I will touch upon a bit later).

Materials

Gephi is open-source software for creating network graphs and came in handy when visualizing the U.S. air travel network. It is helpful for displaying connections between nodes as well as organizing nodes into communities.

OpenRefine is a useful open-source tool for cleaning and organizing data. My data came in the form of a GraphML file, so after I was able to split the nodes and edges into separate .csv files, I used OpenRefine to organize the data into the relevant components for my analysis.

Adobe Photoshop is a raster graphics editor that came in handy when I turned my network graph into a neatly-designed graphic with a title, dataset source, and some additional context.

Methods + Results

As mentioned above, I began my process by downloading the GraphML dataset on airlines from Gephi's list of sample datasets. I was unfamiliar with the GraphML file format; however, after some research, I learned that it is an XML format that (in this context) contains a set of nodes and edges. I uploaded the file into Gephi and, in the Data Laboratory tab, exported the nodes and edges as separate .csv files that I would clean in OpenRefine.
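For readers who prefer to script this step instead of going through Gephi's Data Laboratory, the node/edge split can also be done directly from the GraphML file with Python's standard library. This is a minimal sketch, not the method used in the post; the file path and the choice to export only IDs, sources, and targets are assumptions you would adapt to the actual dataset.

```python
# Sketch: splitting a GraphML file into node and edge CSV files using
# only the Python standard library. GraphML is XML, so node and edge
# elements can be pulled out with ElementTree.
import csv
import xml.etree.ElementTree as ET

# GraphML elements live in this default XML namespace.
GML = "{http://graphml.graphdrawing.org/xmlns}"

def graphml_to_csvs(graphml_path, nodes_csv_path, edges_csv_path):
    """Write one CSV of node IDs and one CSV of source/target pairs."""
    root = ET.parse(graphml_path).getroot()

    with open(nodes_csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id"])
        for node in root.iter(GML + "node"):
            writer.writerow([node.get("id")])

    with open(edges_csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])
        for edge in root.iter(GML + "edge"):
            writer.writerow([edge.get("source"), edge.get("target")])
```

Real exports would also carry over node attributes (such as the tooltip), but the skeleton is the same: iterate over the `node` and `edge` elements and write one CSV row per element.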

The spreadsheet of nodes contained columns for IDs, labels, timestamps (which were all null), and the tooltip, which contained the airport codes as well as their latitude and longitude coordinates. I knew I wanted to display the airport codes as the labels, and that the latitude and longitude coordinates would be useful for arranging my graph into the shape of the contiguous U.S. Thus, I deleted the label and timestamp columns and broke the tooltip column into three columns: one for the label, which would be the airport code, another for the longitude or x-coordinate, and the last for the latitude or y-coordinate. The spreadsheet of edges looked fine, so I left it alone. Next, I uploaded my clean files of nodes and edges into Gephi and began running statistical analysis.
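The tooltip split described above was done in OpenRefine, but the same transformation can be sketched as a small function. Note the tooltip format here is an assumption for illustration (airport code followed by coordinates in parentheses); the actual strings in the dataset may be laid out differently, which would change the parsing.

```python
# Sketch: splitting a combined tooltip string into the three columns
# described in the post (airport code, latitude, longitude).
# ASSUMPTION: tooltips look like "MSP (44.88, -93.22)" -- adjust the
# parsing if the real dataset uses another layout.
def split_tooltip(tooltip):
    """Return (code, latitude, longitude) parsed from an airport tooltip."""
    code, _, coords = tooltip.partition(" (")
    lat_str, lon_str = coords.rstrip(")").split(", ")
    return code, float(lat_str), float(lon_str)

# Example with an invented tooltip string:
code, lat, lon = split_tooltip("MSP (44.88, -93.22)")
```

In OpenRefine the equivalent operation is a column split on a separator, but expressing it as code makes the intended three-column result explicit.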

Following this written tutorial on Gephi and network analysis, I decided to use the Ranking panel for nodes to determine degree centrality within the network. For the nodes with the highest degrees, I increased the size of the node and its corresponding label to highlight those airports. I also used the Geo Layout plug-in to position the nodes in a way that models how we often see and think of the United States geographically, and applied the Noverlap layout plug-in to keep nodes from overlapping. Minneapolis, Detroit, and Atlanta turned out to have the highest degrees. The resulting visualization also shows heavier air traffic in the East compared to the Midwest.
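The degree ranking Gephi computes in its Ranking panel boils down to counting how many edges touch each node. A minimal stdlib sketch of that idea, using a tiny invented edge list rather than the actual dataset:

```python
# Sketch: ranking airports by degree from an edge list. The degree of
# a node is simply the number of edges incident to it; dividing by
# (n - 1) would give the normalized degree centrality.
from collections import Counter

# Invented example edges (Source, Target) -- not the real dataset.
edges = [("MSP", "ATL"), ("MSP", "DTW"), ("ATL", "DTW"), ("ATL", "JFK")]

degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# The airports with the most connections, analogous to the large,
# highlighted nodes in the final graphic.
top_airports = degree.most_common(2)
```

In this toy network ATL has degree 3, so it would be drawn largest; in the real dataset the same counting is what puts Minneapolis, Detroit, and Atlanta on top.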

Airline connectivity in the contiguous U.S.

Reflection

It was quite rewarding to work with network visualizations, and with this dataset in particular. I initially had trouble interpreting the data format and using Gephi, especially finding a way to transform the data in the node spreadsheet so I could use the Geo Layout feature. After a bit more research and time spent with the software, I began to understand what I was doing, and I got a good visualization out of it.