Visualizing Good Flight Paths


Introduction

Networks help explain our complex world by offering insight into the relationships among elements (people, events, bacteria) and revealing the extent of the connections between them. For the network visualization lab, I set out to create a graph of an airline transportation system in order to understand its underlying structure. I am curious to learn which attributes become evident in a graph and how easy it is to recognize new information.

Inspiration

Transportation systems are already “mapped” and provide a familiar, real-world application with which to begin network visualization. The following visualizations illustrate varying viewpoints on the mapping of air travel.

Fig 1 Air transportation network visualization from Martin Grandjean

Based on OpenFlights.org data, Martin Grandjean’s map (Fig 1) focuses on the quantity and connections of worldwide air transport. Colors represent continents, nodes represent airports, and node size represents the number of routes. The map reveals connectedness; for example, Latin American clusters are closely connected to the U.S., and India is more connected to the Middle East than to Southeast Asia. I appreciate the strong aesthetic quality used to convey voluminous and complex information.

Fig 2 Thanksgiving Flight Patterns from The New York Times

The New York Times included a visualization of Thanksgiving flight patterns (Fig 2) in a holiday blog post. Using Google Flights search data, its creators mapped how flight patterns for the holiday weekend differ from the norm. The map conveys its insights clearly to a general audience. Color indicates direction from origin to destination, and the thickness of the lines represents the change in volume.

Fig 3 US Transportation Analysis by Matine Shaker for HedgeClone

Another interesting airport/flight path visualization comes from the HedgeClone blog (Fig 3). Using data from the Bureau of Transportation Statistics, its designer sought to learn whether air traffic at corresponding airports was a factor in delayed takeoffs. In this map, the thickness of each edge is proportional to the number of trips on the route. Larger nodes represent the hubs in the network. Here, I like the simplicity of mono-colored nodes and edges.

Materials

Gephi is an open-source visualization platform for “networks, complex systems, dynamic and hierarchical graphs, layout and metrics”. The flight path dataset, retrieved from CASOS Network Science Data, comprises multiple airline networks and their flight paths. United’s network was used for the lab. It contains 81 nodes and 370 weighted, directed edges and is described as “good flight paths”.

Methods

The dataset was converted from XML to CSV using TextWrangler and OpenRefine, then imported into Gephi as an undirected graph with valid source, target, and weight attributes. I selected a force-directed layout algorithm (ForceAtlas2) to generate the graph. The initial layout was further refined by running an expansion algorithm and removing overlap to improve legibility (Fig 4). Next, I opened the Statistics panel to run metrics on the graph: Average Degree, Network Diameter, Graph Density, and Modularity. The degree distribution followed the expected power-law distribution. Running Network Diameter also generated the betweenness and closeness centralities.

Fig 4 Transformations
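
The XML-to-CSV conversion described above could also be scripted. The following is only a minimal sketch, not the TextWrangler/OpenRefine workflow used in the lab. It assumes each edge is stored as a `link` element with `source`, `target`, and `value` attributes, which is an assumption about the file layout rather than a confirmed description of the CASOS file, and the sample airports and weights are made up. It writes the `Source`, `Target`, and `Weight` column headers that Gephi's spreadsheet import recognizes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names are assumptions,
# not the confirmed layout of the CASOS dataset.
SAMPLE_XML = """
<network>
  <link source="ORD" target="DEN" value="3"/>
  <link source="DEN" target="MSO" value="1"/>
  <link source="ORD" target="SFO" value="2"/>
</network>
"""


def xml_to_edge_rows(xml_text):
    """Return (source, target, weight) tuples for every <link> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for link in root.iter("link"):
        # Fall back to a weight of 1 when no value attribute is present.
        weight = float(link.get("value", "1"))
        rows.append((link.get("source"), link.get("target"), weight))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with Source, Target, Weight headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)
    return buffer.getvalue()


edge_rows = xml_to_edge_rows(SAMPLE_XML)
csv_text = rows_to_csv(edge_rows)
print(csv_text)
```

Saving `csv_text` to a file would give an edge list that can be imported as an edges table. Whether the graph is treated as directed or undirected is still chosen at import time.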

 

Next, I used some of the metrics to modify the appearance of the graph. First, I sized the nodes by closeness centrality using a range of 5 (minimum) to 25 (maximum), which identified the significant hubs. I then added color to the graph, first by closeness centrality and then by degree, with similar results. Ultimately, I colored the graph by modularity class to highlight the clusters. Once the nodes were labeled, the graph needed to be rotated so that the hubs approximated their geographical locations. I ran the Preview, alternating between curved and straight edges and between labeled and unlabeled nodes.
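
To illustrate what sizing nodes by a metric over a 5 to 25 range means, here is a small sketch that maps centrality scores onto that range. It assumes plain linear interpolation between the smallest and largest score; the `scale_sizes` helper and the closeness values are hypothetical and are not taken from the lab data.

```python
def scale_sizes(values, min_size=5.0, max_size=25.0):
    """Linearly map each metric value onto the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All nodes share one value, so give every node the midpoint size.
        return {node: (min_size + max_size) / 2 for node in values}
    span = high - low
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Made-up closeness scores, for illustration only.
closeness = {"ORD": 0.80, "DEN": 0.70, "MSO": 0.40}
sizes = scale_sizes(closeness)

for airport, size in sizes.items():
    print(f"{airport}: {size:.1f}")
```

The node with the lowest score gets the minimum size and the node with the highest score gets the maximum, so hubs stand out even when the raw scores are close together.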

Results

The most significant features of the final graph (Fig 5) are the hubs in United’s network. The average degree of 4.5 reflects that the majority of the small nodes have a degree of 1, whereas the major and mid-sized hubs have degrees between 25 and 63. The modularity metric identified three clusters, likely reflecting frequency and proximity. Adjusting the resolution down added a fourth, insignificant cluster. The graph density (5.7%) and diameter (3), as well as the high closeness and betweenness centralities, reflect the connectedness of a regional air transportation network. Some of the small airports appear isolated in this network, though perhaps they would not in a broader network representation. Getting out of Missoula, Montana takes some planning.

Fig 5 Final Gephi Graph
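
As a rough sanity check on the density and average degree figures, the standard formulas can be applied to the dataset's 81 nodes and 370 edges. This is a sketch using the textbook definitions; whether Gephi computed its statistics in exactly this way is an assumption. Both the directed and undirected variants are shown for comparison.

```python
def density(nodes, edges, directed=True):
    """Share of possible edges that are present (self-loops excluded)."""
    possible = nodes * (nodes - 1)
    if not directed:
        # An undirected pair counts once, so halve the number of slots.
        possible //= 2
    return edges / possible


def average_degree(nodes, edges, directed=True):
    """Mean out-degree if directed, mean total degree if undirected."""
    return edges / nodes if directed else 2 * edges / nodes


NODES, EDGES = 81, 370  # counts reported for United's network

directed_density = density(NODES, EDGES, directed=True)
undirected_density = density(NODES, EDGES, directed=False)
directed_avg = average_degree(NODES, EDGES, directed=True)
undirected_avg = average_degree(NODES, EDGES, directed=False)

print(f"directed:   density={directed_density:.3f}, average degree={directed_avg:.2f}")
print(f"undirected: density={undirected_density:.3f}, average degree={undirected_avg:.2f}")
```

The directed variant gives a density of about 0.057 and an average degree of about 4.57, close to the reported 5.7% and 4.5. The undirected variant gives roughly double those values.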

Future Directions

The dataset adequately represented the relationships between airports and good flight paths for United’s network, clearly identifying the airline’s hubs. The visualization could be more effective and interesting if it included multiple airline networks, since airlines share airports and some flight paths. Also, including a second visualization for the “not good flight paths” dataset would give context to the graph and make the attributes of good versus bad flight paths apparent. Finally, to make the graph more dynamic, the SigmaJS plugin could be used to layer the graph on a geographical map (as in Fig 2) and to add interactivity if it is published on the web.

The main challenge with Gephi during the lab was that it does not show which operations have been applied to a graph. For example, it was difficult to recall which attributes had been used for size and color, so I needed to start again from the beginning.

References:

https://flowingdata.com/

Storrick, Jon. Good flightpaths [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php