Making U.S. Flight Paths ‘Real’


Lab Reports, Networks, Visualization

To look up into the sky is to wonder just how far we can go. To look at a map is to wonder just how many options there are. For a country as large as the United States of America, and within the context of air travel, the answers are, respectively: quite far and quite a few, even if one limits the number of airlines.

The visualization of these questions allows us to begin examining just how much is going on overhead; not only how active the coasts are, but also how connected the parts of the nation sometimes derisively referred to as ‘flyover country’ really are.

This lab was an attempt to explore those connections through network visualization and, as with air travel, it took me some interesting places.

Existing Visualizations

Entering into this lab, with limited knowledge of network analysis and visualization (certainly compared to other topics in the course), I actually started by thinking of the visual networks that had struck me most to date: the scandal-tracing drawings of Mark Lombardi — whose archive of drawings has apparently been digitized, with interactive value added.

George W. Bush, Harken Energy and Jackson Stephens c. 1979-90, 5th Version (1999) by Mark Lombardi

I find Lombardi’s drawings remarkably compelling in how they suggest the intricacy of geopolitical relationships in their structure; yet their neutral tones evoke the (idealized) factuality and integrity of newspaper journalism. I thought there might be a way of using the strict horizontals of Lombardi’s style to suggest states or latitudes and then use color to create a new classification of region, as suggested by a separate, relatively standard yet colorful piece I had found.

Soon after beginning the lab and working with the software, it became obvious that these examples would not actually shape the final visualization. More on that shortly.

On a related note: not long after completing the lab, I became aware of this striking project by Martin Grandjean, which weaves together the geographic and the colorful, and which touches upon some of what I was thinking and some of what I found.

Unfortunately, since it entered the scene after the lab was completed, this example is going to have to inspire future exploration — but is still worth noting. What happened during the lab sits between these two sets of examples.

Materials and Resources

  • The dataset is one of the publicly available ones developed at CASOS. (“Good Flightpaths” being differentiated from “Not Good Flightpaths” as the one that would load properly into the ORA GIS Visualizer.) It should be noted that this only includes the data for three airlines: United, Delta and American Eagle.
  • Excel was used to examine the data set and separate out node and edge content, before cleaning the edge and node tables up in OpenRefine.
  • Exploratory and final visualizations were created using Gephi, which was also used for certain data cleaning and combining.

Methods and Process

In preparing the data for this lab, I first broke up the original data set from CASOS into an edge table of directed flight paths and a node table of airports with a variety of information, including city, state and, most importantly, latitude and longitude.
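For anyone who would rather script that split than do it by hand in Excel, here is a minimal sketch of the idea. It assumes a flat export with one row per flight path; the column names (origin, dest, origin_city and so on) and the sample rows are hypothetical, not the actual CASOS layout. The output uses the Id and Source/Target headers that Gephi's spreadsheet import looks for.

```python
import csv
import io

# Hypothetical flat export: one row per flight path, with the origin
# airport's details repeated on every row. Column names are assumptions.
RAW = """origin,dest,origin_city,origin_state,origin_lat,origin_lon
JFK,ORD,New York,NY,40.64,-73.78
ORD,JFK,Chicago,IL,41.98,-87.90
JFK,ORD,New York,NY,40.64,-73.78
"""


def split_tables(raw_text):
    """Return (nodes, edges) as lists of dicts ready to write out as CSV."""
    nodes = {}    # airport code -> node row; the dict key drops duplicates
    edges = []    # directed Source/Target rows in first-seen order
    seen = set()  # (source, target) pairs already recorded
    for row in csv.DictReader(io.StringIO(raw_text)):
        code = row["origin"]
        nodes.setdefault(code, {
            "Id": code,
            "City": row["origin_city"],
            "State": row["origin_state"],
            "Latitude": float(row["origin_lat"]),
            "Longitude": float(row["origin_lon"]),
        })
        pair = (code, row["dest"])
        if pair not in seen:
            seen.add(pair)
            edges.append({"Source": pair[0], "Target": pair[1]})
    return list(nodes.values()), edges


nodes, edges = split_tables(RAW)
```

Keeping the edges as ordered pairs preserves direction, so JFK to ORD and ORD to JFK remain two separate flight paths.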

Those last two items I had kept ‘just in case,’ but it proved the right move: the Geo Layout plugin let me plot all the airport locations quickly and simply, and it immediately exposed some issues with my tables when it plotted a South Carolina airport just east of Iceland. The insight and context provided by the familiarity of this shape cannot be overstated, and it naturally took over where the visualization went from there, leaving behind my original intentions.
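A stray airport like that one can also be caught before plotting. The sketch below is only an illustration of the idea: the bounding box is a rough assumption meant to cover the 50 states, the node layout is hypothetical, and the second sample row is invented to stand in for the misplaced airport.

```python
# Rough bounding box for the 50 states. These limits are approximate
# assumptions for a sanity check, not authoritative boundaries; they
# would need widening for territories or the far western Aleutians.
LAT_RANGE = (18.0, 72.0)
LON_RANGE = (-180.0, -66.0)


def suspicious_coordinates(nodes):
    """Return the Ids of nodes whose coordinates fall outside the box."""
    flagged = []
    for node in nodes:
        lat_ok = LAT_RANGE[0] <= node["Latitude"] <= LAT_RANGE[1]
        lon_ok = LON_RANGE[0] <= node["Longitude"] <= LON_RANGE[1]
        if not (lat_ok and lon_ok):
            flagged.append(node["Id"])
    return flagged


sample = [
    {"Id": "CHS", "Latitude": 32.90, "Longitude": -80.04},  # plausible row
    {"Id": "XXX", "Latitude": 63.0, "Longitude": -8.0},     # invented bad row
]
flagged = suspicious_coordinates(sample)
```

A check like this only says that something is wrong with a row, not what; the map itself remains the quicker way to see where the point has ended up.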

With the degree, diameter and density statistics run, and the many edges of the network established (and somewhat overwhelming), I set about making this a little more user friendly. I added labels (which admittedly require knowing the airport codes, though the geography helps), resized the nodes and added color, paying particular attention to the opacity of the edges, which were obscuring one another in early versions.

Note: As an exploration, I did try to group flights by state, which, as might be expected, increased the density of the network considerably (from 4% to 20%) and in the process added a truly artificial and ultimately arbitrary classification. I do think some future benefit might be found by grouping airports that are generally frequented by the same travelers (e.g. LaGuardia, New York and Newark), but that would require a whole other dataset and goal. Similarly, clustering by regions might prove insightful, but would require a very distinct question being asked, or a separate classification of travelers or origins (e.g. ‘Southerners,’ ‘blue states’) being proposed, before those clusters are made.
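The jump in density follows from the arithmetic: grouping shrinks the number of possible connections much faster than it shrinks the number of actual ones. Here is a small sketch using the usual directed-graph definition of density (distinct edges divided by n × (n − 1)). The toy airports and routes are invented for illustration, and dropping within-state flights as self-loops is my assumption rather than a record of how Gephi handled the merge.

```python
def density(node_ids, edges):
    """Directed density: distinct non-loop edges divided by n * (n - 1)."""
    n = len(set(node_ids))
    distinct = {(s, t) for s, t in edges if s != t}
    return len(distinct) / (n * (n - 1)) if n > 1 else 0.0


def collapse_by_state(edges, state_of):
    """Relabel each endpoint with its state, so parallel routes merge."""
    return [(state_of[s], state_of[t]) for s, t in edges]


# Toy data, invented for illustration only.
state_of = {"JFK": "NY", "LGA": "NY", "ORD": "IL", "MDW": "IL", "ATL": "GA"}
edges = [
    ("JFK", "ORD"), ("LGA", "ORD"),  # both become NY -> IL
    ("ORD", "ATL"), ("MDW", "ATL"),  # both become IL -> GA
    ("JFK", "LGA"),                  # becomes NY -> NY and is dropped
]

airport_density = density(state_of.keys(), edges)
state_edges = collapse_by_state(edges, state_of)
state_density = density(state_of.values(), state_edges)
```

In this toy case five airports and five routes collapse to three states and two routes, yet the density goes up, because the count of possible directed pairs falls from twenty to six.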

The Result (and a minor tweak)

The result of that process provides a clear view from space of the density of air traffic for this particular data set. The abstraction of the edge flight paths has been made more concrete.

Almost there

The most traveled regions stand out, and we certainly get a sense of scope and how busy the sky is, even with just three airlines. Zooming in allows closer exploration of the static image, making it possible to identify some of the smaller nodes and lighter connections, though it also shows the issues that arise with some of the more tightly packed airports (a point for further refinement).

Detail (Mercator)

At the macro- and mid-level, this all seemed to resolve pretty quickly, but something about the visualization still struck me as ‘off.’ I began looking at options.

Geo Layout allows one to select (amongst others) a Winkel Tripel projection as a model. After reading a little about this projection — which attempts to minimize the collective distortion of area, direction and distance — and observing its benefits in telling the story of the network, I decided this was the better form for the final visualization.
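For the curious, the two projections can be compared directly. The sketch below uses the standard textbook formulas: Mercator, and Winkel Tripel as the average of an equirectangular projection and the Aitoff projection. Whether the Geo Layout plugin computes or scales things in exactly this way is an assumption on my part, and the function names are my own.

```python
import math

# Standard parallel from Winkel's original definition: acos(2 / pi).
PHI_1 = math.acos(2 / math.pi)


def mercator(lat_deg, lon_deg):
    """Mercator on a unit sphere; y grows without bound toward the poles."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return lon, math.log(math.tan(math.pi / 4 + lat / 2))


def winkel_tripel(lat_deg, lon_deg):
    """Winkel Tripel on a unit sphere: mean of equirectangular and Aitoff."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    alpha = math.acos(math.cos(lat) * math.cos(lon / 2))
    sinc = math.sin(alpha) / alpha if alpha else 1.0  # unnormalized sinc
    x = 0.5 * (lon * math.cos(PHI_1)
               + 2 * math.cos(lat) * math.sin(lon / 2) / sinc)
    y = 0.5 * (lat + math.sin(lat) / sinc)
    return x, y


# Same longitude at two latitudes: Mercator keeps x fixed, while
# Winkel Tripel pulls the northern point in toward the central meridian.
south = winkel_tripel(25.0, -100.0)
north = winkel_tripel(65.0, -100.0)
```

That inward pull at higher latitudes is what produces the curved, dome-like look, whereas Mercator keeps every meridian a straight vertical line.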

Winkel Tripel projection

Here is a side-by-side comparison of the two:

We now have more of a sense of the curvature of the globe, which from a narrative and aspirational standpoint better conveys the scope and scale of the network. Note how the increased sense of perspective emphasizes the dome-like network of flights.

With all due respect to those excluded, and understanding the limitations of the format, here is a detail containing just the contiguous 48 states.

Winkel Tripel (detail), Contiguous 48

I suspect the Mercator is actually more functional in terms of picking out details, but I also find the Winkel Tripel to be both more evocative and more ‘real.’ It is also a very far cry from the original flattened graphics that I was considering when I started.

Reflections

I already wrote to some degree in the ‘Existing Visualizations’ section about where this investigation might go in the future or what I would consider if I were doing this again, and I would like to give those due consideration. I believe there is a way of analyzing and presenting this data that mimics Lombardi’s drawings — though with color — perhaps inspired by the Lt. Serjev / Marey-Ibry / Tufte schedules as well. But that is an in-depth study that requires much more consideration than this lab.

That said, looking at what was generated to date, I can’t help but focus on how much of it really is ‘just’ evocative or narrative in nature. It is a good end product, for example, for an annual report, wherein one might want visualization to suggest range or scope or reach, but where one must also have the underlying truth of data in the service of a chosen message.

A radial diagram, or something like the Lombardi approach mentioned twice above, could do the same thing, but to truly make this a functional tool, at least for a broad audience, I believe it has to be interactive. This would cut past the ‘language barrier’ of the airport codes and address the issues of label overlap that show up in denser areas. Hover labels that list the number of incoming and outbound flight paths (or in-degree and out-degree), not to mention the city and state name, would take full advantage of the source data and tell an even richer story.
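As a rough sketch of what such a hover label might contain, the text could be assembled from the node and edge tables as below. The field names, the label wording and the sample rows are all hypothetical, and this only builds the strings; wiring them into an interactive viewer is a separate step.

```python
from collections import Counter


def hover_labels(nodes, edges):
    """Map each airport code to the text a tooltip could display."""
    distinct = set(edges)  # count each directed flight path once
    out_deg = Counter(src for src, _ in distinct)
    in_deg = Counter(dst for _, dst in distinct)
    return {
        n["Id"]: f'{n["City"]}, {n["State"]} ({n["Id"]}): '
                 f'{in_deg[n["Id"]]} inbound, {out_deg[n["Id"]]} outbound'
        for n in nodes
    }


# Invented sample rows, for illustration only.
nodes = [
    {"Id": "JFK", "City": "New York", "State": "NY"},
    {"Id": "ORD", "City": "Chicago", "State": "IL"},
    {"Id": "ATL", "City": "Atlanta", "State": "GA"},
]
edges = [("JFK", "ORD"), ("ATL", "ORD"), ("ORD", "JFK")]

labels = hover_labels(nodes, edges)
```

Spelling out the city and state next to the code is what would get a broad audience past the airport-code ‘language barrier.’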

Of the labs so far, this one led to the most speculation about possibilities — no doubt because the concepts were so foreign. It has generated a great deal of thought about what network visualization can and cannot do and left me thinking about where it can go next.