Introduction
Aaron Koblin’s animated visualization of U.S. air traffic shows the bustling transportation network up in the sky. I was fascinated by the vivid illustration and surprised by the vastness and complexity of America’s airspace system. According to the Federal Aviation Administration, it handled 44,000 flights every day and up to 16,100,000 flights a year. With so many airlines and airports cooperating in the network, I was curious about the connections and relationships between them. Using the Airlines dataset listed under Infrastructure Networks on the Gephi Wiki, I wanted to find out:
- What’s the community structure of America’s airspace system?
- Which airports are the hubs that serve as the important connections for the others?
Inspiration
This data visualization of the World’s Air Traffic Network looks at the volume and connections of worldwide air transport, using nodes for airports, node size for the number of routes, and color for continents. It uses a force-directed layout, which lets readers see relationships beyond geographical location, and adds a second view laid out on a geographical map in the top right. Overall, I think the layout delivers the information clearly through its use of color, node size differences, and categorical labels. However, I wonder whether it would be better to show only the geographical layout, since the clusters mainly correspond to regions.
Methods & Process
Data Collection & Cleaning
From the Gephi Wiki, I downloaded the airlines dataset in GraphML format, which can be opened directly in Gephi for network analysis and visualization. After opening the file, I checked the data. The edge table had valid source, target, type, and weight attributes, but in the node table the label, longitude, and latitude of each airport were stored in a single column, which could not be used for visualization in Gephi. Therefore, I exported the table and used OpenRefine to split that column into three independent columns. I then imported the cleaned dataset into Gephi, ready for the next step.
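The column split described above was done in OpenRefine, but the same idea can be sketched in a few lines of Python. The layout of the combined column is not shown here, so the comma-separated format, the function name `split_airport_field`, and the sample value are all assumptions made for illustration.

```python
def split_airport_field(raw, sep=","):
    """Split one combined cell into (label, longitude, latitude).

    The separator and field order are assumptions, not the dataset's
    documented format.
    """
    parts = [p.strip() for p in raw.split(sep)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {raw!r}")
    label, lon, lat = parts
    return label, float(lon), float(lat)

# Hypothetical sample cell; the values are illustrative, not from the dataset.
row = split_airport_field("XYZ, -93.2, 44.9")
```

Raising an error on malformed cells, rather than guessing, makes it easier to spot rows that need manual cleanup.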
Data Analysis & Visualization
1. Choosing Visualization Layout: Force Atlas
First, I chose Force Atlas, one of Gephi’s force-directed algorithms, to generate the graph (Figure 1), setting the repulsion strength to 2000 to spread the nodes out and prevent overlap.
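To illustrate why a larger repulsion strength spreads a graph out, here is a minimal sketch of a generic force-directed layout. It is not Gephi's Force Atlas implementation: the force formulas, the `attraction` and `step` parameters, and the three-node toy graph are assumptions, and the repulsion values are not on the same scale as Gephi's setting of 2000.

```python
import math
from itertools import combinations

def layout_step(pos, edges, repulsion, attraction=0.1, step=0.01):
    """Return new positions after one simplified force-directed iteration."""
    force = {n: [0.0, 0.0] for n in pos}
    # Every pair of nodes pushes apart; the push weakens with distance.
    for a, b in combinations(pos, 2):
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        dist = math.hypot(dx, dy) or 1e-9
        push = repulsion / dist
        fx, fy = push * dx / dist, push * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy
    # Connected nodes pull toward each other like springs.
    for a, b in edges:
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        force[a][0] += attraction * dx
        force[a][1] += attraction * dy
        force[b][0] -= attraction * dx
        force[b][1] -= attraction * dy
    return {n: (pos[n][0] + step * force[n][0],
                pos[n][1] + step * force[n][1]) for n in pos}

def run(repulsion, iterations=50):
    """Lay out a toy graph: A and B are linked, C is unconnected."""
    pos = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
    for _ in range(iterations):
        pos = layout_step(pos, [("A", "B")], repulsion)
    return pos

def spread(pos):
    """Largest distance between any two nodes in the layout."""
    return max(math.dist(pos[a], pos[b]) for a, b in combinations(pos, 2))

spread_low = spread(run(1.0))
spread_high = spread(run(20.0))
```

In this toy setup the layout with the higher repulsion value ends up more spread out, which is the effect the repulsion setting is meant to achieve.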
2. Running Statistics
Second, I ran the statistics provided by Gephi, with the following results:
- Average degree: 11.038
Figure 2 shows that most airports had a low degree, and only a few airports had a comparatively large number of connections.
- Network diameter: 4
The shortest path between the two most distant airports was 4 steps long, so any two airports could be linked in at most 4 flights and the network was fairly cohesive. However, the betweenness centrality distribution (Figure 3) shows that only a few airports served as hubs standing between and connecting the others.
- Graph Density: 0.047
The low density means the network had far fewer edges than the maximum possible number: only about 4.7% of all possible airport pairs were directly connected, so the network was rather sparse.
- Modularity (resolution 1.25): 0.375
The modularity was not high, meaning that nodes in different modules were still closely connected. Figure 5 shows a total of 4 communities, and the communities with fewer nodes were more tightly connected internally.
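To make the four statistics above concrete, the following sketch computes them on a tiny toy network using only the Python standard library. The edge list and the hand-picked partition are illustrative assumptions, and the modularity function uses the standard Newman formula without a resolution parameter, so it is not a reproduction of Gephi's community detection.

```python
from collections import deque

# Toy undirected network: two triangles joined by one bridge edge (illustrative only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = [{0, 1, 2}, {3, 4, 5}]  # hand-picked partition, not detected

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

n, m = len(adj), len(edges)
average_degree = 2 * m / n          # each edge adds one to two nodes' degrees
density = 2 * m / (n * (n - 1))     # share of possible node pairs that are linked

def eccentricity(start):
    """Longest shortest-path length from start (assumes a connected graph)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

diameter = max(eccentricity(v) for v in adj)

def modularity(parts):
    """Newman modularity Q of a given partition (no resolution parameter)."""
    q = 0.0
    for part in parts:
        inside = sum(1 for a, b in edges if a in part and b in part)
        degree_sum = sum(len(adj[v]) for v in part)
        q += inside / m - (degree_sum / (2 * m)) ** 2
    return q

q = modularity(communities)
```

For an undirected graph, density equals the average degree divided by N - 1, so the reported values of 11.038 and 0.047 together suggest a network of roughly 235 airports.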
3. Designing Appearance
- Node size was based on betweenness centrality, scaled to a range from 10 (minimum) to 40 (maximum), to make the significant hubs stand out.
- Color of the graph was based on the modularity class to highlight the clusters.
- Label size was set to correspond to the node size.
- Edges were changed from curved to straight lines, edge weights were rescaled, and edge thickness and opacity were adjusted to make the graph cleaner and clearer.
With these adjustments to the nodes and edges, I arrived at the visualization shown in Figure 5.
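The node sizing step amounts to mapping each centrality score onto the 10 to 40 size range. A minimal sketch, assuming a plain linear mapping (Gephi's ranking options may apply a different interpolation) and using hypothetical scores:

```python
def rescale(values, new_min=10.0, new_max=40.0):
    """Linearly map a dict of scores onto [new_min, new_max]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # all scores equal: fall back to the smallest size
        return {k: new_min for k in values}
    span = (new_max - new_min) / (hi - lo)
    return {k: new_min + (v - lo) * span for k, v in values.items()}

# Hypothetical betweenness scores, not values from the dataset.
sizes = rescale({"hub": 0.5, "mid": 0.25, "leaf": 0.0})
```

With this mapping the node with the highest score gets size 40, the lowest gets size 10, and everything else falls proportionally in between.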
4. Making Geographical Layout
I used Geo Layout to position the airports according to their geographical location (longitude and latitude).
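A geographical layout boils down to projecting each airport's longitude and latitude onto flat x and y coordinates. The sketch below uses the Mercator formula as an assumed projection; which projection was selected in Geo Layout is not stated here, and the `scale` parameter and rounded coordinates are illustrative.

```python
import math

def mercator(lon_deg, lat_deg, scale=1000.0):
    """Project longitude/latitude in degrees to planar (x, y) with the Mercator formula."""
    x = scale * math.radians(lon_deg)
    y = scale * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate, rounded coordinates used only for illustration.
sea_x, sea_y = mercator(-122.3, 47.4)  # Seattle-Tacoma
atl_x, atl_y = mercator(-84.4, 33.6)   # Atlanta
```

In the projected plane, Seattle lands to the left of and above Atlanta, matching their northwest and southeast positions on a map.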
Result & Reflection
The final output (Figure 6) shows 4 communities in the network, each identified by a different color. The one with the biggest hub is in the upper Midwest, followed by one in the Northeast, one in the Southeast, and the smallest and sparsest one in the Northwest.
The node sizes also show that the significant airline hubs are MSP (Minneapolis-Saint Paul International Airport) in the upper Midwest, DTW (Detroit Metropolitan Wayne County Airport) in the Northeast, ATL (Hartsfield-Jackson Atlanta International Airport) in the Southeast, and SEA (Seattle-Tacoma International Airport) in the Northwest. While these airports were all effective at moving passengers, MSP had the highest betweenness centrality and ATL belonged to the biggest sub-network.
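Betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes, which is why it singles out hubs. The sketch below implements Brandes' algorithm for an unweighted, undirected graph on a hypothetical five-node network; the node names are made up and the scores are unnormalized, so they are not comparable to Gephi's output for the airline data.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to every node.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical network: H links A, B, and C; D is reachable only through C.
toy = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H", "D"], "D": ["C"]}
scores = betweenness(toy)
```

Here H sits on the most shortest paths and would be drawn as the largest node, while A, B, and D lie on none and would stay at the minimum size.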
As a future direction for this experiment, it would be interesting to have more detailed information about which airlines fly each route, in which direction (a directed network), and over what time span.
Materials
Software
OpenRefine: An open-source tool for data cleanup.
Gephi: Free, open-source software for network analysis and visualization.
Datasets
Gephi Wiki: Infrastructure Networks — Airlines
References