Network Analysis of Domestic Flights in the United States, January 2017



Introduction:

The United States is the world’s third- or fourth-largest country by total area. U.S. airlines carry more passengers on domestic flights than on international flights. Today, the numbers of flights and passengers keep rising, driven by cheaper tickets and more people making travel plans.
I find it interesting to see how people travel around the country and how the popular airports relate to one another. Network analysis can show how active and how connected airports are by measuring the flights that fly between them.

Dataset:

The dataset I used in this project comes from GitHub, where Bharathiraja published a dataset of flights’ on-time performance. I trimmed it down to a simpler dataset that includes only the departure and arrival city of each flight, in order to analyze trends in these flights, and picked January 2017 for this project.
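The trimming step can be sketched in a few lines of Python. This is only an illustration under assumptions, not the exact process used for this project: the column names ORIGIN and DEST are hypothetical placeholders for whatever the on-time performance file actually calls its departure and arrival fields, and the sample rows are made up. The output uses the Source, Target, and Weight headers that Gephi recognizes when importing an edge table.

```python
import csv
import io
from collections import Counter


def build_edge_list(rows, origin_col="ORIGIN", dest_col="DEST"):
    """Count flights for each (origin, destination) pair.

    The default column names are assumptions; adjust them to the real file.
    """
    counts = Counter()
    for row in rows:
        origin = row[origin_col].strip()
        dest = row[dest_col].strip()
        if origin and dest:  # skip rows with a missing endpoint
            counts[(origin, dest)] += 1
    return counts


def write_gephi_edges(counts, out_file):
    """Write the counts as an edge table with Source, Target, Weight columns."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Weight"])
    for (origin, dest), n in sorted(counts.items()):
        writer.writerow([origin, dest, n])


# Made-up sample rows standing in for the January 2017 data.
sample = io.StringIO("ORIGIN,DEST\nLAX,SFO\nLAX,SFO\nSFO,LAX\nJFK,LAX\n")
edges = build_edge_list(csv.DictReader(sample))

out = io.StringIO()
write_gephi_edges(edges, out)
print(out.getvalue())
```

With a real file, `sample` would be replaced by an opened CSV file and `out` by a file opened for writing with `newline=""`.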

Software:

To analyze and visualize the network data, I used the open-source software Gephi. Cleaning up the dataset for Gephi did not take much time, since the dataset I used was already well organized. However, the size of the dataset and the capability of my machine stopped me from adding more detail to it.
Gephi is user-friendly software that creates beautiful graphs from network data. It allows users to cluster, color, and label the data. It can be downloaded for free at https://gephi.org/

Result & Reflection:

Below is a graph that shows the central area of the whole figure. To view the original image created by Gephi, click HERE.

By looking at the graph, readers can easily find the airports that handle the most flights. The software assigns the cluster colors, which indicate that the airports in the same group have similar characteristics as departure and arrival airports. The nodes (airports) in the same group are mostly not geographically close to each other. There are exceptions: LAX and SFO partially overlap on the graph, and both are located in California. This might be because the scale of traffic in California is so large that the state needs multiple airports to play the same role.
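One way to check which airports handle the most flights outside of Gephi is to add up the flights on every route that touches each airport (the weighted degree of a node). The snippet below is a rough sketch of that idea, not the calculation Gephi performs for the figure: the function name `flight_totals` and the route counts are invented for illustration.

```python
from collections import Counter


def flight_totals(edges):
    """Sum departures and arrivals per airport (weighted degree)."""
    totals = Counter()
    for (origin, dest), n in edges.items():
        totals[origin] += n  # flights leaving the origin
        totals[dest] += n    # flights arriving at the destination
    return totals


# Made-up route counts: (origin, destination) -> number of flights.
toy_edges = {
    ("LAX", "SFO"): 2,
    ("SFO", "LAX"): 1,
    ("JFK", "LAX"): 1,
}

busiest = flight_totals(toy_edges).most_common(2)
print(busiest)  # [('LAX', 4), ('SFO', 3)]
```

Sorting airports this way gives a quick sanity check against the node sizes in the graph.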

To get a better understanding of the airports, see the nodes’ information in this CSV file.

One thing I want to do in this project is to include the information from the CSV file, so that readers without knowledge of the airports can also see the relationship between the network analysis and the locations. Combining the traffic data with this project is also possible future work, which would create a broader understanding of domestic aviation traffic in the United States.


Network analysis and visualization are inspiring and engaging. Working on this project motivated me and made me want to learn more from the dataset. While the network itself only tells the truth of the dataset, it prompts me to ask a lot of “why” and “how” questions about it. I believe these questions can help me build ideas for projects, research, and even ways of thinking. I would like to run more data through Gephi and explore both the software and the world of data.