Network graph of airline routes



Data collection

During my undergraduate studies I designed an airport as my thesis project, and in the process I became very interested in airlines and how they connect airports. Before settling on a data set for this project, I looked at several types of data based on trees and social networks, but I struggled to understand those data sets. So I was excited when my search turned up an airline-related data set, which I found on GitHub. Getting started with it was still difficult, because the website gave no description of the data and I had no clue how to proceed. To understand the data, I first had to reorganize it, which I did in Excel and OpenRefine. I split the tooltip column into separate columns for the airport and its longitude and latitude, and I changed the labels to the airport names.
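The split of the tooltip column can also be scripted. The sketch below is only an illustration: it assumes each tooltip looks like `MSP(lngx=-93.2218,laty=44.8820)`, which is a guess at the layout rather than something the data set documents, and the name `split_tooltip` is hypothetical.

```python
import re

# ASSUMPTION: each tooltip looks like "<airport>(lngx=<longitude>,laty=<latitude>)".
# The real column may be laid out differently; adjust the pattern if so.
TOOLTIP_PATTERN = re.compile(
    r"^\s*(?P<airport>[^()]+?)\s*"
    r"\(\s*lngx=(?P<lon>-?\d+(?:\.\d+)?)\s*,"
    r"\s*laty=(?P<lat>-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def split_tooltip(tooltip):
    """Split one tooltip string into airport, longitude and latitude."""
    match = TOOLTIP_PATTERN.match(tooltip)
    if match is None:
        raise ValueError(f"Unexpected tooltip format: {tooltip!r}")
    return {
        "airport": match["airport"],
        "longitude": float(match["lon"]),
        "latitude": float(match["lat"]),
    }


if __name__ == "__main__":
    # Made-up example row, not taken from the data set.
    print(split_tooltip("MSP(lngx=-93.2218,laty=44.8820)"))
```

Raising an error on rows that do not match makes it easy to spot tooltips that need manual cleaning.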

Edge table

Inspiration

For inspiration, I googled terms like ‘airline data visualization’. The search returned many images, but this is the one that grabbed my attention. What inspired me was the colour scheme and the geotagged nodes, which made the map easy to read. The colour clustering also made the connectivity easier to understand.

Methodology

I used Gephi to visualize the network. Gephi is an open-source platform and a leading network visualization tool. I imported the data that I had collected from GitHub and transformed in OpenRefine into Gephi using the “import data” option. The software creates a basic visualization in the Overview tab. In this visualization the nodes are the airports and the edges are the connections between them. I set the size of the nodes according to the density. The color of the nodes is set according to … . There are various layout options. The most apt for me was geo-tagging the nodes, since I had longitudes and latitudes for them, so I downloaded the geo-tagging plugin. The Preview tab gives a final preview of the visualization and has options for the labels, the size of the nodes and the thickness of the edges.
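To make the node sizing concrete, here is a minimal sketch of the idea outside Gephi. It reads "density" as the number of connections per airport (its degree), which is an assumption, and the function names, size range and airport codes are all hypothetical.

```python
from collections import defaultdict


def degree_by_node(edges):
    """Count the distinct airports each airport is connected to."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # ignore self-loops
        neighbours[source].add(target)
        neighbours[target].add(source)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def scale_sizes(degrees, min_size=10.0, max_size=50.0):
    """Map degrees linearly onto a node size range (values are arbitrary)."""
    lowest, highest = min(degrees.values()), max(degrees.values())
    if highest == lowest:
        return {node: min_size for node in degrees}
    step = (max_size - min_size) / (highest - lowest)
    return {node: min_size + (deg - lowest) * step for node, deg in degrees.items()}


if __name__ == "__main__":
    # Made-up edge list: each pair is one route between two airports.
    routes = [("MSP", "ORD"), ("MSP", "DFW"), ("MSP", "JFK"), ("ORD", "DFW")]
    print(scale_sizes(degree_by_node(routes)))
```

The best-connected airport gets the largest size, which is what makes hubs stand out in the drawing.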

Result 

The network is geo-located and visualized using the geo-tagging plugin. The nodes are color-coded by the number of triangles they belong to. From the visualization, the data appears to describe a single airline, and the nodes appear to be the major hubs of that airline. Tying this to some research of my own, I assume the data is about American Airlines, as it has more gates assigned to it than any other airline at the major airport in Minneapolis.
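The triangle count behind the color coding can be illustrated with a short sketch. It treats the routes as undirected and counts, for each airport, how many pairs of its neighbours are themselves connected. The function name and the sample routes are hypothetical, and this is not a description of how Gephi computes the value internally.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Return, for each node, the number of triangles it takes part in."""
    neighbours = {}
    for source, target in edges:
        if source == target:
            continue  # self-loops cannot form triangles
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    counts = {}
    for node, adjacent in neighbours.items():
        # A triangle exists for every pair of neighbours that share an edge.
        counts[node] = sum(
            1 for a, b in combinations(sorted(adjacent), 2) if b in neighbours[a]
        )
    return counts


if __name__ == "__main__":
    # Made-up routes: MSP, ORD and DFW form one triangle; JFK hangs off MSP.
    routes = [("MSP", "ORD"), ("MSP", "DFW"), ("ORD", "DFW"), ("MSP", "JFK")]
    print(triangles_per_node(routes))
```

Airports with a high count sit in tightly interconnected groups, which is why the measure tends to highlight hubs.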

In the second visualization, the nodes are partitioned and colored by modularity class. This shows which major and minor airports are connected.
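As a rough illustration of what a modularity class expresses, the sketch below scores a given grouping of airports with the standard modularity formula for an undirected, unweighted graph: Q is the sum over classes of (internal edges / m) minus (sum of degrees in the class / 2m) squared. It does not reproduce the community detection that Gephi runs; it only evaluates a partition that is already known. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Score a partition; higher values mean denser groups with sparser links between them."""
    m = len(edges)
    internal_edges = defaultdict(int)  # edges with both ends in the same class
    degree_sum = defaultdict(int)      # total degree of the nodes in each class
    for source, target in edges:
        class_s, class_t = community_of[source], community_of[target]
        degree_sum[class_s] += 1
        degree_sum[class_t] += 1
        if class_s == class_t:
            internal_edges[class_s] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


if __name__ == "__main__":
    # Made-up network: two triangles of airports joined by a single route.
    routes = [
        ("MSP", "ORD"), ("ORD", "DTW"), ("MSP", "DTW"),
        ("DFW", "LAX"), ("LAX", "PHX"), ("DFW", "PHX"),
        ("DTW", "DFW"),
    ]
    classes = {"MSP": 0, "ORD": 0, "DTW": 0, "DFW": 1, "LAX": 1, "PHX": 1}
    print(modularity(routes, classes))
```

Putting every airport in one class gives a score of zero, so a clearly positive score indicates that the grouping captures real structure.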

The third visualization I made was a hyper-graph. For this I transformed the node and edge tables according to modularity class. I changed the source and id in the edge table with the help of the VLOOKUP formula. This process clustered the major airports with their connecting minor airports, which gave an idea of which area this particular airline connects the most.
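The lookup step can be sketched in a few lines, with a dictionary playing the role of the spreadsheet lookup. This is one possible reading of the transformation, not necessarily the exact one used here: each endpoint of an edge is replaced by its modularity class, and routes between the same pair of classes are merged into one weighted edge. The function name and the sample data are hypothetical.

```python
from collections import Counter


def collapse_edges(edges, modularity_class):
    """Replace each endpoint by its class and count routes between classes."""
    weights = Counter()
    for source, target in edges:
        class_s = modularity_class[source]  # the "lookup" for the source column
        class_t = modularity_class[target]  # the "lookup" for the other endpoint
        if class_s == class_t:
            continue  # routes inside one class vanish in the collapsed graph
        weights[tuple(sorted((class_s, class_t)))] += 1
    return dict(weights)


if __name__ == "__main__":
    # Made-up routes and classes, for illustration only.
    routes = [("MSP", "ORD"), ("MSP", "DFW"), ("ORD", "DFW"), ("DFW", "LAX")]
    classes = {"MSP": 0, "ORD": 0, "DFW": 1, "LAX": 2}
    print(collapse_edges(routes, classes))
```

The resulting weights show how strongly each cluster of airports is tied to the others.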

Reflection

I could only fully understand the data set I had chosen after visualizing it. Because the website provided no description of the data, it was hard to interpret. The overall process of using Gephi was very helpful in understanding it. Still, some assumptions remain about what exactly the data represents. The one thing missing from this process is a map of America as a background, which I could not add and which would have made the data easier for the user to understand.

In the future I would really like to build a data set myself and visualize it using Gephi. This data set would have more sets of nodes and edges to play with. It would be interesting to see how more complex data can be interpreted.