To get an understanding of Gephi, I tried out an airline dataset from GitHub made by an unknown person or organization. Why did I choose this? Honestly, there was no particular analytical reasoning behind it. I have always been interested in planes, cars, and boats, and seeing where most airlines traveled from and to piqued my curiosity.
I asked two questions: which locations were the source of the most airline routes in the dataset, and which target location was most prevalent along those routes? I asked these questions in order to focus and filter my search.
This map was my inspiration to look deeper into the new dataset. Before running the dataset through Gephi, I did some extra research into the airports with the most arrivals and departures in the US. According to the map, Chicago, NYC, LA, Dallas, Phoenix, Charlotte, D.C., and Philadelphia were the busiest locations for travelers. This made sense to me, as those locations are among the largest cities in the U.S. I wanted to know whether the map and its locations would be similar to the results the dataset produced in Gephi.
Gephi: An open-source visualization software. It is primarily used as a network visualization tool that can help researchers take hundreds of thousands of nodes and display the connections between them. Researchers use it to organize nodes, find community likeness, and trace the connections leading to and from each node. It lets you filter on multiple attributes, such as weight and degree (just to name a few).
Excel: Before trying out the sample airline dataset, I tried to create an edge list in Excel from my former dataset on COVID stats and vaccinations. I tried creating node IDs and labels, along with a manual edge sheet with sources, targets, and types. Unfortunately, I wasn't able to make it work due to my limited knowledge of Gephi and general confusion about its inner workings.
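For anyone attempting the same thing, here is a minimal sketch of what a node sheet and an edge sheet might look like when written out as CSV. The countries and relationships are made up for illustration, and the column names (Id, Label, Source, Target, Type) are the ones I understand Gephi's spreadsheet import to look for, so treat this as a starting point rather than a verified recipe.

```python
import csv
import io

# Hypothetical example data: three countries and made-up relationships.
nodes = [
    {"Id": "1", "Label": "Country A"},
    {"Id": "2", "Label": "Country B"},
    {"Id": "3", "Label": "Country C"},
]
edges = [
    {"Source": "1", "Target": "2", "Type": "Directed"},
    {"Source": "2", "Target": "3", "Type": "Directed"},
]


def to_csv(rows, columns):
    """Return the rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(nodes, ["Id", "Label"])
edges_csv = to_csv(edges, ["Source", "Target", "Type"])

# Sanity check: every edge endpoint should refer to an Id in the node sheet,
# so that each edge connects nodes that carry a label.
node_ids = {row["Id"] for row in nodes}
missing = {
    endpoint
    for row in edges
    for endpoint in (row["Source"], row["Target"])
    if endpoint not in node_ids
}

print(nodes_csv)
print(edges_csv)
print("Endpoints missing from the node sheet:", missing)
```

The two CSV strings could be saved as separate files and imported one after the other, nodes first and edges second.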
Step 1: Finding/Uploading Dataset to Gephi
Using the sample dataset URLs provided by the professor, I came across this airlines.graphml file. However, what I was looking for was a CSV file with more of the raw data, which I could play around with before running it through Gephi. Unfortunately, I could not find one for this dataset.
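In hindsight, a GraphML file is just XML, so the edges can be pulled out into CSV with a few lines of Python. The sketch below uses a tiny inline sample instead of the real file, since I am assuming (not confirming) that airlines.graphml follows the standard GraphML layout of node and edge elements with source and target attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny hand-written GraphML sample; the real airlines.graphml is assumed
# to use the same standard namespace and element names.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="0"/>
    <node id="1"/>
    <node id="2"/>
    <edge source="0" target="1"/>
    <edge source="2" target="1"/>
  </graph>
</graphml>
"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# For a file on disk, ET.parse("airlines.graphml").getroot() would replace this.
root = ET.fromstring(SAMPLE)

# Collect one (source, target) pair per edge element.
edge_rows = [
    (edge.get("source"), edge.get("target"))
    for edge in root.iterfind(".//g:edge", NS)
]

# Write the pairs as CSV text with a header row.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target"])
writer.writerows(edge_rows)
edges_csv = buffer.getvalue()

print(edges_csv)
```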
Step 2: Filtering
Initially there were about 235 nodes and 1200 edges visible on the graph. By filtering with Range (Degree), I was able to narrow it down to about 191 nodes and 342 edges. Why did I filter by degree? The legend on the left side of the screen showed that many nodes had a degree between 0 and 7, and the statistical analysis showed that the average degree was around 4 and the average weighted degree was about 6. I wanted to view the nodes that were the most prevalent in the target group.
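To make sense of what a degree range filter does, it helped me to think of it as the small computation below. This is only a sketch on a hypothetical toy edge list, not Gephi's actual implementation: it counts each edge once for both of its endpoints, keeps the nodes whose degree falls inside the range, and then keeps only the edges whose two endpoints both survived. Gephi may calculate average degree differently for directed graphs.

```python
from collections import Counter

# Hypothetical toy edge list (source, target); not taken from the airline data.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degree_counts(edge_list):
    """Count how many edges touch each node (in plus out)."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return degree


def filter_by_degree(edge_list, low, high):
    """Keep nodes with low <= degree <= high and the edges between them."""
    degree = degree_counts(edge_list)
    kept_nodes = {node for node, d in degree.items() if low <= d <= high}
    kept_edges = [
        (s, t) for s, t in edge_list if s in kept_nodes and t in kept_nodes
    ]
    return kept_nodes, kept_edges


degree = degree_counts(edges)
average_degree = sum(degree.values()) / len(degree)
kept_nodes, kept_edges = filter_by_degree(edges, 2, 3)

print("Degrees:", dict(degree))
print("Average degree:", average_degree)
print("Kept nodes:", sorted(kept_nodes))
print("Kept edges:", kept_edges)
```

In this toy graph, node E has a degree of 1, so it falls outside the range and its only edge disappears along with it. That is the same reason the edge count in my graph dropped so sharply after filtering.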
Step 3: Running statistical analysis
Step 4: Finding
The PDF file shows that targets 136, 50, and 200 had the most edges, along with the higher rates of community likeness. These airports were in Minneapolis, Detroit, and Seattle. These findings led me to believe that the dataset wasn't presenting the busiest airports, but rather the airports that many people pass through to reach their destinations. However, I could be wrong in my interpretation of the data, as I am still learning how to use Gephi and analyze data with it.
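My two questions (busiest source, busiest target) boil down to counting out-degree and in-degree. Here is a rough sketch of that counting on a hypothetical route list. The airport names are placeholders and have nothing to do with node IDs 136, 50, or 200 in the real dataset.

```python
from collections import Counter

# Hypothetical directed routes as (source, target) pairs.
routes = [
    ("X", "HUB"),
    ("Y", "HUB"),
    ("Z", "HUB"),
    ("HUB", "X"),
    ("Y", "Z"),
]

# In-degree: how many routes arrive at each airport.
in_degree = Counter(target for _, target in routes)
# Out-degree: how many routes depart from each airport.
out_degree = Counter(source for source, _ in routes)

top_target = in_degree.most_common(1)[0]
top_source = out_degree.most_common(1)[0]

print("Most common target:", top_target)
print("Most common source:", top_source)
```

An airport that ranks high as a target without being in a large city, like the placeholder HUB here, fits the idea of a connecting airport rather than a final destination.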
As this was my first time ever dealing with network visualizations, I had a tough time understanding how to interpret data in Gephi and how to use it efficiently. Most of my findings came from manipulating the filter settings to see how the tool worked. Even some of the terms, such as degree and weight, were foreign to me. I had to look them up and work out their significance to the dataset and graph.
Something I look forward to learning is how to build my own edge and node tables in Excel. I feel that I was missing a basic understanding of these concepts, which prevented me from conducting a thorough analysis. I also look forward to using Gephi on my dataset on COVID cases and vaccination. My plan is to pick the countries where vaccination rates are high and compare them to countries where vaccination rates are low, in order to see the effectiveness of vaccines on different demographics of people living in different environments.