UBER RIDES NETWORK


Networks, Visualization

Introduction

Nowadays, the relationships people maintain with friends and neighbors are supplemented by new internet-based media. Uber, as a recent product of technology, plays an important role in our daily networks. I chose the dataset published by Stan Tyan on Kaggle because it is a personal Uber dataset that displays his social network from 2015 to 2018. This real personal dataset includes 678 trips. I’m curious about how much information I can get from the user’s data, for example, frequent addresses or the user’s habits.

Inspiration

My inspiration comes from the example “Hypergraph of keyword clusters in bioethics articles (1970–2010)” in the class slides. The dots are strong visual representations, and the lines show the relationships between the variables. I like this example because it is a good method for showing the emphasis of data groups through visual hierarchy. With a view like this, it will be easy for me to find a user’s frequent locations and similar patterns. This dataset has many variables, such as trip_status, trip_start time, and pickup_long. I think it is a good fit for my Gephi lab report.

Materials

Link to the dataset: https://www.kaggle.com/stantyan/uber-rides

Link to the software: https://gephi.org/

Methods

The first step is preparing the dataset for Gephi, which expects several columns: source, target, type, weight, and label. In my file, the source is the car brand; the target is the driver’s name; the type is undirected; the node label is the vehicle brand; the edge label is the trip’s end location; and the weight is the total trip distance.
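The column mapping described above could be sketched in Python roughly as follows. This is only a minimal illustration: the trip records, the field names (car, driver, dropoff, distance_km), and the helper names are assumptions made for the example, not the actual columns or rows of the Kaggle file.

```python
import csv
import io

# Hypothetical trip records. Field names and values are illustrative
# assumptions, not the real columns or rows of the Kaggle dataset.
trips = [
    {"car": "Hyundai Solaris", "driver": "Driver A", "dropoff": "Location 1", "distance_km": 5.2},
    {"car": "Volkswagen Polo", "driver": "Driver B", "dropoff": "Location 2", "distance_km": 3.1},
    {"car": "Hyundai Solaris", "driver": "Driver C", "dropoff": "Location 1", "distance_km": 7.4},
]

EDGE_COLUMNS = ["Source", "Target", "Type", "Weight", "Label"]


def to_edge_rows(records):
    """Map each trip to one edge row: car brand -> driver, weighted by distance."""
    return [
        {
            "Source": r["car"],
            "Target": r["driver"],
            "Type": "Undirected",
            "Weight": r["distance_km"],
            "Label": r["dropoff"],
        }
        for r in records
    ]


def write_edges(rows, handle):
    """Write the edge rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=EDGE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


edge_rows = to_edge_rows(trips)
buffer = io.StringIO()  # stands in for a file such as edges.csv
write_edges(edge_rows, buffer)
edge_csv = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; for a real import, the same function could be given an open file handle instead.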

The second step is importing the data into Gephi. The software lets me choose the “Graph type”, which determines how the data are linked. Links can be undirected or directed. If links in a network have different strengths, the graph is said to be valued. When nodes are connected by different types of links, it is a multigraph. I tried both the directed and undirected options. The first graph uses the directed setting. It arranges the data as a sphere. Using the partition panel, I colored male drivers blue and female drivers pink. In addition, two to five thick arrows are visible in the sphere. From this chart, I can see that most Uber drivers are male and that there are several locations the user visits very often.
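To make the difference between directed and undirected links concrete, here is a small Python sketch. It assumes one possible convention, namely that when a directed edge list is read as undirected, the edges A to B and B to A collapse into one link whose weights are added. The function name and the sample edges are made up for illustration.

```python
def merge_undirected(edges):
    """Collapse directed (source, target, weight) edges into undirected links.

    Assumed convention: edges between the same pair of nodes are merged
    and their weights are summed, regardless of direction.
    """
    merged = {}
    for source, target, weight in edges:
        key = tuple(sorted((source, target)))  # ignore direction
        merged[key] = merged.get(key, 0.0) + weight
    return merged


# Made-up example edges, not taken from the dataset.
directed_edges = [("A", "B", 1.0), ("B", "A", 2.0), ("A", "C", 1.5)]
undirected_links = merge_undirected(directed_edges)
```

In the directed reading there are three arrows; in the undirected reading only two links remain, and the A and B link becomes a valued link with a larger weight.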

Since force-directed layouts would represent both extremes as circles filled with nodes placed at the same distance, everything that departs from this disposition is an indicator of structure. Force-directed algorithms do not just project networks in space – they create a space that would not exist without them.

The chart strongly suggests that the location called Saint Petersburg Paradnaya Ulitsa might be the user’s home or workplace. Also, the user chooses UberX much more often than UberBlack. However, from my perspective, the limitation of the sphere chart is that the directed graph type restricts how many relationships between different variables can be displayed in the visualization.

Therefore, I also tried the ‘undirected’ type. There are many algorithms to select from under the layout panel. ForceAtlas2 was the choice recommended in class. When I first used it, the nodes were packed too tightly, so I chose another one, Yifan Hu. I eventually found that I can use ‘expand’ to adjust the visual representation when the links and nodes are too close together. However, by then I was already using Hu’s algorithm, which also works well.

Yifan Hu. It is a very fast algorithm with good quality on large graphs. It combines a force-directed model with a graph coarsening technique (multilevel algorithm) to reduce the complexity. The repulsive forces on one node from a cluster of distant nodes are approximated by a Barnes-Hut calculation, which treats them as one super-node. It stops automatically.
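The super-node idea can be illustrated with a toy Python sketch. It is not Gephi’s implementation: the inverse-distance repulsion, the function names, and the coordinates are assumptions chosen only to show that a distant cluster pushes on a node almost exactly as one combined node at the cluster’s center would.

```python
import math


def repulsion(p, q, strength=1.0):
    """Force vector pushing point p away from q, with magnitude strength / distance."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    dist = math.hypot(dx, dy)
    # Unit direction (dx/dist, dy/dist) times magnitude (strength/dist).
    return (strength * dx / dist**2, strength * dy / dist**2)


def exact_force(node, cluster):
    """Sum the repulsion from every cluster member individually."""
    fx = sum(repulsion(node, q)[0] for q in cluster)
    fy = sum(repulsion(node, q)[1] for q in cluster)
    return (fx, fy)


def supernode_force(node, cluster):
    """Approximate the cluster by one super-node at its centroid."""
    cx = sum(q[0] for q in cluster) / len(cluster)
    cy = sum(q[1] for q in cluster) / len(cluster)
    return repulsion(node, (cx, cy), strength=float(len(cluster)))


# Made-up positions: one node at the origin and a tight, distant cluster.
node = (0.0, 0.0)
cluster = [(100.0, 1.0), (100.0, -1.0), (101.0, 0.0), (99.0, 0.0)]
exact = exact_force(node, cluster)
approx = supernode_force(node, cluster)
```

The exact version needs one calculation per cluster member, while the approximation needs only one per cluster, which is where the speed on large graphs comes from.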

A common strategy is to apply some sort of filtering, which reduces networks to the most connected nodes or most dominant lines. The idea is to discern some sort of backbone to the overall structure. To apply this idea, I kept only the largest connected component. Gephi finds three clusters in his dataset. I selected the main cluster to visualize.
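Keeping only the largest connected component can be sketched in Python as below. This is a minimal illustration using a breadth-first search over an undirected edge list; the function name and the sample edges are hypothetical and do not reproduce how Gephi’s own filter is written.

```python
from collections import deque


def largest_component(edges):
    """Return the set of nodes in the largest connected component."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Made-up edges forming three separate clusters.
sample_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("G", "H")]
main_cluster = largest_component(sample_edges)
kept_edges = [(a, b) for a, b in sample_edges if a in main_cluster and b in main_cluster]
```

Here the sample has three clusters, and only the edges inside the biggest one are kept for drawing.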

Results and Interpretation

Nodes are entities of the real world: individuals, organizations, nations, and technical or logical instances that are connected by links. Reading node labels and qualifications helps make sense of why some groups of nodes are more closely connected than others. In Gephi, there are two kinds of labels shown in the chart: node labels and edge labels. From the node labels, I found that Hyundai Solaris and Volkswagen Polo cars appear most frequently among the 678 trips. The less frequent labels include Ford Focus, Vladimir, Nissan Almera, and Kia Rio. The edge labels represent the end location of each trip. I set the line color to ‘parent’ so that the relationship between lines and dots is clearer.

‘Ulitsa Lornonsova’, ‘Paradnaya Ulitsa’, and ‘Sverdlovskaya Naberezhnaya’ are the edge labels that appear most frequently on the chart. The line weight represents the time the driver takes for each trip. The thickest line is ‘Ulitsa Lornonsova’, which suggests that the user goes to this location often.
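The same kind of reading can be cross-checked outside Gephi by counting labels and adding up edge weights. The Python sketch below is only an illustration: the edge tuples, the placeholder names, and the weights are made up and are not values from the dataset.

```python
from collections import Counter, defaultdict

# Made-up (node_label, edge_label, weight) tuples, not real trips.
edges = [
    ("Brand 1", "Location 1", 12.0),
    ("Brand 1", "Location 1", 8.0),
    ("Brand 1", "Location 2", 5.0),
    ("Brand 2", "Location 1", 10.0),
    ("Brand 2", "Location 3", 4.0),
    ("Brand 3", "Location 2", 6.0),
]

# How often each node label and each edge label appears.
node_counts = Counter(node for node, _, _ in edges)
edge_label_counts = Counter(label for _, label, _ in edges)

# Total weight per edge label, which corresponds to line thickness.
weight_by_label = defaultdict(float)
for _, label, weight in edges:
    weight_by_label[label] += weight

most_common_node = node_counts.most_common(1)[0]
thickest_label = max(weight_by_label, key=weight_by_label.get)
```

A label can be frequent without being the thickest line, so looking at both the count and the total weight helps separate "visited often" from "long trips".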

For rank orders, colors should be perceived as ordered. To encode quantitative information, color gradients are needed where the different levels appear equidistant. Graphs can be compared on the system level by their density, their degree, their transitivity and clustering, and the number of dense areas or positions. The first rank color is blue, the second is red, and the third is yellow. I used three colors because the data are very tightly bunched. Different colors help me see which variable is the strongest and which is the weakest. Developed expressly for network drawing, Gephi does not treat spatialization as an automated operation but offers subtle control of visual variables. That is a benefit of Gephi: users can manipulate color for analysis.
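Two of the system-level measures mentioned above, density and degree, are simple enough to compute by hand. The Python sketch below assumes a simple undirected graph with no self-loops or repeated links, where density is the number of links divided by the number of possible links, 2m / (n(n - 1)). The function names and the sample edges are made up for illustration.

```python
def graph_density(edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    if n < 2:
        return 0.0
    return 2 * m / (n * (n - 1))


def mean_degree(edges):
    """Average number of links per node: 2m / n."""
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return 0.0
    return 2 * len(edges) / len(nodes)


# Made-up example: 4 nodes joined by 3 links.
sample_edges = [("A", "B"), ("B", "C"), ("C", "D")]
density = graph_density(sample_edges)
average_degree = mean_degree(sample_edges)
```

With 4 nodes there are 6 possible links, so 3 links give a density of 0.5 and an average degree of 1.5.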

Reflection

It was hard for me to use Gephi for the first time. In order to run it well, I even replaced my old MacBook. However, I found that exploring data with this software is very interesting. There are many algorithms for users to choose from, and each of them can deliver a different result. For future use, it would be better to know in advance which kinds of datasets are a good match. This dataset is one person’s Uber trips: a personal network in Moscow. Even though the chart displays many kinds of variables, some of its conclusions remain vague because the lines are tangled.

However, as the reading assignment points out, this is one of the reasons why network charts are increasingly popular as ways to explore complex subjects: their visual ambiguity mirrors some of the empirical ambiguity of the phenomena they represent.

Reference

https://gephi.org/tutorials/gephi-tutorial-layouts.pdf

https://en.wikipedia.org/wiki/Gephi

Kadushin – Understanding Social Networks (chs. 1–4)

Krempel – Network Visualization

Microsoft Word – 2019-02-06 UnderstandingForceSpatialisation 07