Citi Bike usage in April



Introduction

In the past 5 years, China has witnessed the rise and fall of many dockless bike-sharing companies. Aiming to solve the “first km/last km” transit problem, they poured millions of bicycles onto China’s streets, where the bikes soon became a hazard. Today, with the bubble burst, most of those bikes are piled up in bicycle graveyards across the country. Meanwhile, in New York City, Citi Bike, which launched 6 years ago, has experienced steady growth. With 12,000 bikes and 750 stations covering 60 neighborhoods, Citi Bike has become an essential part of NYC’s public transportation system.

Therefore, I decided to use this Tableau lab as a chance to better understand Citi Bike’s operating model, using the available data to analyze who the users are and how they use the service.

Process

Be inspired

I started by searching Pinterest for visualization projects on the same topic, which showed me what others visualize from bike-sharing data and how they do it.

The Hubway Station Connectivity Matrix by Zia Sobhani breaks down station-to-station traffic for the first 15 months after Hubway launched. What I like about the visualization is that the bright colors really stand out from the background, so anyone can get the big picture of how usage differs across hours and months at a glance. However, without any marks or annotations on the heatmap, the content of the matrix is a bit hard to understand.

Hubway Seeking Metro-Boston by Eurry Kim and Kaz Sakamoto is more comprehensive, with analyses of users, stations, and trips. It gave me some ideas about what I could include in my visualization. In addition, the colors of the different variables are consistent across the illustration, which is important but easily neglected.

Finding and choosing a dataset

Citi Bike Database

I started my search on NYC OpenData, from which I was redirected to Citi Bike’s own database. Initially, I wanted to include at least 6 months of data in my visualization, to roughly illustrate how the weather might affect usage. However, the dataset was much larger than I expected, with over 1,500,000 rows for any single month. Therefore, I chose April for my analysis: it was the most up-to-date dataset, and with no public holiday it should reflect a regular usage pattern.

Visualizing Data

The 1,766,096-row dataset exceeded the memory limit of OpenRefine. Fortunately, it was very well organized, and I was able to analyze it in Tableau Public without further cleaning. I started with the big picture: how the bikes are used over the 1-month period, when the peak hours are, and whether weekdays and weekends show different usage patterns.

Then, I tried to divide the users into categories by age, gender, user type, and riding time, and combined those categories with other data to find out how they use the service. Eventually, the user type category made the most sense to me, as one has to be either a subscriber (annual member) or a customer (24-hour pass or 3-day pass holder) to ride a Citi Bike. With several charts and graphs on hand, I created a dashboard for the final result by simply dragging the desired sheets into it and adding descriptions, marks, and subtitles.

Result

The final dashboard of my research includes 6 charts. The heatmap in the top left corner illustrates the overall usage of Citi Bike throughout the month. I then broke the users down into two groups, subscribers and customers, and analyzed their age, riding duration, and the days and times they tend to ride.

Final Dashboard

Main Findings

1. Distinct usage patterns for weekdays and weekends

Fig 1. All trips in April

Usage of the service appears quite predictable when the data is split by day and hour. On weekdays, most trips are clearly taken during the rush hours, around 8 AM and 5-6 PM. On weekends, users ride a lot in the afternoon, between 12 PM and 6 PM. Some exceptions appear at the bottom of the chart. Since there was no public holiday in April, they may be the result of bad weather, although I would need additional data to confirm that hypothesis.
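The day-by-hour breakdown behind a chart like Fig 1 can also be sketched outside Tableau. The following is a minimal Python illustration, not the Tableau workflow used for this lab. It assumes each trip carries a start timestamp formatted as `YYYY-MM-DD HH:MM:SS` with optional fractional seconds; the function name and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime


def trips_by_day_and_hour(start_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips per (day of month, hour of day) cell, as in a day-by-hour heatmap."""
    counts = Counter()
    for raw in start_times:
        # Assumption: fractional seconds, if present, follow a "." and can be dropped.
        ts = datetime.strptime(raw.split(".")[0], fmt)
        counts[(ts.day, ts.hour)] += 1
    return counts


# Made-up sample timestamps, not taken from the real dataset.
sample = [
    "2019-04-01 08:05:12.1230",
    "2019-04-01 08:45:00.0000",
    "2019-04-06 14:10:30.5000",
]
heatmap = trips_by_day_and_hour(sample)
```

Each key of `heatmap` is one cell of the grid, so a rush-hour peak would show up as large counts in the hour-8 and hour-17 cells of weekday rows.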

2. Subscribers ride on weekdays, customers ride on weekends

Fig 2. Trips taken by subscribers and customers in April
Fig 3. Trips count in 24 hours

Dividing the users into subscribers and customers lets us analyze the usage pattern further. Both charts are consistent with Fig 1. In Fig 2 we can see that subscribers dominate usage on weekdays, but their numbers drop quite dramatically on weekends. Customers, by contrast, prefer to ride on weekends rather than weekdays. Combining this with Fig 1, we might assume that most subscriber trips happen on weekdays during the rush hours, while customers contribute most of their trips on weekend afternoons. Fig 3 supports this assumption.
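The weekday/weekend split by user type described above amounts to a simple grouped count. Here is a hedged Python sketch of that grouping. It assumes each trip is available as a (start time, user type) pair and that the user type values are the strings "Subscriber" and "Customer"; the function name and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def trips_by_user_type(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips per (user type, 'weekday' or 'weekend')."""
    counts = Counter()
    for start_time, user_type in rows:
        ts = datetime.strptime(start_time, fmt)
        # datetime.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        day_kind = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[(user_type, day_kind)] += 1
    return counts


# Made-up sample rows: 2019-04-01 is a Monday, 2019-04-06 a Saturday.
rows = [
    ("2019-04-01 08:05:12", "Subscriber"),
    ("2019-04-01 17:30:00", "Subscriber"),
    ("2019-04-06 14:10:30", "Customer"),
    ("2019-04-06 15:00:00", "Subscriber"),
]
split = trips_by_user_type(rows)
```

Comparing `split[("Subscriber", "weekday")]` against `split[("Subscriber", "weekend")]`, and likewise for customers, gives the same contrast that Fig 2 shows visually.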

3. Most users ride free, but extra fees rack up quickly

Customers can enjoy their first 30 minutes of riding for free, while for subscribers the maximum free riding time is 45 minutes. After that, usage fees accumulate quickly with every additional 15 minutes of a trip. 99% of subscribers’ trips are in fact free of charge, probably because subscribers tend to be more familiar with the rules. As for customers, 76% of them successfully avoided paying extra fees. However, since the overtime fee is quite high ($4/15 mins) and can be avoided simply by docking and unlocking a new bike every 30 minutes, Citi Bike could perhaps remind customers via text message as they approach 30 minutes to improve their user experience.
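The free-ride rules above can be written down as a small calculation. The sketch below is only an illustration under stated assumptions: trip durations are in seconds, a trip of exactly the free allowance is still free, and every started 15-minute block beyond the allowance costs a customer $4. The rounding rule and all names here are assumptions, not Citi Bike's published billing logic, and the sample trips are invented.

```python
from collections import defaultdict

# Free riding time per user type, in minutes (figures from the text above).
FREE_MINUTES = {"Customer": 30, "Subscriber": 45}
CUSTOMER_FEE_PER_BLOCK = 4  # dollars per 15-minute block, customers only
BLOCK_SECONDS = 15 * 60


def customer_overtime_fee(trip_seconds):
    """Overtime fee for a customer trip; assumes each started block is charged."""
    extra = trip_seconds - FREE_MINUTES["Customer"] * 60
    if extra <= 0:
        return 0
    blocks = -(-extra // BLOCK_SECONDS)  # ceiling division on integers
    return blocks * CUSTOMER_FEE_PER_BLOCK


def free_share(trips):
    """Fraction of trips within the free allowance, per user type."""
    free = defaultdict(int)
    total = defaultdict(int)
    for user_type, trip_seconds in trips:
        total[user_type] += 1
        if trip_seconds <= FREE_MINUTES[user_type] * 60:
            free[user_type] += 1
    return {user_type: free[user_type] / total[user_type] for user_type in total}


# Invented sample trips: (user type, duration in seconds).
trips = [
    ("Customer", 1200),
    ("Customer", 2400),
    ("Subscriber", 2400),
    ("Subscriber", 3000),
]
shares = free_share(trips)
```

Under these assumptions, a 40-minute customer trip runs 10 minutes over and is charged one block, while a 2,400-second trip is free for a subscriber but not for a customer.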

Reflection

In general, I think Tableau Public is pretty easy for beginners to use. By simply dragging and dropping data, I made 27 charts and graphs in a few hours and eventually picked 6 of them for the final dashboard. There are also abundant tutorials online, which allowed me to quickly solve any problem I ran into during my experiment.

For future work, I would definitely add weather conditions to my dataset to confirm the assumptions I made earlier in the analysis. It would also be interesting to compare the trips taken by subscribers and customers on each day of the week at an hourly level, to further understand how different user groups use the service. I would also like to analyze the overtime fees paid by customers, to see the average fee paid to Citi Bike for extended usage and the highest fee incurred in April, so that I might be able to provide further suggestions for improving the user experience. Last but not least, it would be interesting to know when the bikes at the top 10 stations were checked out and docked. That would show whether bikes need to be relocated manually by staff to ensure there are always bikes available at the most popular stations.