Visualizing Citi bikes


Final Projects

Introduction

In 2008, the New York City Department of Transportation conducted research on alternative forms of transportation and published a strategic plan. Surprisingly, according to the statistics, over 50% of all automobile trips in the city were under 3 miles, well within the distance of a pleasant bicycle trip. In 2009, the New York City Department of City Planning first proposed a bike-share program, and in 2011 Alta Bicycle Share was selected to operate it. After two years of preparation, Citi Bike, named after its main sponsor, finally launched on May 27, 2013, with 332 stations and 6,000 bikes. In the 6 years since, Citi Bike has grown into one of the world’s most popular bike-share systems, with 777 stations and 13,000 bikes.

In my previous lab reports, Citi Bike Usage in April and Visualizing Citi Bike Data on Maps, I used the Citi Bike Trip History to analyze usage in April 2019. In this final project, I take a further step and divide the project into two parts: the overall operation analysis and the April operation analysis. To start with, I analyzed the operating data from June 2013, Citi Bike’s first full month of operation, to May 2019 and tried to answer the following questions:

  1. How did the station network expand over the years?
  2. How did bike usage change over the years?
  3. How did users use the service over the past 6 years?

Then, I returned to the data from April 2019, the most recent monthly data with no public holiday, to tackle the following questions:

  1. How did the weather situation affect the service?
  2. Who were the users?
  3. Did different users use the service differently?

Materials

Tools:

  • Google Sheets: web-based spreadsheet program used to collect data and create new datasets.
  • Carto: open-source cloud computing platform for analyzing spatial data and creating maps.
  • Tableau Public: free software used to create interactive charts and graphs.
  • Adobe Illustrator: vector graphics editor used to improve the visualization.
  • Adobe Photoshop: raster graphics editor used to make animations and posters.

Datasets:

For the overall operation analysis:

For the April operation analysis:

Design Process

The Overall Operation Analysis

With the Monthly Operating Reports in hand, I used Google Sheets to collect the total number of active stations, annual members, casual users, trips, and mileage for every month.

The first analysis was a line chart of the station network’s expansion, made with Tableau Public. I needed to know whether the network had gone through a few significant growth phases over the past 6 years so that I could illustrate how the service expanded spatially. Although the number of stations in operation changed every month, walking readers through every small change would be too much information. Based on the result, I downloaded the trip history data for the 5 targeted months, which included the necessary geospatial information. However, I had a hard time visualizing it: the datasets were huge and exceeded Carto’s storage limit, and neither Google Sheets nor OpenRefine could properly open and clean them. I also tried to map the data with Tableau Public but failed to connect the different spreadsheets for the same reason. Eventually, I used Tableau Public to extract only the information I needed (station name, latitude, and longitude), which reduced the size of the dataset enough to analyze it in Carto.
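
The extraction described above was done in Tableau Public. Purely as an illustration, the same reduction (one row per station instead of one row per trip) could be sketched in Python with the standard library. The column names below are assumptions about the trip history CSV layout, the helper names are hypothetical, and the two sample rows are made up to keep the sketch self-contained.

```python
import csv
import io

def unique_stations(rows):
    """Collect one (latitude, longitude) pair per station name.

    Assumes each row has '<start|end> station name/latitude/longitude'
    columns; these names are an assumption, not confirmed by the report.
    """
    stations = {}
    for row in rows:
        for prefix in ("start", "end"):
            name = (row.get(f"{prefix} station name") or "").strip()
            lat = (row.get(f"{prefix} station latitude") or "").strip()
            lon = (row.get(f"{prefix} station longitude") or "").strip()
            if not (name and lat and lon):
                continue  # skip trips with missing station details
            # Keep the first coordinates seen for each station.
            stations.setdefault(name, (float(lat), float(lon)))
    return stations

def write_stations(stations, out_file):
    """Write the reduced table in a shape a mapping tool could import."""
    writer = csv.writer(out_file)
    writer.writerow(["station_name", "latitude", "longitude"])
    for name in sorted(stations):
        lat, lon = stations[name]
        writer.writerow([name, lat, lon])

# Made-up sample standing in for a monthly trip history file.
sample = io.StringIO(
    "start station name,start station latitude,start station longitude,"
    "end station name,end station latitude,end station longitude\n"
    "Station A,40.70,-74.00,Station B,40.72,-73.99\n"
    "Station B,40.72,-73.99,Station A,40.70,-74.00\n"
)
stations = unique_stations(csv.DictReader(sample))

reduced = io.StringIO()
write_stations(stations, reduced)
```

Because only the station dictionary is held in memory, a sketch like this reads a large file row by row, which sidesteps the size limits mentioned above.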

The second visualization shows the change in the total number of trips each month. The line chart itself was already very interesting, so all I needed to do was add some milestones as annotations.

The third visualization shows the month-to-month numbers of subscribers (annual members) and customers (day pass or 3-day pass holders). I started with a stacked bar chart, but compared to a line chart it did not clearly show the difference between subscribers and customers, especially when the numbers were close. I therefore switched to a line chart, which emphasizes the variation of each group rather than the total.

The April Operation Analysis

This part of the visualization was based on my work from the previous labs. The Citi Bike Trip History for April includes the start/end station locations, start/end times, trip duration, user type, gender, and age for each trip in the period. In addition, I collected weather information, such as sky condition and temperature, and created a new dataset in Google Sheets.

Analysis from the previous lab

The first visualization is the daily usage in April along with the weather history. In the previous work, I noticed a fairly clear tendency: subscribers tended to use the service on weekdays, while customers used it more on weekends. However, on some days usage declined dramatically for both user types, and I assumed this was due to the weather.

Analysis from the previous lab

The second visualization shows the demographic information of the users. Rather than analyzing the users as a whole with a pie chart and a stacked bar chart, in this final project I separated them into two categories, subscribers and customers, and analyzed the gender and age of each using treemaps. For the gender analysis, I believe some users were simply unwilling to reveal their gender, so I kept Unknown as a separate group. For the age analysis, the table contained some extreme values, such as 156, which are obviously impossible. Therefore, even though Citi Bike allows anyone older than 16 to use the service, I excluded all records with ages over 80 after a quick user study.
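
The age cleanup above can be illustrated with a short Python sketch. It assumes that each trip record carries a "birth year" and a "usertype" field and that age is computed relative to 2019; those field names, the helper functions, and the three sample records are assumptions for illustration, not the actual workflow used for the treemaps.

```python
from collections import Counter

REFERENCE_YEAR = 2019  # year of the April data analyzed here
MAX_AGE = 80           # cutoff described in the text above

def age_from_birth_year(birth_year, reference_year=REFERENCE_YEAR):
    """Approximate a rider's age from the recorded birth year."""
    return reference_year - int(birth_year)

def keep_plausible_ages(trips, max_age=MAX_AGE):
    """Drop records whose derived age exceeds max_age; add an 'age' field."""
    kept = []
    for trip in trips:
        age = age_from_birth_year(trip["birth year"])
        if age <= max_age:
            kept.append({**trip, "age": age})
    return kept

def age_bracket(age, width=10):
    """Label an age with its bracket, e.g. 29 falls in '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Made-up records; the last one reproduces the impossible age of 156.
trips = [
    {"usertype": "Subscriber", "birth year": "1990"},
    {"usertype": "Customer", "birth year": "1969"},
    {"usertype": "Subscriber", "birth year": "1863"},
]
cleaned = keep_plausible_ages(trips)

# Trip counts per (user type, age bracket), the shape a treemap would need.
brackets = Counter((t["usertype"], age_bracket(t["age"])) for t in cleaned)
```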

Analysis from the previous lab

The third visualization shows the trip duration of different users. Since the free riding time limits for subscribers and customers were different, I used descriptive text to explain the rules, but it still seemed a bit confusing for most readers. I therefore added 2 treemaps alongside the stacked bar chart, which will hopefully illustrate the difference more clearly.

The last visualization focuses on the different renting behaviors of the two user groups. It builds on the weekday/weekend pattern mentioned earlier, but instead of showing subscriber and customer usage over the month, I put days and hours in columns. This shows not only whether users preferred to ride on weekdays or weekends, but also at which times of day they tended to ride. In addition, I included heatmaps showing bike activations in 4 specific time frames: midnight, the morning commute, the lunch break, and the evening commute. Initially, this was a static visualization with 4 maps placed next to each other. However, the differences were hard to see unless the maps overlapped, especially among the last three time frames, and the maps were too small to read. I therefore decided to make an animation to better present the changes.
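
The day-and-hour breakdown described above boils down to counting trips per user type, weekday, and start hour. The following Python sketch shows one way to do that counting. The "starttime" and "usertype" field names, the timestamp layout, and the sample trips are assumptions made for illustration; the actual charts were built in Tableau Public.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_hour_counts(trips):
    """Count trips per (user type, weekday, start hour)."""
    counts = Counter()
    for trip in trips:
        # Assumed layout 'YYYY-MM-DD HH:MM:SS' plus optional fractional
        # seconds; only the first 19 characters are parsed.
        started = datetime.strptime(trip["starttime"][:19], "%Y-%m-%d %H:%M:%S")
        key = (trip["usertype"], WEEKDAYS[started.weekday()], started.hour)
        counts[key] += 1
    return counts

# Made-up trips: two weekday morning rides and one weekend afternoon ride.
trips = [
    {"usertype": "Subscriber", "starttime": "2019-04-01 08:15:00.1230"},
    {"usertype": "Subscriber", "starttime": "2019-04-01 08:45:10.0000"},
    {"usertype": "Customer", "starttime": "2019-04-06 14:05:00.0000"},
]
counts = weekday_hour_counts(trips)
```

Each key in the result corresponds to one cell of a day-by-hour grid for one user group, so the same table could feed either a line chart per day or a heatmap.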

UX Study

I conducted my UX study after my first draft design. I observed the participants as they read the visualizations and asked them to think aloud so I could find out whether they understood the contents. Then I presented the goals of my project to the participants to see whether they thought the questions I raised were answered. From the user tests, I received the following valuable comments:

  • Legends should be provided in every visualization
  • The color used for the same content should be consistent across the visualizations
  • The background color should be more transparent to avoid visual distraction
  • The descriptive texts were necessary as users might be unfamiliar with the topic
  • Vague descriptive texts might cause confusion

I refined my work according to the feedback. Legends were added to each figure, even when they had been explained in a previous visualization. The colors were readjusted to make sure there was no confusion. The colors that referred to weather information were made more transparent to avoid distracting users. Some descriptive text was added to explain how Citi Bike operates, and other descriptive text was rephrased.

Final Visualizations and Findings

The Expanding Network of Stations

In its first 6 years of operation, Citi Bike went through 3 major expansions, in Jul.-Sep. 2015, Jul.-Sep. 2016, and Aug.-Oct. 2017, with over 100 new stations added to the network each time. Otherwise, the number of stations fluctuated within a reasonable range, possibly due to weather or technical issues. The first expansion happened in the third year of operation, presumably because the market and users needed time to adapt to the new service. Afterward, an expansion came every year, which suggests the service was operating well. However, there was no significant growth in 2018. Geographically, the first stations were placed in Lower Manhattan, Midtown, Downtown Brooklyn, and Williamsburg. The network then expanded outward, covering Upper Manhattan, Harlem, and more areas in Brooklyn and Queens. In 2017, the last expansion mainly occurred in New Jersey, with new stations established in Jersey City, Hoboken, and Journal Square.

Trips Over 6 Years

The first thing I noticed in this visualization is that the seasons had a significant influence on Citi Bike usage. Every winter, the number of trips declined significantly, while the peak season fell between May and August. The second finding is that usage stayed at about the same level for the first two years, but in August 2015 the monthly total exceeded all past records for the first time and then reached a peak in September. This might be a result of the first network expansion, which took place in Jul.-Sep. 2015.

Who Rode Citi Bike

To use the service, users must either sign up for the annual membership program to become a subscriber or purchase a 24-hour or 3-day pass to become a customer. From the visualization, we can see that the number of subscribers reached its first peak in May 2014, after the first full year of operation, and then started to decline. Possibly, many subscribers purchased the membership out of curiosity rather than need and canceled it after their one-year subscription ended. Then in August 2015, following the first expansion, the number started to grow again and has grown steadily since. The number of customers, on the other hand, was greatly affected by the season, similar to the trip data. Every year, it peaked in August or September and dropped as winter came. However, we can still see growth: in July 2018 the number reached 131,846, an increase of over 50% compared to July 2017. What is more, in May 2019, at the beginning of the peak season, the number of customers exceeded the number of subscribers for the first time since the second month of Citi Bike’s operation.

Citi Bike Trips in April

The number of trips is strongly associated with sky conditions. On April 5, 15, 22, and 28, trip numbers decreased due to rain, while on April 3, 16, 23, and 29, users rode more under sunny skies. Temperature also influenced users’ behavior: on April 4, 7, 14, and 25, users tended to ride less even though it was sunny, because the temperature had dropped compared to the preceding days.

The Subscribers and Customers

In general, subscribers made most of the trips in April. What caught my attention was the huge gender gap among Citi Bike riders. Male subscribers were the main user group, making nearly 65% of all trips, while only 20% of trips were made by female subscribers. In terms of age, for subscribers, users aged 20-49 took over 67% of trips, while for customers, more trips were made by middle-aged users.

Keep the Ride Free

Most users kept their trips under 45 minutes, the free riding time for subscribers. For customers, however, the free riding time was 30 minutes. As a result, 99% of subscribers used the service for free, while 24% of customers had to pay an overtime fee.
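
To make the rule above concrete, the following Python sketch computes, for each user type, the share of trips that stay within the free riding time (45 minutes for subscribers, 30 minutes for customers, as stated above). It assumes trip durations are recorded in seconds under a "tripduration" field; the field names, the helper function, and the sample trips are made up for illustration and do not reproduce the 99% and 24% figures.

```python
# Free riding time per user type in minutes, taken from the text above.
FREE_MINUTES = {"Subscriber": 45, "Customer": 30}

def share_within_free_limit(trips):
    """Return, per user type, the fraction of trips within the free limit."""
    totals = {}
    free = {}
    for trip in trips:
        user = trip["usertype"]
        limit_seconds = FREE_MINUTES[user] * 60
        totals[user] = totals.get(user, 0) + 1
        # 'tripduration' is assumed to be in seconds.
        if int(trip["tripduration"]) <= limit_seconds:
            free[user] = free.get(user, 0) + 1
    return {user: free.get(user, 0) / totals[user] for user in totals}

# Made-up trips for illustration only.
trips = [
    {"usertype": "Subscriber", "tripduration": "600"},   # 10 min, free
    {"usertype": "Subscriber", "tripduration": "2400"},  # 40 min, free
    {"usertype": "Subscriber", "tripduration": "3000"},  # 50 min, overtime
    {"usertype": "Customer", "tripduration": "1200"},    # 20 min, free
    {"usertype": "Customer", "tripduration": "2400"},    # 40 min, overtime
]
shares = share_within_free_limit(trips)
```

Note that the same 40-minute trip is free for a subscriber but counts as overtime for a customer, which is the asymmetry the visualization tries to convey.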

Who Rides When

The visualization clearly illustrates the different usage patterns of subscribers and customers. On weekdays, the “M”-shaped lines for subscribers show peak hours during the morning and evening commutes, which was exactly what I expected after analyzing the users’ ages. Customer usage stayed low throughout the day, with a few thousand activations in the afternoon. On weekends, the usage patterns of the two groups were very similar, though still different in volume. The animation on the right side provides the geospatial information of the records: the closer to Lower/Midtown Manhattan and Downtown Brooklyn, the more records, even at night.

Future Thoughts

Overall, with the available resources, I was able to answer the questions I raised at the beginning of the project, though more data, such as residential population or metro station locations, would be necessary to prove or further explain some of my conclusions. It would also be interesting to visualize some of the trip routes on a map, for example, the longest and shortest trips made by subscribers and customers respectively. In fact, I included this visualization in my proposal but later decided to exclude it, as it would not use any of the tools we learned in the course. However, it might give us some clues about where the users went and why they used the service.

Another direction is to visualize data from bike-share programs in other cities. By comparing them to the Citi Bike visualization, we might find ways to help users avoid paying excessive overtime fees or to encourage more women to cycle.

In addition to the potential new content, the current visualization also needs to be revised. I conducted a quick user study before I submitted it, and from the feedback I realized that some of the colors I used were a bit difficult to see against the map background, even for users with normal vision. I would definitely make changes if given more time.