Brief introduction
As one of the most environmentally protective and efficient modes of transportation, cycling has substantially benefited major cities in the United States over the past decades, supported by a series of political interventions and moral justifications. Among the many personal and shared bike options, Citi Bike by Lyft draws considerable attention not only for making commuting more convenient but also for building a connected city lifestyle together with the other services the company offers. In Citi Bike’s development history, its 2017 expansion raised its profile to the top of the bike-sharing programs in the United States, which prompted my investigation for this lab research project. According to its official website, by October 2017 Citi Bike had reached a total of 706 stations and over 12,000 bikes in use. I hypothesized that this sharp increase in stations and shared bikes would create an increase in usage. My questions focus on how this growth in numbers affected users’ behavior, as reflected in the company’s published data.
Finding the datasets and choosing visualizing tools
To find these reports, I looked through public datasets related to transportation and transportation companies. Finding and tracking the data was relatively easy because the Citi Bike website states that some of the information it gathers is publicly available, and through Awesome Public Datasets I was able to find its published datasets organized by month and year. Users’ trip histories are accessible, and I can analyze the data by trip duration, start and end time, date, station ID, station name, bike ID, user type, gender, and year of birth. User type is either Customer, meaning a user with a 24-hour pass or 3-day pass, or Subscriber, meaning a user with an annual membership. Gender is coded as 0 for unknown, 1 for male, and 2 for female. Citi Bike states in advance that the data “has been processed to remove trips that are taken by staff as they service and inspect the system” and “any trips that were below 60 seconds in length”. After I downloaded a single month of data, the file already exceeded the maximum capacity of Excel, meaning that even if I processed it and deleted unneeded information, I would be working with an incomplete portion of the data. Fortunately, the existing data is relatively clean, so I skipped the step of cleaning it in OpenRefine. To visualize the results, I used Tableau Public with two separate worksheets, one for each of the two large datasets I worked with for this project.
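Because a single month is too large for Excel, the coding scheme described above could also be applied by reading the file row by row in Python. The following is only a minimal sketch under assumptions: the column names (`tripduration`, `usertype`, `birth year`, `gender`) are my guess at the CSV headers and may differ by month, the three sample rows are invented, and `read_trips` is a hypothetical helper rather than anything Citi Bike provides.

```python
import csv
import io

# Coding scheme as described in the text; the column names used below are assumptions.
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}
USER_TYPES = {"Customer": "24-hour or 3-day pass", "Subscriber": "annual membership"}

# Tiny invented sample standing in for a monthly trip file.
SAMPLE_CSV = """tripduration,usertype,birth year,gender
695,Subscriber,1987,1
1320,Customer,,0
402,Subscriber,1975,2
"""

def read_trips(handle):
    """Yield one decoded trip per CSV row, reading the file row by row."""
    for row in csv.DictReader(handle):
        birth_year = row["birth year"].strip()
        yield {
            "duration_s": int(row["tripduration"]),
            "user_type": row["usertype"],
            "pass_kind": USER_TYPES.get(row["usertype"], "unrecognized"),
            "gender": GENDER_LABELS.get(row["gender"], "unknown"),
            # A missing or non-numeric birth year is kept as None.
            "birth_year": int(birth_year) if birth_year.isdigit() else None,
        }

trips = list(read_trips(io.StringIO(SAMPLE_CSV)))
for trip in trips:
    print(trip)
```

With a real download, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file, and the rows would be processed one at a time instead of being collected into a list.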
Methodologies and process
As mentioned, I compare a period before and a period after the expansion to see how users’ behavior changed. To allow enough time for adjustment while avoiding a window so long that it drags in other factors, I chose two datasets, June and December 2017. Even though Citi Bike states that some records have already been removed, each of these datasets contains over ten thousand trips, so they are heavy to load and analyze. I chose these two months because October 2017 is when Citi Bike reported that it had finished the rollout and fully activated the stations and bikes. Because the exact date on which the expansion began is unclear, I assumed that a 6-month window is a moderate span for my research. Both worksheets plot the distribution of birth years against the average trip duration and the sum of trip duration. The average describes the typical trip among users of the same age, whereas the sum points directly to the groups with the most and least total riding, which shows the trend of the distribution.
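I built these two measures in Tableau, but the same grouping can be sketched in Python to make the difference between the average and the sum concrete. This is an illustration only: the trip values are invented, and `summarize_by_birth_year` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented (birth_year, trip_duration_seconds) pairs, not real Citi Bike records.
sample_trips = [(1987, 600), (1987, 900), (1975, 1200), (1990, 300), (1990, 500)]

def summarize_by_birth_year(trips):
    """Return {birth_year: (sum_seconds, average_seconds)}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for birth_year, duration in trips:
        totals[birth_year] += duration
        counts[birth_year] += 1
    return {year: (totals[year], totals[year] / counts[year]) for year in sorted(totals)}

summary = summarize_by_birth_year(sample_trips)

# The birth year with the largest total riding time (the peak of the sum).
peak_year = max(summary, key=lambda year: summary[year][0])

for year, (total, average) in summary.items():
    print(year, total, average)
print("peak birth year by sum:", peak_year)
```

In this invented sample, the 1975 group has the highest average because of one long trip, while the 1987 group has the highest sum because it rides more often, which is why I looked at both measures.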
Visualizations and interpretation
The result changed my understanding of this program. Even though my data covers only the New York area, the change within just half a year is dramatic. To my surprise, the peak of the sum of trip duration across all ages dropped in December. The cumulative trip duration in June is 1,472,112,603 seconds, whereas in December it drops to 680,822,229 seconds. Total riding time fell by more than half between the two months. However, the average duration and the age distribution remain almost the same. Both peaks come from users born in 1987, and the most active users were born between 1970 and 1990. My findings from these two datasets disproved my hypothesis, but at the same time they suggest that the most loyal users stay within the same age range and use the system consistently throughout the year.
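The size of the drop can be checked with a short calculation. The two totals are the ones reported above; the variable names are my own.

```python
# Cumulative trip duration in seconds, as reported for each month.
june_total_s = 1_472_112_603
december_total_s = 680_822_229

# Relative change from June to December (negative means a decrease).
change = (december_total_s - june_total_s) / june_total_s

# Share of the June total that remains in December.
remaining_share = december_total_s / june_total_s

print(f"Change from June to December: {change:.1%}")
print(f"December as a share of June: {remaining_share:.1%}")
```

The December total is a little under half of the June total, which is consistent with a decrease of roughly 54 percent.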
Reflection of this lab
This lab was challenging because, coming from a museum background, I still need time to learn how to turn datasets into visualizations. However, I learned a lot from working with Tableau. Despite my anxiety about handling two large datasets, I expect there are more details and functions to dig into once I am more familiar with the software. For instance, the two outliers in the average trip duration could be investigated further, and there should be a function for combining my two graphs into one so that the comparison is more direct. I could also examine the same months in previous years so that I can rule out the impact of the seasons.
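As a rough sketch of what combining the two graphs could look like outside Tableau, the per-birth-year totals of the two months can be placed side by side in one table. The numbers below are invented for illustration, and `combine_months` is a hypothetical helper, not a Tableau feature.

```python
# Invented per-birth-year totals in seconds; real values would come from the data.
june_totals = {1975: 1200, 1987: 1500, 1990: 800}
december_totals = {1975: 700, 1987: 900, 1992: 150}

def combine_months(first, second):
    """Return rows of (birth_year, first_total, second_total); None marks a missing year."""
    years = sorted(set(first) | set(second))
    return [(year, first.get(year), second.get(year)) for year in years]

combined = combine_months(june_totals, december_totals)
for year, june_s, december_s in combined:
    print(year, june_s, december_s)
```

The same helper could be reused to line up June of one year against June of a previous year, which is the seasonal comparison I would like to try next.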
Appendix
I joined this class a bit late, so I am still gradually adapting to it. Here are the links to my Tableau Public site with my two dashboards.
Citi Bike data June 2017: https://public.tableau.com/views/CitiBikedataJune2017/1_1?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link
Citi Bike data December 2017: https://public.tableau.com/views/CitiBikeDataDecember2017/1_1?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link
Bibliography
Awesomedata. “Awesomedata/Awesome-Public-Datasets: A Topic-Centric List of HQ Open Datasets.” GitHub. Accessed February 21, 2023. https://github.com/awesomedata/awesome-public-datasets#complementary-collections.
BetaNYC. “Bike Share Data Systems.” GitHub. Accessed February 21, 2023. https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems.
“Citi Bike System Data: Citi Bike NYC.” Citi Bike: NYC’s Official Bike Sharing System. Citi Bike Official Website. Accessed February 21, 2023. https://citibikenyc.com/system-data.
“December 2017 Monthly Report.” Accessed February 21, 2023. https://d21xlh2maitm24.cloudfront.net/nyc/December-2017-Citi-Bike-Monthly-Report.pdf?mtime=20180216162543.
“June 2017 Monthly Report.” Accessed February 21, 2023. https://d21xlh2maitm24.cloudfront.net/nyc/June-2017-Citi-Bike-Monthly-Report.pdf?mtime=20170719094633.
Shapiro, Rachel. “Here’s Why It Costs $6k per Citi Bike Bicycle.” silive, April 22, 2017. https://www.silive.com/news/2017/04/heres_why_it_costs_6k_per_citi.html.