Citibike review July 2013, 2014, 2015


Visualization

Introduction

I chose citibike data (http://www.citibikenyc.com/system-data) because citibikes are relatively new to NYC.  The blue citibike bicycles and hub racks that have popped up around midtown Manhattan and other boroughs seem both romantic and strange for New York.  Citibikes are here to stay and New Yorkers need to embrace them.

My first questions were of comparison. I knew other countries and cities employed shared bike services long before NYC. I wanted to know how we compared in terms of bike performance and safety. Not only is biking through traffic inherently dangerous for the biker, but any midtown pedestrian knows that bikers often ignore the rules of the road. Still, the ridership data suggest that the benefits outweigh the risks for many New Yorkers. I wanted to categorize the ‘typical’ rider.

Additional questions included: How long were the longest trips? Which routes were the longest, the most travelled, and the safest? Which areas or hubs were the most used, and which were the most popular in terms of neighborhood ambience?

Visualizations that informed or inspired my Tableau designs

I saw at least three visualizations that inspired me in terms of how you can present the data. By far, the video of bike usage “Experiments in Bicycle Flow Animation” set the bar very high. http://www.economist.com/blogs/graphicdetail/2012/09/visualising-bicycle-trips

I did not expect to come close to this type of visualization, but it showed what could be done if all your informational ducks were in a row and you had the skillset to make it happen. Even as I admired the beauty, though, I was hard pressed to explain the value of this presentation. Later, when I found I was unable to map any GIS data, I was sorely disappointed, and I learned a lesson: analyze how the data and the tools available to mine it will interact before committing to a design.

I first found the information graphic that inspired me to characterize typical riders, routes, and durations in an article that discussed the first 100 days of Citibike (http://untappedcities.com/2013/10/29/fun-maps-citi-bike-data-visualizations-first-100-days/), and then at the source: http://madebyfriends.co/citibike/

That title grabbed me because I wanted to recreate some of the visualizations by comparing data from the months of July in 2013, 2014 and 2015. I thought this would be a good start, but I became increasingly disappointed with the infographics as I went to the source and began critiquing what was actually presented. I was confused by the graphics and felt they did not give valuable interpretations of the data. Again, this was a great idea that did not fulfill the purpose of interpreting statistical data properly for the user.

The last visualization is the Australian bike accident interactive map. http://www.theguardian.com/lifeandstyle/datablog/ng-interactive/2014/oct/10/-sp-bike-accidents-mapped-five-years-of-cycling-crashes-in-melbourne

This was the template I wanted to follow: a map showing the parts of the city where citibike hubs are prevalent, and how the cross-traffic between hubs, the neighborhoods, and the demographics of the average rider contribute to the story of New York’s version of shared bike service. A mapping technique would be needed, and as I found out when I began to download the dataset, I did not have the tools for mapping.

Materials, including software and datasets used

We used Tableau 9 and Tableau Public in the lab. The dataset was taken from http://www.citibikenyc.com/system-data. Because the dataset was determined to be already normalized, there was no playing around with it to “clean it up.” I did, however, learn the importance of sorting in Excel before importing into Tableau. The data was downloaded as CSV files and opened in Excel prior to manipulation in Tableau 9.

I downloaded citibike data for the month of July in each of the years 2013, 2014, and 2015. This gave a good sample of the citibike data, since July 2013 was the first month of accumulated data and July 2015 was relatively recent.

I did not have a mapping tool to work with the longitude and latitude columns, nor did I match and map hub stations to their corresponding zip codes. Either approach would have provided a good visual map with which to showcase statistics.
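Even without a mapping tool, the latitude and longitude columns could give a rough straight-line distance between a start station and an end station. The following is a minimal sketch of the haversine formula, not something I used in this project. The coordinates are made-up examples, the function name is my own, and a straight line will understate the distance actually ridden along streets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for a start and an end station in Manhattan.
start = (40.7546, -73.9917)
end = (40.7424, -74.0032)
print(round(haversine_km(*start, *end), 2), "km straight-line")
```

Applied to every row, a value like this could sit alongside trip duration as a second quantitative variable.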

Methods used to create visualizations

Lots of playing around and false starts in Tableau 9 led to understanding how the translation from CSV to Excel and the import into Tableau shaped how I could view the data. Some sorting of columns in Excel prior to my first import into Tableau would have been beneficial.

Once I realized I was lost, I turned to something apparently as simple as comparing male and female riders, which turned out to be more complicated. The data categorized riders as either customers or subscribers, but within each category there was a significant portion of riders of unknown gender. Even after color-coding for gender (male-orange; female-green), the unknown portion of riders (blue) was large. It also appeared that female ridership was greater than male ridership, but there was no way to prove this given the large number of riders of unknown gender.
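One way to see how much the unknown group limits the comparison is to compute each gender's share within each rider category. This is a minimal sketch using only the Python standard library. It assumes the trip files have columns named usertype and gender, with gender coded 0 = unknown, 1 = male, 2 = female; the sample rows are made up and are not results from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed coding of the gender column: 0 = unknown, 1 = male, 2 = female.
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}

# Made-up sample rows standing in for a monthly trip file.
SAMPLE_CSV = """usertype,gender
Subscriber,1
Subscriber,2
Subscriber,1
Subscriber,0
Customer,0
Customer,0
"""


def gender_shares(rows):
    """Return {usertype: {gender label: share of that usertype's trips}}."""
    counts = defaultdict(Counter)
    for row in rows:
        label = GENDER_LABELS.get(row["gender"], "unknown")
        counts[row["usertype"]][label] += 1
    shares = {}
    for usertype, counter in counts.items():
        total = sum(counter.values())
        shares[usertype] = {g: n / total for g, n in counter.items()}
    return shares


shares = gender_shares(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for usertype, by_gender in shares.items():
    print(usertype, by_gender)
```

If the unknown share in a category is larger than the gap between the male and female shares, the data cannot say which gender rides more in that category.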

My only real quantifiable variable was trip duration. I basically had no way to measure actual distance and could rely only on trip duration to tell a story.

I began to ask questions such as: which week had the most usage (as measured by duration), which bike ID had the most usage, and which station had the most traffic (measured by the number of times that station was a start station plus the number of times it was a stop station). I also thought the average trip duration from a station, together with the average trip duration to that station, would tell which stations were the popular hubs.
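The station traffic measure described above can be sketched in a few lines. This is an illustration under assumptions, not the Tableau calculation I used: the trips are made-up tuples of start station, end station, and duration in seconds, and the function and key names are my own.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up trips: (start station, end station, duration in seconds).
trips = [
    ("A", "B", 600),
    ("B", "A", 900),
    ("A", "C", 300),
    ("C", "A", 1200),
]


def station_summary(trips):
    """Count starts plus stops per station and average durations out and in."""
    traffic = Counter()
    departing = defaultdict(list)
    arriving = defaultdict(list)
    for start, end, seconds in trips:
        traffic[start] += 1  # station used as a start station
        traffic[end] += 1    # station used as a stop station
        departing[start].append(seconds)
        arriving[end].append(seconds)
    summary = {}
    for station, count in traffic.items():
        summary[station] = {
            "traffic": count,
            "mean_out": mean(departing[station]) if departing[station] else None,
            "mean_in": mean(arriving[station]) if arriving[station] else None,
        }
    return summary


summary = station_summary(trips)
for station, stats in sorted(summary.items()):
    print(station, stats)
```

Note that a trip that starts and ends at the same station counts twice toward that station's traffic, which follows the starts-plus-stops definition.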

 

Results/Discussion

Sheet 8

https://public.tableau.com/views/Citibikedatareview2013-2015July/Sheet8?:embed=y&:display_count=yes&:showTabs=y

Gives the trip durations by hour of start and stop for each gender. It is unclear how many more women than men ride the citibike.

Sheet 7

https://public.tableau.com/views/Citibikedatareview2013-2015July/Sheet7?:embed=y&:display_count=yes&:showTabs=y

Bike ID groups below 2000 had substantially more ride time than bikes with higher IDs, but what does that signify?

Sheet 5 tried to isolate four NYC neighborhoods to investigate their average trip durations.

https://public.tableau.com/views/Citibikedatareview2013-2015July/Sheet5?:embed=y&:display_count=yes&:showTabs=y

Sheet 4 seems to give the clearest information, showing the median duration across hubs.

https://public.tableau.com/views/Citibikedatareview2013-2015July/Sheet4?:embed=y&:display_count=yes&:showTabs=y

Sheet 2 doesn’t make sense to me in terms of gender.

https://public.tableau.com/views/Citibikedatareview2013-2015July/Sheet2?:embed=y&:display_count=yes&:showTabs=y

Sheet 1 is a consolidation of gender and trip duration. Unclear what interpretation it provides.

https://public.tableau.com/shared/7GKG9RFJH?:display_count=yes

 

In the end, the only thing I found for sure was the bar chart.

Discussion of future directions

In the future, the citibike data should be able to generate graphical presentations that tell (a) the easiest route and (b) what specifically to watch for. A comment section, possibly emailed to the cyclist (with their permission), could survey the cyclist’s experience along the route in multiple terms, such as traffic, environment, neighborhoods, stores, sales, and the like. Parking options such as mini-hubs could grow from this information. If women are indeed the prevalent ridership gender, which is suspect, then bikes could be reworked with smaller frames.

Trip duration, in seconds, is important, but we should also be able to determine safe routes, re-routes, and detours. Construction, roadway blocks, and events in the city are all changing conditions that, if tracked, could help cyclists travel. I think it’s definitely worthwhile to compare the busiest hubs with each other and with those less popular. I wonder what data the city has not included in the dataset.