Visualizing Data with Carto: Uber Pickups in NYC 2014


Introduction

I’ve always wanted to create animated infographics when working on data visualization, since I believe animation is a better way to do visual storytelling. But I never found a topic where the story and the context felt right until I saw the project Manhattan Population Explorer.
Its creator turned the dataset into an interactive, animated visualization that shows the highs and lows of Manhattan’s population over the course of a day, and then across a week. The project is a good demonstration of when an animated data visualization makes sense: when the story is tied to time. So I also wanted to choose a dataset that tells a story based on time and location.
This lab is an experiment in visualizing a dataset that contains geographic data and date/time data. The expected outcome is an animated data visualization that conveys meaningful, understandable information.

Tools and Dataset

Carto is a location intelligence service. Location intelligence turns your location data into useful location-based analysis, such as strategic site planning, streamlined city management, or optimized sales territories.
How Carto works is pretty simple: prepare a dataset that contains geo data plus anything else you would like to know, and Carto projects the data onto a map so you can easily see the overview and patterns in the dataset. Carto can also aggregate the data for you, so users can identify patterns or even make predictions on certain topics.
The dataset I picked is “Uber Pickups in New York City” from April to September 2014. FiveThirtyEight obtained this dataset from the NYC Taxi & Limousine Commission (TLC) by submitting a Freedom of Information Law request on July 20, 2015. By visualizing this dataset, I should be able to show a pattern that changes over time and is tied to location.
According to FiveThirtyEight’s description on Kaggle, the dataset contains roughly four groups of files:

A. Uber trip data from 2014 (April – September), separated by month, with detailed location information
B. Uber trip data from 2015 (January – June), with less fine-grained location information
C. Non-Uber FHV (For-Hire Vehicle) trips. The trip information varies by company, but can include the day of the trip, time of the trip, pickup location, driver’s for-hire license number, and vehicle’s for-hire license number.
D. Aggregate ride and vehicle statistics for all FHV companies (and, occasionally, for taxi companies)

Because of Carto’s platform restrictions, I could only use a smaller dataset, so I eventually picked “uber-raw-data-apr14.csv”. This file has the following columns to work with:

A. Date/Time: The date and time of the Uber pickup
B. Lat: The latitude of the Uber pickup
C. Lon: The longitude of the Uber pickup
D. Base: The TLC base company code affiliated with the Uber pickup
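
To make the four columns concrete, here is a minimal Python sketch of reading such a file into typed records. The sample rows are made up for illustration, and the timestamp layout (month/day/year hour:minute:second) is an assumption about how the Date/Time column is written, so check it against the real file before relying on it.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows for illustration only; not copied from the real file.
SAMPLE_CSV = '''"Date/Time","Lat","Lon","Base"
"4/1/2014 0:11:00",40.7500,-73.9800,"B02512"
"4/1/2014 17:45:00",40.7267,-74.0345,"B02512"
'''

# Assumed timestamp layout: month/day/year hour:minute:second.
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def read_pickups(handle):
    """Yield one dict per pickup, with the timestamp and coordinates parsed."""
    for row in csv.DictReader(handle):
        yield {
            "when": datetime.strptime(row["Date/Time"], TIME_FORMAT),
            "lat": float(row["Lat"]),
            "lon": float(row["Lon"]),
            "base": row["Base"],
        }


pickups = list(read_pickups(io.StringIO(SAMPLE_CSV)))
```

With a real file, the same function would take an open file handle instead of the in-memory sample.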

Inspiration/Critique
Breathing City 2014

The Breathing City by Joey Cherdarchuk is really inspiring, even though his work was itself inspired by John Nelson’s breathing earth and Conveyal’s aggregate-disser post. Cherdarchuk wrote about his thinking: “I wondered if I could make a breathing city. Manhattan looks somewhat lung-like, so it seemed natural. Should be a fun, quick project. How naive I was.” To create this animated visualization, he gathered data on population, employment, land use, building footprints, and work-related activity percentages by time of day.
My favorite part of this info viz is the metaphor of “breathing”, which immediately turns the city into a “living creature” or a “human being”. By watching the colors switch from one time frame to the next, the audience can easily get a rough notion of how Manhattan’s population changes over 24 hours. The additional line chart in the top left corner also shows the hourly highs and lows of the work and home populations. Overall, the animated info viz is visually pleasant and its story is easy to understand at a glance. My concerns are mostly about whether the data stays readable when the audience wants a close-up look. Not being able to pause on a certain time frame is also annoying.
Four years after Cherdarchuk created the Breathing City, Justin Fung created an info viz on a similar topic. The difference is that the audience can now interact with the work, and can even use a filter to view different areas of Manhattan separately, which I think makes it a refined version of the Breathing City. Oh, and now it looks less like breathing and more like a heart beating.

Methods/Process

As I mentioned at the very beginning, I’d like to see data change over time as an animated info viz. The goal here is to create a short GIF that shows when and where pickups happen in New York City.

Pull Data and Observe

It’s such a pain in the ass when you upload your data to the tool, ready to build your data story, and the system just can’t handle it because of the file size of your dataset. I ran into several error messages and had certain analysis features disabled because my gigantic dataset was too hard to process.
When I first saw the rendered result, all of a sudden I understood why a small dataset is better: not only does it render faster, but the information is also more readable when the dataset is a reasonable size.
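
One way to get a smaller dataset before uploading is to keep only a random sample of the rows. The sketch below uses reservoir sampling, which picks k rows uniformly without loading the whole file into memory. This is not something I did in the lab, just an illustration; the function name and the sample size are hypothetical.

```python
import random


def sample_rows(rows, k, seed=0):
    """Reservoir sampling: keep k rows chosen uniformly from an iterable
    whose length is not known in advance."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Stand-in for the real rows: 100,000 integers, reduced to 1,000.
subset = sample_rows(range(100_000), k=1_000)
```

A uniform sample keeps the overall spatial pattern while thinning the points, which also helps with the readability problem described above.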

Trial and Error with the Aggregation Feature

Because the dataset is so huge, you can barely see any space between the data points on the map unless you zoom in. The whole dataset also covers all five boroughs of NYC, which I found a bit visually overwhelming, so I decided to use the feature that aggregates and intersects data from different layers, in this case the Uber dataset and the NYC borough shapefile.
I once successfully got a result by narrowing the boroughs down to one, Manhattan. But later, when I tried to add more data criteria to the map, I eventually reached Carto’s analysis cache limit. Because of that restriction, I was not able to recreate this work.
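
Conceptually, intersecting the pickup points with a borough shape comes down to a point-in-polygon test. The sketch below is not how Carto implements it; it is a plain ray-casting check written for illustration, and the rectangle is a hypothetical stand-in, not the real Manhattan boundary from the shapefile.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray going east from the
    point crosses. An odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle as (lon, lat) pairs, NOT the real borough outline.
TOY_AREA = [(-74.02, 40.70), (-73.93, 40.70), (-73.93, 40.88), (-74.02, 40.88)]

points = [(-73.98, 40.75), (-73.80, 40.65)]
kept = [p for p in points if point_in_polygon(p[0], p[1], TOY_AREA)]
```

Filtering this way before uploading would shrink the table to one borough without relying on the analysis cache.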

Adding Columns and Joining the Tables

I was really happy with the animated info viz, since the data tells a simple but contextual story that contains the time, the location, and the incident. But I was not satisfied that it didn’t really showcase the “time” criterion, so I wanted to mark pickups as day or night.
To do that, I duplicated the dataset and added a new column named day_night, then used the formula =IF(MOD(A1+"05:00",1)>.5,"Day","Night") to mark day or night from the existing Date/Time column. It worked, but I had to trim down that table to avoid another technical restriction of Carto.
With help from my professor, I ended up using a simple SQL statement to join the two tables and add the new field:
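
For readers who prefer code to spreadsheet formulas, here is a small Python sketch of the same rule. The formula shifts the time by five hours and checks whether the result lands in the second half of the day, which works out to “Day” for times strictly between 07:00 and 19:00 and “Night” otherwise. The function name is my own; the sketch uses whole seconds instead of day fractions to avoid floating-point surprises at the boundaries.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 3600


def day_or_night(when):
    """Mirror of the spreadsheet rule: shift the clock forward by 5 hours and
    return "Day" if the shifted time is past the midpoint of the day."""
    seconds = when.hour * 3600 + when.minute * 60 + when.second
    # Equivalent of MOD(A1 + "05:00", 1), expressed in whole seconds.
    shifted = (seconds + 5 * 3600) % SECONDS_PER_DAY
    # Equivalent of "> .5" of a day.
    return "Day" if shifted > SECONDS_PER_DAY // 2 else "Night"


labels = [day_or_night(datetime(2014, 4, 1, h, 30)) for h in (6, 7, 12, 18, 19)]
```

Note that 07:00:00 and 19:00:00 exactly both come out as “Night”, because the comparison is strict.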
UPDATE table_name1
SET column1 = (
    SELECT column2
    FROM table_name2
    WHERE table_name1.id = table_name2.id
);
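
To see what this statement does, here is a minimal, self-contained sketch that runs the same pattern against an in-memory SQLite database from Python. SQLite is only a stand-in here (Carto runs on PostgreSQL), and the table names pickups and labels, along with their rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables sharing an id column. pickups.day_night starts empty.
conn.executescript("""
    CREATE TABLE pickups (id INTEGER PRIMARY KEY, day_night TEXT);
    CREATE TABLE labels  (id INTEGER PRIMARY KEY, day_night TEXT);
    INSERT INTO pickups (id) VALUES (1), (2), (3);
    INSERT INTO labels VALUES (1, 'Day'), (2, 'Night');
""")

# Same shape as the statement above: copy a column across tables by matching id.
conn.execute("""
    UPDATE pickups
    SET day_night = (
        SELECT labels.day_night
        FROM labels
        WHERE pickups.id = labels.id
    )
""")

rows = conn.execute("SELECT id, day_night FROM pickups ORDER BY id").fetchall()
conn.close()
```

One thing to watch for: a row with no match in the second table (id 3 here) ends up with NULL in the updated column, so it is worth checking that the ids line up before running it.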

Result

I can finally see the pickups marked as day or night with assigned colors. Yellow represents day and blue represents night, and the animation loops over a month-long time frame. See the full work here.

Reflection

I really had a hard time using Carto; it is a tool I would not consider for a large dataset or a complex project. At the same time, maybe Carto isn’t designed for huge datasets, and I should instead learn how to grab the most essential data and focus on one simple thing to tell. In this case, I should have just taken the pickup data for 24 hours a day over seven days as the whole dataset, since we only want an overall understanding of where and when pickups happen in the city.
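
As a rough sketch of that idea, the Python below keeps only the pickups that fall inside one seven-day window and counts them per weekday and hour. The function names and the sample timestamps are made up for illustration; the real input would be the parsed Date/Time column.

```python
from collections import Counter
from datetime import datetime, timedelta


def one_week(timestamps, start):
    """Keep timestamps in the half-open window [start, start + 7 days)."""
    end = start + timedelta(days=7)
    return [t for t in timestamps if start <= t < end]


def pickups_by_weekday_hour(timestamps):
    """Count pickups per (weekday, hour), where Monday is weekday 0."""
    return Counter((t.weekday(), t.hour) for t in timestamps)


# Made-up timestamps for illustration only.
stamps = [
    datetime(2014, 4, 1, 8, 5),
    datetime(2014, 4, 1, 8, 40),
    datetime(2014, 4, 7, 23, 59),
    datetime(2014, 4, 8, 0, 0),  # falls just outside the window
]

week = one_week(stamps, start=datetime(2014, 4, 1))
counts = pickups_by_weekday_hour(week)
```

A table of 7 × 24 counts like this is tiny compared to the raw month of points, so it should sit comfortably within the platform limits.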