Fall Colors in New York City

Introduction

Autumn is one of the most beautiful seasons, and fall foliage shows nature at its most magical. Many people in NYC plan an autumn weekend outside the city to hike and see the fall foliage along the Hudson River or in nearby state parks. However, that trip is a luxury for people who are busy or don’t have the transportation to leave the city. I therefore wanted to design a map of NYC trees and their fall colors to provide an alternative way to enjoy the fall foliage and explore different neighborhoods in the city.

I designed an interactive dashboard consisting of six parts to help users understand the topic, learn something from the visualizations, and feel encouraged to visit a tree or neighborhood in person.

Inspiration

Figure 1.1. NYC’s Street Trees map from NYC parks organization

Figure 1.2. NYC’s Street Trees map from NYC parks organization

Figure 1.3. NYC’s Street Trees map from NYC parks organization

I got inspiration from NYC’s Street Trees map (Figures 1.1, 1.2, and 1.3) created by the NYC parks organization. Beyond inspiring me, it also shares one of the datasets I used. The map has three layers of information. The first layer (Figure 1.1), visible when zoomed out, shows the number of trees in each borough. The second layer (Figure 1.2) shows the number of trees in each neighborhood. The last layer (Figure 1.3) shows the distribution of trees on streets, using dots of different colors and sizes to represent species and the total quantity ranking in NYC. The third layer is very close to how I wanted to visualize the distribution of fall colors with the data I had.

Figure 2. Fall Foliage Report from iloveny.com.

I also got inspiration from the Fall Foliage Report visualization (Figure 2) created by iloveny.com. It’s an interactive map that evolves with the predicted time of peak fall foliage for different areas of New York State. The map shows the changing color of the foliage in three stages: yellow, which means just changing; bright red, which means near peak; and dark red, which means past peak. Since it’s already December, the map stays at the past-peak stage, and when spring returns, it will gradually go back to green. The time element is the most significant contribution of this visualization, and it inspired me to seek something similar in my own.

Methodology

Gathering Data

I used three datasets for my visualizations. The first is the 2015 New York City Tree Census from Kaggle.com. The second is New York City Tree Species, from the same source. The last is simply the boundary map of New York City boroughs from NYC Open Data. The first and third datasets are very commonly used for visualizations, and many people have used them for similar purposes, but I wasn’t worried that this would limit my design. However, one problem with the tree census dataset is that it only includes New York City street trees and excludes trees in parks. These datasets were not a perfect fit for the visualization I had in mind, and I needed to do a lot of work on them.

Challenges

There were three major challenges in gathering data.

The first challenge was the time element. Ideally, my visualization would show fall colors changing over time, based on estimated dates within each month. However, I couldn’t find this sort of dataset, and I assume there are two reasons. First, documenting the date on which each tree’s leaves change color is nearly impossible, and even if such data existed, it might not be accurate from location to location. There are 9 climate zones in total in the US among 49 states (including Alaska and excluding Hawaii). The same type of tree has different growing cycles in different climate zones. Even when the same trees grow in the same zone, many other factors can affect their growth cycles, which makes reliable timing data for fall color very hard to collect. The second reason is that it brings little value for botanists or census organizations to record leaf color for each individual tree at unpredictable times. As in the second inspiration, most fall color data describes a whole area, predicted from the area’s overall temperature and its general effect on the trees’ growing cycle.

The second challenge was that the tree species dataset contains the fall color data but doesn’t cover all the trees in New York City. As a result, more than 20% of the trees would have a null fall color, giving users an incomplete story. I therefore needed to figure out how to avoid this situation.

The last challenge was cleaning the 2015 New York City Tree Census. This dataset contains 41 columns and over 800k rows, so the file is massive. However, I wanted to use only 8-10 columns, including tree ID, status and health, species name, location information, and spatial data. The simplest way to clean the data would be to delete the extra columns first, but the dataset was too big for the data cleaning tools I tried to open and work with efficiently. OpenRefine is one of my favorite tools for cleaning data, and it couldn’t open this dataset. Neither could Excel. Eventually, I used Microsoft Access to open the file and successfully deleted some extra columns. However, the exported data had problems whose cause I couldn’t identify. In the end, I had to find another way to use this dataset or switch to something else.
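For readers who hit the same file-size wall, one alternative to opening the whole file in a desktop tool is to stream it row by row and write out only the columns needed. The following is a minimal sketch using Python's standard csv module, not the method used in this project. The column names in KEEP are assumptions and should be checked against the header row of the actual census file; the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column names (hypothetical); verify against the real header row.
KEEP = ["tree_id", "status", "health", "spc_common",
        "boroname", "zipcode", "latitude", "longitude"]

def keep_columns(src, dst, keep=KEEP):
    """Copy rows from src to dst, keeping only the named columns.

    Rows are read and written one at a time, so the whole file is
    never held in memory. Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops any column not listed in keep.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count

# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO(
    "tree_id,block_id,status,health,spc_common,boroname,zipcode,latitude,longitude\n"
    "1,348711,Alive,Fair,red maple,Queens,11375,40.72,-73.84\n"
)
out = io.StringIO()
rows_written = keep_columns(sample, out)
```

With a real file, the same function would take two handles opened with `newline=""`, one for reading the census CSV and one for writing the trimmed copy.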

Organizing & Cleaning Up Data

As mentioned above, the tree census file is enormous. Its size would affect the loading time and performance of the visualization, so I needed to get rid of the excess data. However, given my software environment and my own skills, I couldn’t trim the dataset the way I had hoped. I also couldn’t switch to another dataset without giving up my initial goal for the visualization, so I had to work with it. Besides the file size, there are naming inconsistencies: tree species names are not written consistently in the spreadsheet. Sometimes the same species is written in all lowercase, and sometimes the first letter is capitalized. Given all the problems in this dataset, I decided to leave it as it is and edit the tree species dataset to match the tree census instead.
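The capitalization problem can also be handled by comparing names through a normalized key rather than editing either spreadsheet. This is a small sketch of that idea, offered as an alternative to the manual approach taken here; the function name and the sample variants are illustrative only.

```python
def normalize_species(name):
    """Build a comparison key: trim the ends, collapse runs of
    whitespace to a single space, and lowercase everything."""
    return " ".join(name.split()).lower()

# Three spellings of the same species, as they might appear in a spreadsheet.
variants = ["Red Maple", "red maple", " Red  maple "]
unique_keys = {normalize_species(v) for v in variants}
```

All three spellings collapse to the same key, so a lookup keyed on the normalized name would match them without duplicating rows.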

I used Excel (Figure 3) to edit the tree species dataset and match it to the tree census dataset by species name. Where a species name in the tree census didn’t match the name in the tree species dataset, I duplicated the row and changed the name in the copy to match the one in the tree census. As a result, the same species sometimes ended up with 3 rows that were identical except for how the name was written. For species that appeared in the tree census but not in the tree species dataset, I added new rows to the tree species dataset and filled in all the cells by researching the relevant information. One helpful website is mortonarb.org, where you can search for trees and plants and see their attributes. I also used Google search to find fall colors for the additional species. In this way, I found the most common fall color for each species and recorded it in the new row of the species dataset. Once the species dataset covered every species in the census, both datasets were ready for the design work.
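The matching step can be checked with a short script that reports which census species have no fall color yet and what share of trees they account for. The sketch below assumes the census species names are available as a list (one entry per tree) and the species dataset as a name-to-color mapping; both structures, the function name, and the sample values are assumptions made for illustration.

```python
from collections import Counter

def coverage_report(census_species, color_by_species):
    """Return (share of trees without a fall color,
    missing species sorted by tree count, largest first)."""
    # Count trees per species, ignoring case and blank names.
    counts = Counter(s.strip().lower() for s in census_species if s and s.strip())
    known = {name.strip().lower() for name in color_by_species}
    missing = {s: n for s, n in counts.items() if s not in known}
    total = sum(counts.values())
    share = sum(missing.values()) / total if total else 0.0
    ranked = sorted(missing.items(), key=lambda kv: (-kv[1], kv[0]))
    return share, ranked

# Made-up sample: five trees, two species without a recorded fall color.
census_species = ["London planetree", "london planetree", "Red Maple",
                  "ginkgo", "Callery pear"]
color_by_species = {"London Planetree": "yellow", "red maple": "red"}
share, ranked = coverage_report(census_species, color_by_species)
```

The ranked list doubles as a research to-do list: the species at the top affect the most trees, so filling in their fall colors first closes the gap fastest.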

Developing A Storytelling Strategy

My goal is to provide an alternative to seeing fall foliage outside New York City and to encourage people to visit other neighborhoods. To help users engage with the subject, I planned to include map visualizations, textual content, and infographic-style visualizations. To hold these different types of content, I realized I needed a dashboard laid out like a newspaper page, giving each piece of content a different amount of space. I wanted to tell a story of exploration, from the overall New York City view of all trees down to the borough, the neighborhood, your street, and the specific tree you like. I also wanted to provide enough context for users to understand the overall statistics of the environment they live in. This thought process shaped my later dashboard design.

Visualizing The Data

Tableau

I chose Tableau to visualize the data because it is one of the most comprehensive visualization tools and offers a wide range of chart styles and design capabilities. I built the dashboard by drag and drop and experimented with the layout until I found one that gave each visualization the right amount of space. I decided to include two maps in the dashboard. One is the main visualization of the tree census at street level. The other shows the overall tree counts at the borough level.

Filters

I provided as many filters as possible to give users free rein to explore on their own. The main map (the largest map) has filters for borough, tree species, and fall color. The borough filter is set to Manhattan by default because of the file size issue. If the default view showed all boroughs, the map would load very slowly, and users would have difficulty navigating to the details they want. Setting a default borough therefore keeps the loading time down. In this map, users can also use the map’s built-in search tool to enter a zip code, find trees around that area, and then manually zoom in to see the details.

Color

As mentioned before, the dashboard was designed to help users engage with the subject and explore the story through their own interpretation. I avoided excess color by using only black and shades of grey for the rest of the dashboard, so that the fall colors are the most prominent colors users see.

Figure 4.2. Design version 2 & final design

Infographics

To help users engage, I also designed infographics for both sides of the dashboard. I made many mistakes in design version one, which you can see in Figure 4. Through usability testing, I learned from users’ feedback and made the changes you see in Figure 5.

Usability Testing

Recruitment

After creating the dashboard with all visualizations and textual content, I conducted two moderated usability tests of the original design. One was in person, and the other was remote, conducted over Zoom with screen sharing. Recruitment was relatively fast and straightforward because the screening criteria were simply that participants live in New York City and have seen fall foliage before. I recruited two friends who had no prior knowledge of this project.

Research Goal

Understand how the visualization performs and how likely users are to be motivated by it to visit the trees on the maps.

Testing Tasks

– Figure out the fall colors of the trees around your zip code area/block/neighborhood.
– Find out which borough has the most Chinese chestnut trees.

Interview Questions

– What is your first impression of the dashboard?
– What do you feel more interested in looking at?
– What have you learned from this report?
– What is your favorite fall color?
– What do you think can be improved?

Findings

– Both participants found the street trees map (the largest one) to be the most interesting visualization because of its location on the dashboard.
– However, one participant mentioned the missing Central Park data and was more interested in park data than in street data.
– One participant found no explanation for the left corner infographic, which shows the total tree counts for each borough. The participant said, “I wasn’t sure what that is, since the images are the same.”
– One participant also mentioned that the infographic on the right side was not appealing to him because of the color repetition.

Iterations

– I redesigned the borough tree counts infographic to use the borough boundary silhouettes instead of the tree graphics, which were misleading. I also added a header to this section to guide users.
– Instead of showing the five most numerous tree species and their colors, I picked five trees, each the most common one for a different fall color. For example, yellow is the most common fall color, and London planetree is the most common tree whose fall color is yellow. Additionally, I added small tree graphics on the side to show each tree’s shape as an extra detail.
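The selection rule in the second iteration (the most common tree within each fall color) can be expressed as a grouped count. The sketch below assumes each tree is available as a (species, fall color) pair after the two datasets are matched; the function name and the sample pairs are made up for illustration and are not the project's actual counts.

```python
from collections import Counter, defaultdict

def top_species_per_color(trees):
    """trees: iterable of (species, fall_color) pairs, one per tree.

    Returns a list of (color, most common species) tuples, with
    colors ordered from most trees to fewest.
    """
    by_color = defaultdict(Counter)
    for species, color in trees:
        by_color[color][species] += 1
    # Order colors by their total number of trees, largest first.
    ordered = sorted(by_color.items(), key=lambda kv: -sum(kv[1].values()))
    return [(color, counts.most_common(1)[0][0]) for color, counts in ordered]

# Illustrative sample: three yellow trees and two red trees.
trees = [
    ("London planetree", "yellow"),
    ("London planetree", "yellow"),
    ("honeylocust", "yellow"),
    ("red maple", "red"),
    ("red maple", "red"),
]
result = top_species_per_color(trees)
```

Taking the first five entries of the result would give one representative tree for each of five colors, rather than five trees that might all share the same color.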

Final Design & Reflection

View Tableau Dashboard

The final design is a dashboard consisting of two interactive maps, two sections of infographics, one static visualization, and textual content. Overall, I’m happy with the result.

I learned a lot from the class critiques, which helped me think of further improvements. First, I realized that I could remove some filters that take up additional space on the dashboard and use the treemap chart as a filter instead. I also learned that, with an infographic for each borough, the borough map is redundant, so I could remove it and include more engaging visualizations, such as more infographics on tree species distribution. Overall, designing this dashboard and its visualizations opened up my journey in data visualization design and storytelling.

Information Visualization

Student work at the School of Information, Pratt Institute