Unique interactions of the NYC Central Park Squirrels


Charts & Graphs, Visualization

This simple project uses the 2018 Squirrel Census to explore squirrel activities that volunteers recorded outside the categories the census provided. The Squirrel Census is a project focused on the Eastern gray squirrels of New York City's Central Park. The dataset used in this project was collected by hundreds of volunteers. The same group conducted two earlier tallies, in 2012 and 2016, and completed its latest project in 2020.

A dashboard based on the 2018 Squirrel Census

The census has several columns in which volunteers recorded the activities squirrels were engaged in. These include activities such as running, chasing, eating, and foraging; sounds such as kuks, moans, and quaas; and body language such as tail flags. Beyond these standard categories, however, many volunteers also wrote down other interactions they saw squirrels engaged in, and these notes are just as worth examining.

Many people have used this dataset to produce interesting visualizations. One that I really like was published by the website Again We Wander. It uses interactive visualizations and heat maps to show the squirrel population and fur colors, and it even incorporates squirrel vocalizations from SoundCloud, which gives readers a real sense of what these sound categories mean. However, visually pleasing as it is, the project overlooks the many other activities that volunteers recorded.

Most of these interactions were written down as sentences and notes, so the data need to be cleaned first. OpenRefine was used to clean the dataset so that it could be imported into Tableau for visualization. Once the dataset is imported, the next step is to group the different activities recorded by volunteers. This step is essentially about extracting keywords from the descriptions. To do this, click the drop-down arrow on the right side of the field, select Group, and follow the prompts to group the rows into the desired categories. Many of these notes could be folded into the categories the census already provides, but most were kept separate because they don't quite fit the existing ones.

For example, the census describes Chasing as "squirrel was seen chasing another squirrel," yet many notes describe squirrels being chased, chasing each other, or chasing birds and other animals. Many new interactions emerged as well, such as "chilling," "grooming," and "playing." One interesting set of descriptions involves dogs, such as squirrels running away from dogs or trying to hide from them. Even though these actions could generally be grouped under Running or Runs from, volunteers still chose to note them separately, probably because they see dogs as an unnatural factor influencing the squirrels' behavior.
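The keyword grouping in this project was done by hand in Tableau. For readers who would rather script the same idea, here is a minimal Python sketch; it is not the method used in this project. The group names and keyword lists in `GROUP_KEYWORDS` are hypothetical examples chosen to echo the groups above, and each note is assumed to be a plain string taken from the census's free-text interaction notes. Notes that match no group, or more than one, are flagged rather than forced into a category.

```python
import re

# Hypothetical keyword rules, loosely modeled on the groups described above.
# These lists are illustrative assumptions, not the census's own definitions.
GROUP_KEYWORDS = {
    "Chasing": ["chase", "chasing", "chased"],
    "Chilling": ["chill", "chilling", "resting", "relaxing"],
    "Grooming": ["groom", "grooming", "cleaning"],
    "Playing": ["play", "playing"],
    "Dogs": ["dog", "dogs"],
}


def group_note(note: str) -> list[str]:
    """Return every group whose keywords appear as whole words in the note."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return [group for group, keys in GROUP_KEYWORDS.items()
            if words.intersection(keys)]


def categorize(note: str) -> str:
    """Collapse the matches to one label, flagging notes that need a human call."""
    groups = group_note(note)
    if not groups:
        return "Unclassified"  # too vague, e.g. "nuts" or "quietly"
    if len(groups) > 1:
        return "Ambiguous"     # e.g. a note containing two or more verbs
    return groups[0]


if __name__ == "__main__":
    for note in ["chasing another squirrel", "being chased by a dog", "nuts"]:
        print(note, "->", categorize(note))
    # chasing another squirrel -> Chasing
    # being chased by a dog -> Ambiguous
    # nuts -> Unclassified
```

Returning "Ambiguous" instead of silently picking the first match keeps the judgment calls about multi-verb notes visible, so a person can review them before they are counted.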

After grouping these extra descriptions, a bar chart was created to show the total number of squirrels in each activity group across the data collection period, October 6 to October 20. This was done by placing the grouped field on the x-axis and the count of squirrels on the y-axis, then coloring and labeling the bars by interaction category. Since there are many categories, a bar chart is a very direct way to show how many squirrels were engaged in each activity. A map of the Central Park area was also created to show where these squirrels were, so that readers can see their locations directly. For the map, longitude is placed on the Columns shelf and latitude on the Rows shelf.
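In Tableau, the height of each bar is simply the number of rows in each group. As a rough Python illustration of that tally, assuming every sighting already carries exactly one group label, `collections.Counter` can produce the counts, and a plain-text chart can stand in for Tableau's rendering. The sample labels below are made up for illustration and are not census results.

```python
from collections import Counter


def tally(labels):
    """Count sightings per group label, largest group first."""
    return Counter(labels).most_common()


def text_bar_chart(counts, width=40):
    """Render (label, count) pairs as a plain-text bar chart scaled to `width`."""
    if not counts:
        return ""
    biggest = max(n for _, n in counts)
    pad = max(len(label) for label, _ in counts)
    lines = []
    for label, n in counts:
        # Every non-empty group gets at least one mark so it stays visible.
        bar = "#" * max(1, round(n / biggest * width))
        lines.append(f"{label.ljust(pad)} | {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    labels = ["Chasing", "Dogs", "Chasing", "Grooming", "Chasing", "Dogs"]  # made-up sample
    print(text_bar_chart(tally(labels), width=6))
    # Chasing  | ###### 3
    # Dogs     | #### 2
    # Grooming | ## 1
```

Sorting the groups by size before charting mirrors a common choice for bar charts with many categories: the largest groups come first, which makes a long list of bars easier to scan.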

This is an attempt to categorize qualitative data in a quantitative way without any coding. The grouping was done mostly by extracting keywords from the notes volunteers wrote, which means that many activities are too vague to be grouped. For example, some descriptions are just "nuts" or "acorn," or even "quiet" and "quietly," which makes it impossible to tell what interaction the squirrel was engaged in. Many arbitrary decisions were made along the way, since a sentence could contain two or more verbs, and it was up to the person going through the notes to place it in one category or another. This is a significant problem, since different people can interpret the data differently and thus reach different results.

Another issue is that there are too many categories, which results in too many colors. The graphs can be distracting or hard to read, especially the dot map. The map's coordinates also have a problem that was not fixed, so the points do not line up with an actual New York City base map.

In the future, these data need more analysis so that a story can emerge from them. Right now, the visualizations only show numbers and locations, and readers can't learn much about the lives of these squirrels. The next step is to think about how to group these interactions in a way that would make people more interested in them.