Mapping NYC Natural Gas Consumption in Carto

Introduction

Mapmaking is an essential skill for depicting geographic data. The findings of spatial data analysis, or the interpretation of location-based information, are most easily consumed by users in the form of a map. While line graphs and network diagrams are useful methods of representing data that are not, at their core, visual, maps have the potential to offer a pleasurable user experience and to make data that should be interacted with through cartographic means easier to understand. When dealing with spatial data, the use of a map allows for the possibility of congruence between a design model and the user model within the system image of the map.

In this study, I created a map in the online spatial data visualization tool called Carto that depicts natural gas consumption in New York City by zip code in 2010. Data was collected from NYC OpenData, cleaned up, imported into Carto, and then manipulated within the tool to create my resulting map. An image and link to this map, along with a discussion of its utility, can be found in the Results & Discussion section of this report.

Materials

Carto: Spatial data visualization tool available for use online at https://carto.com/.
NYC OpenData: Source of data used in this report. This site houses publicly available data about New York City provided by the city government. Data set used in this report comes from the “Natural Gas Consumption by ZIP Code – 2010” page at https://data.cityofnewyork.us/Environment/Natural-Gas-Consumption-by-ZIP-Code-2010/uedp-fegm.

Methods

Data Collection & Preparation

To begin, 2010 New York City natural gas consumption data was collected online from NYC OpenData. Next, I gathered three example maps to guide my visualization process. These example maps all depicted data about New York City and used color, size, and labels in different ways. The successes and failures of how each map used these characteristics helped to guide my own mapmaking process.

Example Visualization 1

The map below from 2008 shows the percentages of women aged 40+ who received a Pap test in the preceding three years or a mammogram in the preceding two years in various regions of New York City:

(Source: http://www.openglobe.org/data/)

This map is ambitious in its scope since it illustrates occurrence percentages of two discrete items, Pap tests and mammograms, in one visualization. On the plus side, users should be able to clearly see the difference between the two items: Pap test data are represented by pink circles that contrast starkly with the blue and green shading that represents the mammogram data. However, besides that, the map is visually confusing, as I demonstrate below.

The pink circles in this map utilize size to represent percentages while the blue and green shading utilizes color intensity. This lack of consistency means that a user of this map will have to spend more time reading the legend, which distracts from the map itself. The blue and green shading is also not easy to interpret: since there are only four color buckets being used, it makes more sense to use only one color that varies in intensity for each bucket. Right now, the blue used for 81.0% – 87.9% seems more like a completely different color from the turquoise used for 78.3% – 80.9% rather than something that signifies a greater percentage. More importantly, the two items, Pap tests and mammograms, would be better depicted on two different maps. While the two items bear similarities in their relation to cancers that typically arise in women, they are distinct enough to warrant their own maps. Trends between occurrences of the two screenings may even be easier to see across two maps, since a user must currently sort through the visual clutter on this single map before noticing a relationship between the screenings.

Example Visualization 2

The following map depicts the need for supermarkets in various census tracts across New York City in 2008:

(Source: slide 15 of https://www.slideshare.net/cgoranson/beyond-the-static-map-health-data-viz-and-crowdsourcing)

The above map succeeds more than the previous map in terms of simplicity. The color gradient is easy to interpret and only utilizes three buckets. The differences between the yellow, orange, and red buckets are easy to see, which helped to guide the color choice for my own map. However, the legend does not provide any quantitative ranges that can be referenced for further interpretation. The buckets for “minimal,” “moderate,” and “high” supermarket need (which are seemingly based on the supermarket access index formula that is explained to the left of the map) do not reveal much. With so much effort put into the quantitative measurement of supermarket need, hinted at in the index formula, it is unfortunate that it resulted in a map void of any numerical values. In the visualization of spatial data, it is important to have a legend that is not only simple, but also clearly lays out the range limits of each bucket it includes.

Example Visualization 3

The next map displays walkability determinations across 2010 census tracts in New York City:

(Source: https://beh.columbia.edu/neighborhood-walkability/)

This map excels in its overall simplicity and in the understandability of its legend. A clear color gradient that utilizes varying intensities of orange makes it easy to spot the geographic clusters of high and low walkability index locations among the census tracts. Index values included in the legend, along with “Low” and “High” labels, make bucket range limits clear, which can aid in a follow-up comparative analysis of different census tracts. I would recommend, however, that the intensity of the green used for “Green Spaces” be lowered so that it attracts the eye less. A less intense green would recede into the background, which would be better since green space is not part of the walkability index analysis but is still relevant enough to be present on the map. While borough labels are helpful for non-New Yorkers, they seem unnecessary for a map whose likely audience consists of New Yorkers. If the map is intended to address public infrastructure funding according to specific boroughs, however, it may be best to keep these borough labels on the map, while also adding bold borough boundary lines.

Data Cleanup

Following the analysis of the maps included in the previous subsection of this report, I started to work with the 2010 New York City natural gas consumption data from NYC OpenData. First, I carried out some quick data cleanup by deleting the zip codes from my data set and keeping only the coordinates in the geom (geographic coordinates) column so that the file would be compatible with Carto’s data import process. All other data was kept as is since the values and labels were all logically organized and consistent, a rare find when collecting open data. Some null values for geographic coordinates were present, but Carto ignored these rows, so I was able to leave them in my data set. Not all data visualization programs do this, so as a best practice these rows should be excluded from the data set or accounted for in the resulting visualization when necessary.
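The cleanup step described above could be scripted rather than done by hand. The following is only a minimal sketch, not the procedure I actually followed: the column names (zip_location, service_class, consumption_gj), the sample rows, and the assumption that the zip code and coordinates share one field in the form "ZIP (lat, lon)" are all hypothetical and may not match the NYC OpenData export. It keeps the coordinates, drops the zip code, and explicitly excludes rows with no coordinates instead of relying on the visualization tool to ignore them.

```python
import csv
import io
import re

# Hypothetical sample rows. The column names and the "ZIP (lat, lon)" layout
# are assumptions for illustration, not the exact NYC OpenData export.
RAW = """zip_location,service_class,consumption_gj
"10300 (40.5798, -74.1299)",Commercial,502
"10301 (40.6312, -74.0927)",Residential,1200
"10302",Residential,800
"""

# Matches a "(lat, lon)" pair of decimal numbers anywhere in the field.
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def clean_rows(text):
    """Keep only the coordinates from the combined field; skip rows without them."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        match = COORD.search(row["zip_location"])
        if match is None:
            # No coordinates: exclude the row explicitly.
            continue
        cleaned.append({
            "latitude": float(match.group(1)),
            "longitude": float(match.group(2)),
            "service_class": row["service_class"],
            "consumption_gj": float(row["consumption_gj"]),
        })
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Dropping the rows in the script, rather than leaving them for the import step, makes the exclusion visible and keeps the result the same regardless of which tool the file is loaded into.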

Visualization Creation

Once the data was organized properly, I uploaded the file into Carto and initiated my map creation process. I began by choosing a faded gray background map free of any labels to reduce unnecessary visual clutter. I then explored the Aggregation options of the Style section in Carto and chose to represent gas consumption with hexagonal polygons, which Carto calls “hexbins.” This made the most sense because shaded regions were not possible to use via geographic coordinates and because dots seemed too small, overlapped when enlarged, and would appear awkward if arranged to not overlap with each other. Carto’s hexbins could be arranged easily and consistently without overlapping, so they were my top choice of the available aggregation options. I used a sum operation for the data column that included gigajoules (GJ) consumed and then selected a color gradient that went from a faded yellow to a dark red. I then selected a five-bucket option, with equal intervals, because it was the greatest number of buckets that I could choose without negatively affecting visibility (the next greatest option, seven buckets, made the color differences between buckets harder to discern). Next, I added a legend that included range limits for each color bucket. To complete the map, I added widgets for utility data source and the building type’s service class so that users could toggle between different types when needed.
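To make the sum aggregation and the five equal-interval buckets more concrete, here is a small sketch of the arithmetic involved. It is my own illustration and not Carto’s implementation: the bin identifiers, the sample values, and the choice to treat each bucket’s upper limit as inclusive are all assumptions.

```python
from collections import defaultdict


def sum_by_bin(records):
    """Sum GJ consumption for every (hypothetical) hexbin identifier."""
    totals = defaultdict(float)
    for bin_id, gj in records:
        totals[bin_id] += gj
    return dict(totals)


def equal_interval_breaks(values, buckets=5):
    """Return the upper limit of each bucket when the range is split evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    return [lo + width * i for i in range(1, buckets + 1)]


def bucket_index(value, breaks):
    """Index of the first bucket whose upper limit is >= value (assumed inclusive)."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


# Hypothetical (bin id, GJ) records; several records can fall in the same bin.
records = [("a", 10), ("a", 10), ("b", 100), ("c", 0), ("d", 55)]

totals = sum_by_bin(records)
breaks = equal_interval_breaks(list(totals.values()))
assignments = {bin_id: bucket_index(total, breaks) for bin_id, total in totals.items()}

print(breaks)
print(assignments)
```

With equal intervals, every bucket covers the same width of GJ values, so a single very large bin can push most other bins into the lowest buckets. That is one reason the choice of bucket count and classification method affects how readable the resulting gradient is.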

Results & Discussion

Below is a link to my published Carto map for New York City natural gas consumption in 2010 followed by a screenshot:

https://mabb97.carto.com/builder/0a9da76d-89ae-473e-913b-2ec31e134c8d/embed

The map that resulted from this effort allows users to easily pick out areas of high and low natural gas consumption in New York City. Dark red hexbins are easy to find, and areas of faded yellow contrast starkly with the red. Intermediate shades are distinct enough to be picked out, which allowed me to avoid introducing additional colors (which can be visually confusing). The legend I included provides a simple depiction of the GJ consumption range for each color bucket.

The results of this project are a bit surprising since I was expecting to see that areas of higher natural gas consumption are in regions of greater population density, a variable I would have liked to include in the data analysis (see the Moving Forward section). It would be interesting to compare these results with the locations of different industrial sites in New York City to see whether there is any correlation. To get greater use out of this map, additional information such as industry types, building insulation quality, and population density is needed.

Moving Forward

This map has some problems that should be addressed in future versions. As mentioned earlier, population density is not accounted for. Although it did not appear to be the primary driver of the patterns in my map, as it often is when left unaccounted for, it should be represented in another data layer to rule out misinterpretation of this visual on that account. I had considerable trouble getting 2010 zip code population data for New York City converted into an importable and usable format for this project, but further research and effort would likely result in a different outcome.

While my widgets allow for potentially interesting toggling features, my legend does not adjust when the view changes according to widget selections. Carto automatically changes the hexbin colors on the map so that they reflect equal value intervals (if “Equal Interval” is chosen for buckets in the Style section) for whichever selections are made within each widget. Either this color change should not occur with each selection (which seems like the best solution) or else there should be an option to adjust the legend according to each new selection. Also, the ability to include popups is disabled for my map because I used an aggregated style. If popups were allowed, it would have been useful to add zip code numbers as a hovering effect, along with GJ consumption values.
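The legend mismatch can be illustrated with a few lines of arithmetic. This is a sketch under my own assumptions (hypothetical service classes and GJ values, and a simple equal-interval calculation rather than Carto’s internal logic): when the breaks are recomputed on a filtered subset, the same color no longer stands for the same GJ range that a legend built from the full data set describes.

```python
def equal_interval_breaks(values, buckets=5):
    """Return the upper limit of each bucket when the range is split evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    return [lo + width * i for i in range(1, buckets + 1)]


# Hypothetical (service class, GJ) totals for four map bins.
records = [
    ("Residential", 0.0),
    ("Residential", 50.0),
    ("Commercial", 100.0),
    ("Commercial", 500.0),
]

# Breaks computed once from all data: what a static legend would show.
fixed_breaks = equal_interval_breaks([gj for _, gj in records])

# Breaks recomputed after a widget-style filter on one service class.
residential = [gj for cls, gj in records if cls == "Residential"]
recomputed_breaks = equal_interval_breaks(residential)

print(fixed_breaks)
print(recomputed_breaks)
```

In this example the darkest bucket ends at 500 GJ for the full data set but at 50 GJ for the filtered view, so a legend that stays fixed while the colors are reclassified would overstate consumption by a factor of ten.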

The inclusion of an expert voice in this process would also be a welcome addition: it is difficult to understand what exactly is considered excessive natural gas consumption when looking at this map. References to either the average or ideal GJ consumption of natural gas in New York State, the United States, or the world would help to put this map in a more realistic and useful context. With these changes, this map could be a helpful tool for sustainability researchers, energy management professionals, or individuals employed in public works.

Information Visualization

Student work at the School of Information, Pratt Institute
