Whether driven out by rent increases or searching for larger apartments, New Yorkers vacating their apartments don’t seem to be able to afford to stay in their own neighborhoods anymore. Luxury high-rise development and unit deregulation are just two of the many factors contributing to this problem. As more people leave and a new class of tenants arrives, neighborhoods can suddenly become unfamiliar.
This displacement of entire communities is felt on the streets, but do datasets exist that can represent these changes? If so, how can raw data be transformed to illustrate where change is occurring at an artificially high rate and even possibly allow us to anticipate where change is about to happen? I decided to create a visualization to tackle these questions in our GIS CartoDB mapping lab.
Originally I tried to approach the question from the angle of commuter data: mapping changes in average commuter times against populations. I thought that if I could show that more people are spending more time commuting into Manhattan and within boroughs, it would express displacement. But I was unable to locate good data that went beyond commute times. The one set I did find that included movement between home and work did not seem exhaustive enough. And there seemed to be a missing third dimension to my equation. This first attempt relied on far too many assumptions to be considered an adequate portrait of displacement.
I then began to consider a comparison between rental rates and cost per square foot over time, but again, this seemed to lack a third dimension. I finally arrived at the notion that rapid increases in median income for an area, more than rapid changes in rent, are a ready signal for gentrification-in-progress. Such an increase signals that the displacement of a large segment of a neighborhood is already a reality. If not an actual “displacement”, it at the very least signals the insertion of a large number of households from a higher economic class into a population. Add to this a current rent level that is unaffordable to the population that existed there, and you have an extremely good measure for displacement. I decided to focus on data for median household incomes and rents for two-bedroom apartments. Figures on individual income offer less of a means for comparison with rental prices because it is uncertain how many individual incomes make up the household occupying a single apartment. I chose two-bedroom apartments because I wanted to work with an apartment size that would be large enough to accommodate family households.
The NYU Furman Center performs extensive research on housing affordability and land use in New York City. In their recent report, “State of New York City’s Housing and Neighborhoods in 2014”, one series of maps illustrates the potential in using color in a choropleth map to represent change over time.
In the report’s Figure 1.4A, single population data points have been poured into the map’s areas, represented as colored areas. The effect is a simple snapshot of current area populations. In Figure 1.4B, however, the color represents the change in population over time. It offers the ability to raise more complex questions, especially if the viewer is familiar with New York City’s history since the 1970s. Unlike the first map from the report, it suggests stories for the viewer to engage with.
Another visualization that I found inspiring was a CartoDB visualization by John Krauss of increases and decreases in rent-stabilized units throughout New York City between 2007 and 2014, which was written up on Curbed.com by Hana R. Alberts. It seems to have a strong similarity in motive and critique to my own goals.
Krauss’ visualization allows the viewer to see at a glance where the concentrations of change in rent-stabilized units are and the nature of those changes, using color to represent percentage change, with red indicating larger losses. He uses green to mark large gains, which he itemizes in the tooltip, for example, as gains through 421-a or gains through J-51 tax abatements. (1) His use of layers allows a visitor to peel back to the level of increase or decrease they want to see, and because of this, it functions nicely as a tool. One question I have about Krauss’ visualization pertains to the size of the bubbles. The size seems to be based on the total number of units in the building, whether stabilized or not, but it’s possible that it correlates to the number of rent-stabilized apartments in a building. This is unclear, but for me, the latter would certainly be a more interesting choice in the context of this map’s theme. Either way, adding information to the visualization about what the size represents would have been useful.
Arguably the most iconic map of New York City is the MTA subway map. Even though it is not a data visualization, per se, there is a design sensibility I like and wanted to implement: stripping the map down to the essentials to serve its purpose (in this case, subway lines), while still including select extraneous details to orient the viewer, if needed, like neighborhood labels and some street and avenue names. As described in the next section, my data is aggregated by zip code, not neighborhood, but I decided it would be helpful to include neighborhood labels to allow people to relate the data and the visualization according to their own knowledge about New York City’s neighborhoods. Interestingly, however, neither of the other two examples I provided above chose to include neighborhood labels. For me, not including this type of information for my discussion would shift the intended audience away from the average user, and I wanted to make my visualization as accessible as possible.
The visualization required at minimum three datasets: two temporally different sets of median household income data and at least one set of two-bedroom rental price data. Thanks to federal decennial U.S. Census data and American Community Survey data, the first two datasets were not difficult to find. Aggregated rental price data, however, was not as easy to find. Zillow.com, the online real estate database, provides downloadable data in CSV format. I was able to find one set specifically for two-bedroom rental prices aggregated by month and zip code since 2011. According to Zillow’s newsletter, “Zillow Real Estate and Rental Data: Why we’re different”, this data represents the Zillow Rent Index (ZRI), which is a calculation of the median monthly rent for an area based on list prices and its own algorithm for calculating area rent based on available data. This is an undisclosed algorithm that they call “Zestimate”. Obviously, not having access to calculation algorithms is a considerable issue that data scientists face when using open data from the private sector, and it raises questions about the validity of what is being represented. (2) But in my case, using Zillow data to represent displacement and provide predictive measures for displacement is arguably still valid, in spite of the unknown algorithm, since markets that influence real-world conditions hinge on such figures, and there is a significant amount of rental activity in New York City that Zillow is presumably tapping into to calculate its Zestimate.
To prepare the data for my visualization, I added columns to the Zillow CSV file in Excel to calculate yearly averages of the Zillow monthly data, and I also added a column for the annual household income needed to afford that rent. This amount was calculated based on the principle that housing is considered “affordable” when its cost per year is no more than 30% of annual household income. Because twelve months of rent divided by 0.30 equals 40 months of rent, this matches the calculation typically used for the minimum income required to move into an apartment in New York City (40x the monthly rent). It should be noted that in the face of New York City’s high rent prices, many find it difficult to qualify for new leases due to the income requirement.
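The spreadsheet calculations described above could be sketched in Python roughly as follows. This is only an illustration, not the Excel workbook itself: the function names and the sample monthly figures are invented for the example, and skipping missing months is an assumption about how gaps might be handled.

```python
from statistics import mean


def yearly_average_rent(monthly_rents):
    """Average the monthly rent values for one year, skipping missing months (None)."""
    values = [rent for rent in monthly_rents if rent is not None]
    return mean(values) if values else None


def required_annual_income(monthly_rent, multiplier=40):
    """Income needed so that a year of rent is 30% of income (12 / 0.30 = 40)."""
    return monthly_rent * multiplier


# Made-up monthly values for a single zip code; None marks a missing month.
example_months = [2400, 2450, None, 2500]
avg_rent = yearly_average_rent(example_months)
needed_income = required_annual_income(avg_rent)
print(avg_rent, needed_income)
```

The multiplier is left as a parameter so that a different affordability threshold could be tried without changing the rest of the calculation.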
Because I chose a dataset from Zillow that aggregates data by zip code, I also wanted to use zip codes as the aggregation method for household income data. USA.com is a website that provides profiles of local areas using various sets of government data. For New York City, it allows you to access census data for the entire city by borough and by census tract/block group or zip code. I collected all the zip codes listed for New York City’s five boroughs (Kings [Brooklyn], Queens, Manhattan, Richmond [Staten Island], and the Bronx) and then created a Python script to scrape historical median household income data from the profiles for each of those zip codes. The two datasets I scraped were the 2000 U.S. Census figure and the 2008-2012 American Community Survey average that the site provides. These were stored in a JSON dictionary along with the calculated percentage change between the two values. I then flattened the JSON file into a CSV file for use in CartoDB.
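A minimal sketch of the last two steps, the percentage change and the flattening to CSV, might look like the following. It does not reproduce the scraping script: the field names (income_2000, income_2008_2012), the placeholder zip codes, and the income values are all assumptions made up for illustration.

```python
import csv
import io


def percent_change(old, new):
    """Percentage change from old to new; None when the old value is missing or zero."""
    if not old:
        return None
    return (new - old) / old * 100


def flatten_to_csv(records):
    """Turn a {zip: {field: value}} dictionary into CSV text, one row per zip code."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zip", "income_2000", "income_2008_2012", "pct_change"])
    for zip_code in sorted(records):
        row = records[zip_code]
        change = percent_change(row["income_2000"], row["income_2008_2012"])
        writer.writerow([
            zip_code,
            row["income_2000"],
            row["income_2008_2012"],
            "" if change is None else round(change, 1),
        ])
    return buffer.getvalue()


# Placeholder zip codes and invented incomes, keyed as strings to keep leading zeros.
sample = {
    "00002": {"income_2000": 50000, "income_2008_2012": 55000},
    "00001": {"income_2000": 40000, "income_2008_2012": 50000},
}
csv_text = flatten_to_csv(sample)
print(csv_text)
```

Keeping zip codes as strings rather than integers avoids silently dropping leading zeros, which would break a later join on that field.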
Creating the visualization
For my visualization, I merged the two CSV files on the zip code field in CartoDB and georeferenced that field. Since I already knew I wanted to build a choropleth map on the zip code areas colored by percent change in median household income, I set the field to be a polygon. Areas with the highest percent increase were set to red, and those with the lowest were set to yellow. I set the labels to show the current median rent price (2015, to date) for a two-bedroom apartment for that zip code. I also brought in a NYC neighborhood shapefile from Zillow to explicitly label neighborhoods.
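The merge itself was done inside CartoDB, but the same join on the zip code field can be sketched in plain Python to show what it does. The column names and sample rows below are hypothetical, and keeping only zip codes present in both files (an inner join) is an assumption about the desired behavior.

```python
import csv
import io


def merge_on_zip(income_csv, rent_csv, key="zip"):
    """Join two CSV texts on a shared key, keeping only keys found in both."""
    rent_by_zip = {row[key]: row for row in csv.DictReader(io.StringIO(rent_csv))}
    merged = []
    for row in csv.DictReader(io.StringIO(income_csv)):
        match = rent_by_zip.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Invented sample data; "00002" and "00003" each appear in only one file.
income_csv = "zip,pct_change\n00001,25.0\n00002,10.0\n"
rent_csv = "zip,rent_2015\n00001,2450\n00003,1900\n"
merged_rows = merge_on_zip(income_csv, rent_csv)
print(merged_rows)
```

Zip codes that appear in only one of the two files are dropped here, which mirrors the fact that an area needs both an income change and a rent figure to be colored and labeled.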
A few further adjustments were necessary to facilitate readability when zooming in and zooming out. This included setting different text sizes and halo radii according to zoom factor. Because the price labels are the most important labels on the map, I tried to use color, font, halo, and size to ensure that the price labels always stand out more than the neighborhoods.
My initial concern was that the visualization might be too obscure to communicate effectively. It was helpful to receive feedback from others in my class at an early stage. The general consensus was that the data juxtaposition and calculations were valid in locating possible displacement trouble areas in the city, but that well-thought-out labels and titles would be crucial in helping the viewer understand what was being framed. I added a title and a description that hopefully provide meaningful context for the visualization. I also used the HTML editor function on CartoDB to reformat and, more importantly, reorder the info window data to allow the viewer to follow the trajectory laid out in the data: 2000 median household income, 2008-2012 median household income, percentage change, 2012 median two-bedroom rent and affordability calculation, and 2015 median two-bedroom rent and affordability calculation.
Although I was initially concerned about my choice of datasets, after adding the suggested labeling, I was very satisfied with the end result. The visualization does allow the visitor to consider the issue of displacement and the effects of gentrification not just in terms of rising rents, but in the very real terms of severe population shifts expressed through rapid rises in neighborhood median incomes. Future additions to this project might be: 1) exploring other data sources to add to the Zillow rent data, and 2) using layers to offer overlays. Although the Zillow data was quite good for expressing trends from a market point of view, it would be better to combine that data with other rental data, for example, from the real estate site StreetEasy and/or even Craigslist and other sources that might have more extensive data for the years before 2012 (the earlier Zillow data is less complete and does not include as many zip codes as more current data). If such data can be found, it would be interesting to see if the visualization seems more balanced when the rent and household income data are from the same years. And if that is successful, it might be easier for viewers if affordability is expressed in terms of overlay toggles, like hatching patterns on areas where the rent is more than 30% of median household income and another overlay for zip codes where the rent exceeds 50% of the median household income, which is the definition of severely rent-stressed. But my suspicion is that this would not be as revealing, since showing the data out of phase is what actually creates the portrait of displacement.
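The overlay idea can be made concrete with a small classification sketch based on the 30% and 50% thresholds mentioned above. The function name, the category labels, and the sample figures are invented for illustration, and this was not part of the finished visualization.

```python
def rent_burden(monthly_rent, median_household_income):
    """Classify a zip code by the share of annual income that a year of rent consumes."""
    if median_household_income <= 0:
        raise ValueError("median household income must be positive")
    share = 12 * monthly_rent / median_household_income
    if share > 0.50:
        return "severely rent-stressed"
    if share > 0.30:
        return "rent-stressed"
    return "affordable"


# Invented examples: (monthly rent, median household income)
print(rent_burden(1000, 48000))  # 25% of income
print(rent_burden(2000, 60000))  # 40% of income
print(rent_burden(3000, 60000))  # 60% of income
```

Each category could then drive its own toggle layer, for example a hatching pattern for zip codes above 30% and a second pattern for those above 50%.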
(1) One could argue, however, that these should be considered a different class of rent stabilization, since their rent stabilized status is not as protected, and therefore should not be represented positively with green. But this might be a difference of opinion.
(2) One question in this instance, for example, might be the issue of preferential rent in NYC, where the owner offers the tenant a monthly rent that is lower than the legal amount for a rent-stabilized apartment through a preferential rent lease rider, often because the legal amount is much higher than the current market rent for the neighborhood. However, the higher legal rent remains on record with the state. If Zillow uses these recorded legal amounts in calculating its ZRI, not only is it an inaccurate reflection of rent paid in the neighborhood, it can also have the compound negative effect of driving rents up faster, since figures published by companies like Zillow are held to be true and acted upon by people in the real estate world.