Airbnb and NYC Housing Availability


Visualization

[Image: screenshot of the visualization]

Introduction


My project will use Airbnb listings data to analyze Airbnb density in NYC and, more specifically, to examine Airbnb’s impact on the city’s housing supply.

I’ve been following the unfolding regulatory controversies between Airbnb and NYC for a while now, so I was excited to stumble on an Airbnb data dump with up-to-date information on NY listings. Lucky for me, my discovery came at an interesting time – in June 2016 the NY State Legislature passed a bill that would prevent Airbnb from advertising listings that qualify as “illegal short term rentals,” with fines of up to $7,500 for noncompliance (Benner). Short term rentals, which have technically been illegal since 2010 although the law has been poorly enforced, have been identified by a recent ShareBetter study as a type of “impact listing,” or a listing that has tangible, negative effects on housing supply and rent prices. According to the study, short term rentals take housing off the market for extended periods of time and impair the city’s ability to preserve its affordable housing stock (Benner). To be considered a short term rental, a listing must meet the following criteria:

  • Guests may rent the “Entire Apt/Home”
  • Listing is available for rental periods less than 30 days and has at least one non-booked day per month
  • Listing is located in a building with three or more units
  • Listing is not a standalone house
  • The owner is not present during guest rental periods (BJH Advisors LLC)

In addition to short term rentals, the study determined that listings for “entire apartment” rentals and “commercial rentals” – rentals that are offered for at least 3 months out of the year by hosts with multiple listings – also damage the fabric of NYC’s urban landscape (BJH Advisors LLC).

For my project, I decided to take a specific look at the density of listings that qualify as illegal “short term rentals” under NY State Law. ShareBetter’s study utilized 2015 Airbnb data, so I was interested to see how the statistics have changed, using my dataset of listings as recent as June 2016. Also, while ShareBetter’s study calculated how Airbnb listings impact the city’s vacancy rate, I chose to determine the percentage of each census tract’s housing supply occupied by short term Airbnb rentals.

Data


The primary dataset used in this project was downloaded from Inside AirBnb, and includes AirBnb listings as recent as June 2016. I chose to download the “detailed” listings data, which offers an immense amount of information concerning individual AirBnb listings in NYC. The data was relatively clean when I found it, so I combed through and made simple adjustments such as trimming, transforming value types, and using text facet clustering to make sure everything was formatted correctly. The most complicated cleaning for this particular dataset involved identifying which listings qualify as “short term rentals.” While some of the criteria would be impossible to infer from my dataset (such as whether the Airbnb homeowner is physically present during the guest’s visit), I was able to filter the data to reflect the following criteria:

  • Listings that advertise an “Entire Home/Apt” Rental
  • Listings with a minimum stay < 30 days, meaning guests can rent for short periods of time
  • Properties that aren’t standalone houses (apartment, lofts, and townhouses are not considered standalone houses)

After filtering, I used Google Refine to add a column for “Rental Type,” which indicates whether a listing qualifies as a short term rental based on NY state criteria. I must note that I was unable to determine whether or not the listed properties are in buildings with 3+ units – which must be the case for listings to qualify as short term rentals. I also decided against filtering by the “Maximum Nights” a guest can stay, because even if a guest is technically allowed to stay longer than 30 days, a listing with a minimum stay < 30 days likely hosts guests for short rentals as well. Again, however, I would consider this a source of error in my dataset.
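For readers who would rather reproduce this step in code than in Google Refine, a minimal Python sketch of the three testable criteria might look like the following. It is not the workflow used for this project. The column names (room_type, minimum_nights, property_type) are assumed to match the Inside Airbnb export, and the set of property types treated as standalone houses is a hypothetical choice.

```python
# Hypothetical sketch: column names and the STANDALONE_TYPES set are assumptions,
# not values taken from the project files.
STANDALONE_TYPES = {"house"}  # apartments, lofts and townhouses are allowed through


def rental_type(listing):
    """Label a listing 'Short Term Rental' when it meets all three testable criteria."""
    entire_home = listing["room_type"].strip().lower() == "entire home/apt"
    short_stay = int(listing["minimum_nights"]) < 30
    not_standalone = listing["property_type"].strip().lower() not in STANDALONE_TYPES
    if entire_home and short_stay and not_standalone:
        return "Short Term Rental"
    return "Other"


# Made-up rows for illustration only.
listings = [
    {"id": 1, "room_type": "Entire home/apt", "minimum_nights": "2", "property_type": "Apartment"},
    {"id": 2, "room_type": "Private room", "minimum_nights": "1", "property_type": "Apartment"},
    {"id": 3, "room_type": "Entire home/apt", "minimum_nights": "30", "property_type": "Loft"},
    {"id": 4, "room_type": "Entire home/apt", "minimum_nights": "3", "property_type": "House"},
]

# Equivalent of the added "Rental Type" column, keyed by listing id.
labels = {row["id"]: rental_type(row) for row in listings}
```

Only the first made-up row passes all three checks. The others fail on room type, minimum stay, and property type respectively.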

In addition to AirBnb data, I pulled relevant census data on housing from the 2014 ACS 5-year estimate on American FactFinder. With this data, I figured I would calculate the amount of housing occupied by short term rentals in each NYC census tract, then depict these numbers on a CartoDB map. Given that this was my first time using census data, I struggled a bit with the website and with data cleaning – I was amazed that the census tract data on housing I located did not share any column values with the NY census tract shapefile!
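One common reason for this kind of mismatch, offered here as an assumption because the exact columns are not named above, is that ACS tables identify a tract by an 11-digit GEOID (state code, county code, six-digit tract) while NYC tract shapefiles often carry a borough code followed by the six-digit tract. Under that assumption, a shared key can be built with a small lookup. The function name and example GEOID below are illustrative.

```python
# County FIPS codes for the five boroughs, mapped to NYC borough codes.
COUNTY_TO_BORO = {
    "061": "1",  # New York County -> Manhattan
    "005": "2",  # Bronx County    -> Bronx
    "047": "3",  # Kings County    -> Brooklyn
    "081": "4",  # Queens County   -> Queens
    "085": "5",  # Richmond County -> Staten Island
}


def geoid_to_boro_tract(geoid):
    """Convert an 11-digit tract GEOID into a 7-digit borough + tract key."""
    geoid = geoid.strip()
    if len(geoid) != 11 or not geoid.startswith("36"):
        raise ValueError(f"not a New York State tract GEOID: {geoid!r}")
    county, tract = geoid[2:5], geoid[5:]
    if county not in COUNTY_TO_BORO:
        raise ValueError(f"county {county} is not one of the five boroughs")
    return COUNTY_TO_BORO[county] + tract


# Illustrative GEOID for a Brooklyn (Kings County) tract.
key = geoid_to_boro_tract("36047054300")
```

With a key like this on both sides, the housing table and the shapefile attributes can be matched directly instead of by splitting and re-joining columns by hand.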

Assembling data for my first CartoDB visualization, which is intended to display the density of short term rentals, was fairly simple. I used the lat/long columns from my Airbnb dataset to do a spatial join with the census tract dataset. Getting my data into shape for my next visualization was another story. Because I planned to map the percentage of available housing occupied by Airbnbs in each census tract, I had to export my merged census tract/Airbnb file from CartoDB, then use the number of intersections to calculate percentages against the “Total Amount of Available Housing Units” found in my census dataset. This was not particularly easy given that, for some reason, the census tract shapefile I used did not share any unique values with the census tract housing dataset. I was required to split, conjoin, and clean columns extensively in order to conduct a VLOOKUP and pull in the necessary information from my Airbnb/census tract merge. I then calculated percentages, and put final touches on my data before uploading it into CartoDB.
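To make the logic of the spatial join and the percentage calculation concrete, here is a small self-contained Python sketch. It swaps CartoDB's spatial join for a plain ray-casting point-in-polygon test, and the tract shapes, listing coordinates, and housing counts are toy values invented for illustration, not project data.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Toy square "tracts" and made-up housing counts (assumptions for the example).
tracts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
housing_units = {"A": 200, "B": 50}
listings = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (9.0, 9.0)]  # (lon, lat)

# Spatial join: count the listings that fall inside each tract.
counts = Counter()
for lon, lat in listings:
    for tract_id, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            counts[tract_id] += 1
            break

# Lookup step: divide each count by that tract's available housing units.
percent = {t: 100 * counts[t] / housing_units[t] for t in tracts}
```

The last listing falls outside both toy tracts and is simply not counted, which mirrors what happens to points that intersect no polygon in a spatial join.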

Visualizations


For my visualizations, I chose two mediums to depict three separate ideas.

  1. The first visualization is a Tableau dashboard that uses basic bar charts to answer the basic question – “where in the city are short term rentals located?”
  2. My next visualization takes findings from the Tableau dashboard and creates a stronger rendering of a short term rental density map in New York. Instead of neighborhoods, I chose to analyze census tracts because they’re a smaller administrative boundary that can drill into the statistics of particularly high impact areas.
  3. Lastly, I used CartoDB to take my analysis a step further and combine Airbnb data with census data on housing. Then, I mapped the percentage of available housing occupied by short term rentals in each census tract.

I integrated User Experience research into my process as I created each of these visualizations, hoping my findings would help influence design decisions that impact the overall usability of each viz. Because the visualizations have interactive features, I decided to complete three rounds of usability testing to ensure that the functions of each viz work correctly. To confirm whether my visualizations are effective in communicating my data, I also conducted a round of post-usability test interviews to get an idea of user takeaways.

Four New York-based users were recruited through my own network – a friend of my mom’s whom I had never met, a Twitter acquaintance, a city employee, and a friend of my roommate. I specifically chose to conduct tests with users from NY because New Yorkers are the intended audience of my visualization, which is designed to help educate and spread awareness about the presence of short term rentals in NYC.

Tableau Dashboard

Created with Tableau

View Interactive Visualization

My first visualization, created with Tableau, is meant to provide a simple introduction to the topic of short term AirBnb rentals. For this reason, I chose to include a brief description of some of the implications of short term rentals, as determined by the Sharebetter study. I also clearly define what criteria listings must meet in order to be considered “short term”. The graphs were designed to answer the basic question posed in the text area of my visualization: What are the densest areas of “short term rentals” in the 5 boroughs?

“Rentals by borough” is a graph grounded in comparison that examines the amount of short term rentals vs. non-short term rentals in each borough. It provides an overall picture of the situation, and indicates that short term rentals are most prevalent in Manhattan and Brooklyn, and less so in the Bronx, Queens and Staten Island.

“Amount of Listings Per Neighborhood” aggregates rentals according to the NYC neighborhood listed in the data file. This graph can function as a standalone scrolling graph, or can be used in tandem with the “Distribution of Short Term Rentals” map. Usability testing helped me make the decision to link the “Search by Neighborhood” to both the bar chart and map, so that when users search by neighborhood, the graph displays the amount of listings in a certain neighborhood, while the map displays individual points within that neighborhood.

In alignment with Lindsay MacDonald’s “Using Color Effectively in Computer Graphics,” I made sure to “use a limited palette of colors” and “use color consistently throughout all screens in an application” (MacDonald 28). Since I was only representing one entity – short term rentals – I stuck to pink throughout the Tableau visualization, using dark gray in only one chart to represent non-short term rentals. As you will see in my CartoDB maps, I consistently used pink to tie all of the visualizations together even though they exist in different digital spaces.

Usability Testing

  • Task 1: Highlight only “Short Term Rentals” on the dashboard

Test subjects were easily able to highlight short term rentals; however, several seemed confused that the feature highlighted everything on the page except for a few gray bars indicating “non-short term rentals”. User feedback helped me make the decision to avoid creating a call to action for the highlight tool – something I had debated while designing my dashboard.

  • Task 2: Find how many short term rentals are listed in Queens

“I admittedly went to search “Queens” first in the search thing” – User 01

Several users immediately went to the search bar and typed in “Queens” until they figured out that the search function filtered by neighborhood. My takeaways helped me develop a stronger call to action for the filter feature – instead of just labeling my filter “Search,” I renamed it “Filter by Neighborhood”.

  • Task 3: Filter to view only the West Village on the map

Users seemed to find the filtration function easily, although that may be because it was discovered accidentally during task two. All users noted that they enjoyed being able to zoom in on their neighborhood to find short term rentals on their block, and user testing made me realize how one’s ability to zoom in and locate AirBnb locations in their neighborhood made the visualization more personal and relatable for the subject.

One user mentioned that they found it difficult to compare the amount of short term rentals in different neighborhoods because of the map’s density, so I decided to make two alterations to the visualization. First, I made the points on the map smaller in a quick fix. Second, I connected the map filter with the “Amount of Listings Per Neighborhood” graph. This way, when users filter by neighborhood, the corresponding selection automatically appears on the graph below, making it easier to determine how many AirBnbs are in each neighborhood.

I also drew some conclusions specific to the map – all of which helped me identify how to proceed. I realized that while the map is entertaining to explore, plotting individual points is not the optimal way to convey density. A choropleth map would be ideal for this purpose, but because the smallest shapefile built into Tableau is zip code (which did not make a compelling visualization), I made the decision to create a stronger density visualization with CartoDB.

CartoDB #1

[Image: screenshot of the CartoDB map]

Created with CartoDB

View Interactive Visualization

“Short Term Airbnb Rentals in NYC” takes the previous Tableau visualization a step further with a choropleth map that displays the density of Airbnbs by census tract. I specifically chose census tracts because they are a smaller administrative boundary than neighborhoods, which allows viewers to home in on small slices of the city with high density. The one downside of using census tracts is that they’re not immediately recognizable by number – they must be viewed geographically for the necessary context to determine where each census tract is located. For that reason, I included “Neighborhood” on a tooltip so users can apply their contextual knowledge of specific neighborhoods to each census tract displayed.

One advantage to using census tracts was that it allowed me to compare my AirBnb data with other NYC census data, which would prove useful for my next visualization and opens my research up for more analyses in the future.

Usability Testing

  • Task 1: Search for your address in the search bar and tell me the amount of Airbnb rentals in your census tract.

The address search turned out to be no problem; however, several users were stuck on the fact that there is no way to remove the pin CartoDB drops when you search a location. While this is annoying, I know of no way to fix it, so I decided not to act on this piece of feedback.

  • Task 2: Show me what you think are the three densest areas of AirBnb short term rentals in NYC.

Users were able to identify Midtown Manhattan, the Lower East Side, Chelsea and Williamsburg as highly dense areas of short term rentals, although several mentioned that the lack of visible borders surrounding census tracts caused all of the shapes in dense areas to blend together. As a simple fix, I darkened the census tract outlines.

As an aside, after user testing some users voiced that they wanted more out of this visualization than just a snapshot of rental density – they were hoping for a stronger takeaway about the effect Airbnb has on neighborhoods. Their feedback helped me firm up my plans to make a final visualization that uses census data to visualize the effect of short term rentals on housing availability.

CartoDB #2

View Interactive Visualization

My final visualization used Airbnb data in combination with census housing data to display the share of a census tract’s available housing units that are currently occupied by short term rentals. I also layered Neighborhood Tabulation Area labels over the map to breathe some context into the clusters of census tracts. The higher the percentage of available units occupied by short term rentals, the darker the shade of pink, and vice versa. I designed this rendering to go a step beyond the last map by showing the census tract housing supplies most deeply affected by Airbnbs. The intention of this map is not only to identify problem areas, but to show how Airbnb contributes to the housing crisis in certain areas.

I designed the tooltip to include all information necessary to determine whether the statistic is valid. This was necessary because some census tracts, such as areas surrounding the Brooklyn Navy Yard and Maspeth, Queens, show a large percentage of housing occupied by short term rentals because there is not a lot of housing in those areas to begin with. By including the “Available Housing in Census Tract,” one can determine areas where my methods may be misleading. Given more time, I would annotate these areas to reflect error.
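One way such annotation could work, sketched here purely as an illustration, is to flag any tract whose housing count falls below a cutoff so that its percentage is read with caution. The cutoff of 100 units, the function name, and the example numbers are all hypothetical and do not come from the project data.

```python
MIN_UNITS = 100  # hypothetical cutoff, not a value from the project


def summarize_tract(short_term_rentals, available_units, min_units=MIN_UNITS):
    """Return the percentage plus a flag for tracts with too little housing to trust it."""
    if available_units <= 0:
        # No housing recorded: a percentage cannot be computed at all.
        return {"percent": None, "low_housing": True}
    percent = round(100 * short_term_rentals / available_units, 2)
    return {"percent": percent, "low_housing": available_units < min_units}


# Made-up examples: a large residential tract and a mostly industrial one.
residential = summarize_tract(short_term_rentals=40, available_units=2000)
industrial = summarize_tract(short_term_rentals=3, available_units=20)
```

In this toy case the industrial tract shows the higher percentage from only three listings, and the flag marks it as a figure that needs a caveat on the map.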

Another known source of error is that “Available Housing Units” in each tract have likely changed since 2014, especially in areas where there is a lot of development. Given more up-to-date figures on available housing units in each census tract, my visualization would display a more accurate snapshot of the situation.

Usability Testing

  • Task 1: Find your neighborhood on the map and describe the effect of short term rentals on available housing in your area.

Users had little issue identifying their neighborhood, especially with the neighborhood tabulation area labels layered on the map.

  • Task 2: Show me what you think are the four areas with housing supply most affected by Airbnbs

For the most part, test subjects correctly identified Williamsburg, Midtown, the Lower East Side, and parts of Soho as areas where housing supply is significantly lessened by the presence of Airbnb short term rentals. In several cases, users cited Maspeth, Queens and the Brooklyn Navy Yard area as being affected; they had been confused by the skewed percentages in those areas, which result from there being fewer housing units. In future versions of the project, I plan to either annotate these areas or find a way to adjust for this type of misleading result.

User Interviews

After my third series of usability tests, I wanted to gauge user takeaways specific to my “Airbnb Rentals and NYC Housing Stock” CartoDB map. I chose the final map because I think it’s most likely to lend itself to the formation of opinions surrounding short term rentals in NYC, and I wanted to see if the visualization was communicating my message.

1 – What do you think this visualization is trying to display?

Users correctly identified that the map displays the relative amount of housing that Airbnb takes up in a given census tract.

2 – After viewing this visualization, what do you think about the presence of short term AirBnbs in NYC?

One user responded that they think the presence of Airbnbs creates more of a burden on housing stock in areas of Brooklyn and Manhattan where there is already an extreme demand. Others noted that they feel the areas where Airbnb short term rentals drain the most housing supply are expensive, touristy areas to begin with, but that it’s alarming to see how much housing Airbnbs take up in areas of Brooklyn, and on the Lower East Side.

3 – Describe anything you find confusing or unclear about this visualization.

Users overall found the boundaries and meanings of census tracts most confusing about this visualization. They were also confused by some of the neighborhood names because they reflect “Neighborhood Tabulation Area” names rather than more informal neighborhood names.

4 – How can this visualization better achieve its purpose to educate the public about AirBnb’s impact on urban areas?

I received a variety of suggestions for future projects. One user suggested showing median home prices or rental prices in the neighborhoods or census tracts, as this could reveal correlations between real estate value and stress from Airbnb density. Other users said they would like to see a visualization of Airbnb listings beyond short term rentals – perhaps all listings for entire apartments. One response that I might turn into an extension of this project was a suggestion that I display how the amount of occupied housing changed from 2010 to 2014, as Airbnb was becoming more and more popular. This suggestion seemed to be in a similar vein to the ShareBetter study, and something worth pursuing with up-to-date data.

Insights


Over the course of this project, I gleaned several insights from both my visualizations and user testing that may help direct future revisions or extensions of the project.

User Testing

  1. Choice of administrative boundary impacts viewer context. Census tract numbers are less recognizable than neighborhood names, and don’t provide the context necessary for users to fully understand the visualization. In addition, users were confused by the neighborhood labels because they reflected “Neighborhood Tabulation Areas” rather than more well-known neighborhood names. For this reason, I thought it was necessary to layer neighborhood tabulation areas onto the more zoomed in “Airbnb Short Term Rentals and NYC Housing Stock” CartoDB map. For the more zoomed out “Short Term Airbnb Rentals in NYC” map, I was hesitant to apply neighborhood labels because I was concerned about visual clutter, so I instead added neighborhood names to the tooltip and switched from a Positron (lite) map to a Positron map with labels. Potential Revision: find a gazetteer that would allow me to translate Neighborhood Tabulation Areas into more recognizable, colloquial neighborhood names for my label layer.
  2. Maps make things personal. Users enjoyed the short term rentals map in Tableau because they were able to look at specific AirBnb locations in their neighborhood, or even on their street. Because my test subjects are all from New York, viewing the density and impact of Airbnb short term rentals on their neighborhood made the mapping visualizations more impactful and meaningful.
  3. Choropleth visualization did not represent the reach of the problem. While the “Airbnb Short Term Rentals and NYC Housing Stock” did influence some test subjects’ opinion on the presence of Airbnb in NYC, others were not particularly moved by the low percentages in most census tracts. More generally, the use of a choropleth map casts the illusion that the problem is constrained only to certain areas, and is not something that ends up impacting the housing crisis throughout the whole city.

Visualizations

  1. Short term rentals are most ubiquitous in areas I already associate with expensive rent prices and/or gentrification. Both the Tableau and CartoDB density visualizations show the densest concentrations of rentals in areas that get a fair amount of press for gentrification, such as Williamsburg, LES and Bedstuy. Both expensive and gentrified areas are generally most desirable for tourists – which likely means more profitable listings.
  2. Airbnb short term rentals compromise the most available housing units in expensive, gentrified areas, and areas near tourist attractions. Census Tract 2051700 in Williamsburg is 8.28% short term rentals, Tract 1004300 in Soho is 5.25% short term rentals, and Census Tract 1010900 in Midtown Manhattan is 12.58% short term rentals – areas known for high rent prices and, in the case of Soho and Midtown, tourist attractions. The map also indicates areas of high density in Bedstuy, Prospect Heights and Bushwick – all hotbeds of gentrification.
  3. Some less residential areas such as the Brooklyn Navy Yard (census tract 3054300) and Maspeth (4021900) display misleading figures, because with fewer available housing units, even a small number of short term rentals drives up the percentage.
  4. From my final CartoDB visualization, one can determine that Airbnb does eat up a significant amount of available housing in select tracts and neighborhoods, which likely puts some stress on supply. That said, it’s difficult to determine whether Airbnb has any correlation with or effect on gentrification in high-development areas.
  5. Staten Island, the Bronx, and Queens don’t have Airbnb problems… yet. It will be interesting to see whether and how this changes in coming years, which could potentially be the focus of a future project.

Future Directions


For future projects, I would also like to run similar “% of Available Housing” analyses with additional types of “impact housing” indicated in the ShareBetter report, such as “Entire Apt Rentals” and “Commercial Listings”. Looking at these listings in aggregate may shed light on a more significant issue than my current report is capable of identifying. Perhaps “Commercial Listings” would even require its own separate map. On top of examining a broader scope of “impact listings”, I’d like to measure these listings against multiple sets of census data, such as average income within census tracts, or number of people per household. An addition to the map could be plotted points of NYC landmarks, to establish whether there’s any correlation between Airbnb density and the presence of landmarks.

One final idea I have for a future project is to analyze data on AirBnb hosts. How many listings do hosts manage, and where are those listings located? Do hosts live in New York, or do they “landlord” Airbnbs remotely? What is their average income level? Perhaps with a future data dump I will have information necessary to continue my analysis and build a more robust report.

Sources

Benner, Katie. “A Brooklyn Neighborhood Where AirBnb is Being Put to the Test.” The New York Times, 3 July 2016. Web. 2 July 2016. http://www.nytimes.com/2016/07/04/technology/a-brooklyn-neighborhood-where-airbnb-is-being-put-to-the-test.html?_r=0

Inside Airbnb. “Detailed Listings Data for New York City.” June 2016. Web. 3 July 2016. http://insideairbnb.com/get-the-data.html

MacDonald, Lindsay W. “Using Color Effectively in Computer Graphics.” IEEE Computer Graphics and Applications 19.4 (1999): 20–35.

“Selected Housing Characteristics.” 2010-2014 American Community Survey 5-Year Estimates. 2014. Web. 3 July 2016. http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_14_5YR_DP04&prodType=table.

“Short Changing New York City.” BJH Advisors LLC, June 2016. Web. 2 July 2016. http://www.sharebetter.org/story/housing-report-short-changing-new-york-city/