Introduction & Inspiration
I was inspired to visualize Airbnb data because Thanksgiving is coming! For the break, my friends and I decided to visit LA, and we used Airbnb to find a place to stay less than a week before Thanksgiving. Because we planned our trip so late, the listings that were left were often quite expensive, and the ones we wanted were mostly already unavailable. In this project, I therefore want to learn more about the distribution, prices, reviews, and availability of Airbnb listings in each state.
Based on this experience, I set out to investigate the following research questions:
- How are Airbnb listings distributed?
- How does the average listing price differ from state to state?
- Which state is the most popular in terms of the average number of reviews per listing?
- What is the listing availability like for each state?
Process & Methods
Step 1. Finding datasets
I found a dataset on Kaggle with 2020 Airbnb prices and property information. It is a single CSV file that compiles multiple datasets from Inside Airbnb, with columns for host id, host name, listing id, listing name, latitude and longitude of the listing, neighborhood, price, room type, minimum number of nights, number of reviews, last review date, reviews per month, availability, host listings, and city. There are about 220k listings in total.
To draw the US map, I downloaded a shapefile (.shp) of the United States from the United States Census Bureau website.
Moreover, as I wanted to compare home prices with Airbnb prices in each state, I used the 2020 median home price data for U.S. states and D.C. from Wikipedia.
Step 2. Tidy up data
I used Excel to remove rows with empty cells (mainly in the availability and review columns), which reduced the data from 218k to 177k listings.
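For a repeatable version of this step, the same filter could be scripted. The following is a minimal Python sketch, not what was actually run (the cleaning above was done in Excel). The column names `number_of_reviews`, `reviews_per_month`, and `availability_365` are assumptions about the CSV header, and the sample rows are made up.

```python
import csv
import io

def drop_incomplete_rows(rows, required):
    """Keep only rows where every required column has a non-blank value."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() for col in required)
    ]

# Tiny in-memory stand-in for the listings CSV (values are made up).
sample_csv = """id,price,number_of_reviews,reviews_per_month,availability_365
1,120,10,0.8,200
2,95,,,365
3,300,4,0.2,
4,80,0,0.0,0
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
# Assumed column names; adjust to match the real header.
required = ["number_of_reviews", "reviews_per_month", "availability_365"]
clean = drop_incomplete_rows(rows, required)
print(len(rows), "->", len(clean))
```

Note that a value of 0 is kept: only truly blank cells count as missing, so a listing with zero reviews or zero available days still survives the filter.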
Step 3. Calculate and add state data using R
One key limitation of this dataset is that it has no state information for each listing, only the city and coordinates. Since I want to understand how listings differ between states, I needed to add the state to each listing. To do so, I followed a tutorial on Stack Overflow and used the sf (simple features) package to convert the listings to an sf points object and match each point to the state that contains it. Using the tidied data, I then calculated the per-state summaries I needed: df_avgPrice_byState, df_avgReviews_byState, and df_avgAvailability_byState.
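To illustrate what the point-to-state matching does conceptually, here is a small Python sketch. It is not the R/sf code used for the project: it uses a plain ray-casting point-in-polygon test, and the two "states" are rough rectangular boxes chosen for illustration only, not real boundaries from the shapefile. The listing coordinates and prices are made up as well.

```python
from collections import defaultdict

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def state_for_point(lon, lat, states):
    """Return the first state whose polygon contains the point, else None."""
    for name, polygon in states.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None

def mean_by_state(listings, states, field):
    """Average `field` over the listings that fall inside each state."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in listings:
        state = state_for_point(row["longitude"], row["latitude"], states)
        if state is None:
            continue  # point matched no polygon (e.g. offshore)
        totals[state][0] += row[field]
        totals[state][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}

# Rough boxes for illustration only, NOT real state boundaries.
states = {
    "Colorado": [(-109.05, 37.0), (-102.05, 37.0), (-102.05, 41.0), (-109.05, 41.0)],
    "Wyoming": [(-111.05, 41.0), (-104.05, 41.0), (-104.05, 45.0), (-111.05, 45.0)],
}
# Made-up listings.
listings = [
    {"longitude": -104.99, "latitude": 39.74, "price": 100},
    {"longitude": -105.27, "latitude": 40.01, "price": 140},
    {"longitude": -104.82, "latitude": 41.14, "price": 90},
    {"longitude": -150.00, "latitude": 20.00, "price": 500},
]
avg_price = mean_by_state(listings, states, "price")
print(avg_price)
```

Listings that match no polygon are skipped rather than assigned to the nearest state, so it is worth counting how many points fall outside every boundary before trusting the averages.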
Step 4. Import files to QGIS and Visualize Data
After cleaning up the data, I imported the US map file and the CSV files into QGIS and joined them on state name. One thing to note is that the CRS of the map file and the CRS of the CSV files were not consistent, so I manually set the CRS of the map file to EPSG:4326.
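A join on state names silently leaves a state blank whenever the two spellings differ or the state has no data. The join itself was done in the QGIS interface; the Python sketch below only illustrates the logic of a left join that reports unmatched names. The function name `join_by_state` is hypothetical, the three prices are the rounded averages reported later in this write-up, and Montana is assumed here to have no listings.

```python
def join_by_state(map_states, stats_by_state):
    """Left join: attach a value to every map feature and track mismatches."""
    def norm(name):
        # Ignore case and stray whitespace when comparing names.
        return name.strip().lower()

    lookup = {norm(name): value for name, value in stats_by_state.items()}
    joined = {}
    unmatched = []
    for name in map_states:
        key = norm(name)
        if key in lookup:
            joined[name] = lookup[key]
        else:
            joined[name] = None  # state stays on the map, but with no data
            unmatched.append(name)
    return joined, unmatched

# State names as they might appear in the map layer.
map_states = ["Hawaii", "Oregon", "New York", "Montana"]
# Per-state averages; note the stray space and lowercase in "hawaii ".
avg_price = {"hawaii ": 202, "Oregon": 115, "New York": 128}

joined, unmatched = join_by_state(map_states, avg_price)
print(joined)
print("No data for:", unmatched)
```

Checking the unmatched list after a join makes it easy to tell a state with genuinely no listings from one that was dropped because of a naming mismatch.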
Step 5. Go back to data cleaning to eliminate outliers
After I visualized the data, I noticed that the average Airbnb listing price in TX, WI, and OH was much higher than the price in New York. I was doubtful about this finding, so I went back to look at the data again.
By sorting the Airbnb prices from high to low and from low to high, I noticed that some listings have a price of $24,999 per night while many others are $0 per night. I recalled that Airbnb has a premium branch called Airbnb Luxe, which offers mansions and luxury houses for rent, so I searched for some listings online to get an idea. For instance, the photo below is the most expensive house in the Austin data: Sapphire on Lake Austin.
While it is nice to see that the data set is quite up to date, listing prices like this are far beyond the majority of Airbnb users’ needs. To get a better idea of what ‘normal’ looks like, I decided to drop the top few percentiles of Airbnb listings in terms of price.
Moreover, a number of Airbnbs have a listed price of zero. As nice as this would be, I suspect it reflects some internal issue with the listing (perhaps an incomplete listing). Therefore, I decided to remove both the top 2.5% and the bottom 2.5% of listings by price and keep only the middle 95%, so that extreme values do not distort the averages. This reduced the data set from 177k to 168k listings. I then imported the newly cleaned data into QGIS and visualized it.
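The trimming step can be expressed in a few lines. The following Python sketch shows one way to keep the middle 95% by price rank. It is an illustration under assumptions, not the exact procedure used: the function name `trim_by_price` and the sample prices are made up, apart from the $0 and $24,999 extremes mentioned above.

```python
def trim_by_price(listings, tail_percent=2.5):
    """Drop the cheapest and the most expensive `tail_percent` of listings."""
    ranked = sorted(listings, key=lambda row: row["price"])
    n = len(ranked)
    k = int(n * tail_percent / 100)  # number of rows cut from each end
    return ranked[k:n - k]

def mean_price(listings):
    return sum(row["price"] for row in listings) / len(listings)

# 40 made-up listings: one free, one luxury outlier, 38 ordinary ones.
prices = [0] + [100 + i for i in range(38)] + [24999]
listings = [{"price": p} for p in prices]

trimmed = trim_by_price(listings)
print(len(listings), "->", len(trimmed))
print("mean before:", mean_price(listings))
print("mean after:", mean_price(trimmed))
```

In this toy example, two extreme listings out of 40 pull the mean from 118.5 up to 737.55, which is the same kind of distortion seen in the state averages. Because the cut is by rank, it removes a fixed share of rows; if more than 2.5% of listings are priced at $0, some of them would remain, so an explicit price > 0 filter may also be worth adding.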
Results — Visualizations & Interpretations
In this section, I will analyze and discuss the maps I created with QGIS.
1. New York, LA, and Hawaii have the most Airbnb listings
Using a QGIS heatmap to visualize the distribution of Airbnb listings, we can clearly see that the heatmap is densest in the New York, Southern California, and Hawaii areas. A bar chart made in R, which counts listings by the data set's city field, confirms that the top three locations are New York City, LA, and Hawaii.
To indicate that the data is about Airbnb in the visualization, I used Airbnb’s primary color for the heatmap and bar graph.
2. Hawaii is the most expensive state for Airbnbs while Oregon is the cheapest
We can see that in a couple of states the average price is significantly lower or higher than in the others. Hawaii stands out as the state with the highest average price, $202, and Oregon has the lowest, $115.
I was very surprised that the average listing price for New York, about $128, is in the bottom quarter. Going back to the dataset, I think one explanation is that most shared and private-room listings (as opposed to entire homes) are located in top-tier cities like New York, Seattle, and LA, which lowers the average listing price in those states. Another explanation is that Inside Airbnb has data from many large markets, but not necessarily every listing in the United States. This leaves many states with no data, or with data that is skewed towards cities. For example, the Texas listings may represent only the Austin area rather than *all* of Texas, including inexpensive rural areas that would bring the average down.
Again, to indicate that the data is about Airbnb in the visualization, I used Airbnb’s primary color (coral) and secondary color (green) for the map.
3. No consistent pattern between average Airbnb listing price and average housing price in each state
I have read many articles criticizing Airbnb for its large impact on local housing and rental prices, so I wanted to compare the average Airbnb listing price with the average housing price in each state. I was very surprised to see no consistent pattern between them. For instance, on the map of average state housing price, California, Washington, Colorado, and New York are among the states with the highest housing prices, but on the Airbnb map they have relatively low prices compared to other states. Similarly, while Texas, Wisconsin, and Tennessee have high Airbnb prices, they have relatively low housing prices. Again, one explanation might be that states with first-tier cities have more variation in Airbnb room types, while most listings in other states are entire homes. Another is that the data come only from large markets rather than all markets.
To let readers tell the two maps apart at first sight, I used the Airbnb palette for the Airbnb price map and the spectral color palette for the state housing price map.
4. States that have fewer listings receive more reviews
The number of reviews in each state also shows that some states have, on average, significantly more or fewer reviews than others. Interestingly, North Carolina and Oregon have the most reviews on average, while Florida and New York tend to have fewer. By overlaying the listing distribution heatmap on the map of the average number of reviews, we can see that states with fewer listings generally receive more reviews per listing.
5. Colorado and Texas had the highest Airbnb occupancy rates in 2020
When we look at the availability feature, which tells us how many days a year each listing is available, we see that only a few states differ significantly from the rest. For example, listings in Colorado are available 116 days a year on average, while listings in Hawaii are available 228 days a year on average.
Sorting the table by average available days from low to high (busy to free), I can understand why Florida and Hawaii are free for most of the year, since people mainly visit during vacations, but I was quite surprised that Colorado and Texas have the highest occupancy rates. Then I realized: “oh, it was 2020.”
According to one news report, Colorado ranked high among the most ‘moved to’ states during the pandemic. Similarly, according to another report, the 2020 U.S. Census showed that Texas continued to surge in population, growing to 29,145,505 people, an increase of 4 million in 10 years and the largest numeric increase of any state.
Looking at Airbnb occupancy figures for cities, in 2019 the average occupancy rate was 31% in NYC and 28.3% in Austin, TX; in 2020, it was 13.5% in NYC and 21.6% in Austin. Though both decreased, the occupancy rate in 2020 was higher in Austin than in NYC.
Nationally, the Move.org analysis found that income loss was a key reason many people were moving, with 48% of movers listing it as a factor. It is also worth noting that 45% of movers were seeking an upgraded housing option, which may include moving to a more favorable state. This could potentially explain the high occupancy rates in Colorado and Texas, as people moved there for cheaper but bigger housing during the pandemic.
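To make the size of the two declines easier to compare, the arithmetic can be written out. This short Python sketch uses only the four occupancy figures quoted above; the helper name `occupancy_change` is made up for illustration.

```python
# (2019 rate, 2020 rate) in percent, as quoted above.
occupancy = {
    "NYC": (31.0, 13.5),
    "Austin, TX": (28.3, 21.6),
}

def occupancy_change(rate_2019, rate_2020):
    """Return the drop in percentage points and as a share of the 2019 rate."""
    point_drop = rate_2019 - rate_2020
    relative_drop = point_drop / rate_2019 * 100
    return round(point_drop, 1), round(relative_drop, 1)

changes = {city: occupancy_change(*rates) for city, rates in occupancy.items()}
for city, (points, relative) in changes.items():
    print(f"{city}: down {points} points ({relative}% relative decline)")
```

NYC's rate fell by 17.5 percentage points, more than half of its 2019 level, while Austin's fell by 6.7 points, a bit under a quarter of its 2019 level.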
QGIS was fun and very easy to learn. Its layers let creators map out variations and intersections between different data sets. However, finding appropriate data sets was very difficult: it was hard to find shp files or CSV files that contain coordinates. The CSV files also required extensive cleaning before they would work in QGIS, which added a heavy load of extra work. In addition, a map alone may not be sufficient to convey meaning in many cases, so combining QGIS with other data analysis tools like R works more effectively.
If I had more time for this project, I would clean the data even further. As I found no consistent pattern between Airbnb price and housing price in each state, I think it would be very helpful to evaluate only entire-home listings. This would allow us to see whether Airbnb listing price correlates with housing price in each state.
For future work, I would be interested in exploring how Airbnb prices change over time in each state, as well as how the real estate market changes over time. As 2020 is almost certainly an outlier year for a business like Airbnb, it might be interesting to find a similar data set for another time period. Comparing Airbnb prices with the real estate market would also give us a deeper understanding of the correlation between the two.
“Airbnb Colors – Hex, RGB, CMYK, Pantone: Color Codes.” U.S. Brand Colors, https://usbrandcolors.com/airbnb-colors/.
“Average Airbnb Occupancy Rates by City 2020.” AllTheRooms Analytics, 11 June 2021, https://www.alltherooms.com/analytics/average-airbnb-occupancy-rates-by-city/.
U.S. Census Bureau. “Cartographic Boundary Files.” Census.gov, 8 Oct. 2021, https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html.
“House Price Index Datasets.” House Price Index Datasets | Federal Housing Finance Agency, https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx.
“Importing Spreadsheets or CSV Files to QGIS.” Importing Spreadsheets or CSV Files – QGIS Tutorials and Tips, https://www.qgistutorials.com/en/docs/importing_spreadsheets_csv.html.
“List of U.S. States by Median Home Price.” Wikipedia, Wikimedia Foundation, 7 Apr. 2021, https://en.wikipedia.org/wiki/List_of_U.S._states_by_median_home_price.
McKee, Spencer. “Colorado Ranks High among Most ‘Moved to’ and Most ‘Moved from’ States during Pandemic.” OutThere Colorado, 21 Dec. 2020, https://www.outtherecolorado.com/news/colorado-ranks-high-among-most-moved-to-and-most-moved-from-states-during-pandemic/article_b854eb0a-43cc-11eb-aa16-67ca140e0621.html.
Michael, et al. “Latitude Longitude Coordinates to State Code in R.” Stack Overflow, https://stackoverflow.com/questions/8751497/latitude-longitude-coordinates-to-state-code-in-r.
Seth, Kritik. “U.S. Airbnb Open Data.” Kaggle, 25 Oct. 2020, https://www.kaggle.com/kritikseth/us-airbnb-open-data.
Walker, Lynn. “Where Did All These New Texans Come from?” Times Record News, Wichita Falls Times Record News, 26 Oct. 2021, https://www.timesrecordnews.com/story/news/2021/10/26/californians-leaving-moving-texas/8550399002/.