1990 California Housing Prices


Final Projects, Visualization

The interactive final deliverable can be accessed here.

Introduction

California’s housing market has always been a hot topic for people all over the world. People care about housing prices for many reasons: living, investment, education, and so on. The median price of an existing single-family home in California hit a new record of $758,990 in March 2021. Most realtors are optimistic about the overall market and the expected sales in the upcoming months, and both demand and prices are predicted to rise. People who are planning to purchase a house in California would therefore benefit from a baseline against which to compare prices. The aim of this visualization is to provide such a reference for users who want to find out more about housing prices. Looking back at the 1990 data can help buyers estimate house appreciation and identify the places that developed fastest and gained the most value, and hence make better economic decisions. Even so, the visualization is limited in its ability to show more detailed information on many other factors people care about, such as HOA fees, housing types, land use rights, and so on. In addition, the data needs constant updates to become more useful.

Inspiration

As this was the final project of the class, I decided to aim for a more visually appealing deliverable. Throughout the semester, Carto and Tableau interested me the most for the level of freedom they grant users in analyzing data. I therefore decided to make a map-based visualization that would demonstrate almost everything I had learned during the semester about the software I was targeting. While browsing the featured works in the Tableau Public gallery, I found that City and Color and Bangalore’s Residential Hotspots appealed to me for the way they balance the visuals and the functionality of a dashboard.

From City and Color, I realized that a map rendered in many colors can still be visually pleasing rather than chaotic. From Bangalore’s Residential Hotspots, I learned that highlighting different areas of the city on the side can give a quick sense of the city’s overall rent prices. I also learned that representing high population density with red and low density with green is quite intuitive to viewers.

Data

The main dataset I used is California Housing Prices on Kaggle. The dataset contains information on block groups in California from the 1990 Census, with 10 measures: longitude, latitude, housing median age, total rooms, total bedrooms, population, households, median income, median house value, and ocean proximity. I chose this data not only because it is rich in information, but also because it prompted me at first sight to create a map-centered visualization, which matched my goal exactly.

I also used two supplementary datasets: population by county, accessed from the official website of the California government, and median home price by county, taken from the National Association of Realtors. I manually combined the two into one CSV file for use alongside the Kaggle data in Tableau.
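I did the combination by hand, but the same step could be scripted. The following is only a minimal sketch of joining two county-level tables on the county name. The column names (`county`, `population`, `median_home_price`) and all values are hypothetical placeholders, not the contents of the real files.

```python
import csv
import io

# Hypothetical stand-ins for the two county-level sources.
# Column names and numbers are placeholders, not real data.
population_csv = """county,population
Alpha County,1000
Beta County,2500
"""
price_csv = """county,median_home_price
Beta County,300000
Alpha County,450000
"""


def read_by_county(text):
    """Index the rows of a CSV string by the 'county' column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["county"].strip(): row for row in reader}


def combine(population_text, price_text):
    """Join the two tables on county name, keeping counties present in both."""
    population = read_by_county(population_text)
    prices = read_by_county(price_text)
    combined = []
    for county in sorted(population.keys() & prices.keys()):
        combined.append({
            "county": county,
            "population": int(population[county]["population"]),
            "median_home_price": int(prices[county]["median_home_price"]),
        })
    return combined


combined_rows = combine(population_csv, price_csv)

# Serialize the joined table as CSV text; this could be saved to a file
# and then connected to the main dataset in Tableau.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["county", "population", "median_home_price"])
writer.writeheader()
writer.writerows(combined_rows)
combined_csv = out.getvalue()
```

Joining on the county name only works if both sources spell the names identically, which is worth checking before relying on a scripted merge.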

Process

At first, I tried out the dataset in Carto. However, its analysis features were not enough for me to create the outcome I wanted. Next, I imported the data into Tableau and plotted it as points on a map. The basemap turned out to be distracting, so I used Mapbox to build a clean custom map, switching the color to white and removing the state borders, county borders, and so on. Then, I sorted the points by median house value. A continuous scale with only one or two colors makes most of the points hard to see, because the color is too light for lower values, so I used distinct colors from a hue palette instead. The closer to red and purple, the higher the median house value; the closer to green and blue, the lower. This sheet became the main sheet, placed in the center of the dashboard, because it contains the most information of all the sheets.

After that, I imported the county-level data into a new map. However, nothing showed up on the map. It seemed that Tableau could not identify the counties from their names alone. To resolve this, I added the country name and state name to the raw data. With these, Tableau successfully recognized the counties, and users are now able to select individual counties. I then explored this map with a dual axis. One axis is a heat map that represents the population density of each county. The other axis is a map of California divided into counties, in which counties with higher median values lean toward red and those with lower values lean toward green. I washed out the background map, leaving the rest of the sheet blank so that users can see the color of each county more clearly. The coloring of the two axes is consistent in the final deliverable.
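The fix of adding country and state fields to the raw data can be illustrated with a short sketch. The field names `country` and `state` and the two sample counties are my own assumptions for illustration. One plausible reason the extra fields help is that a county name alone can be ambiguous (Orange County, for instance, exists in several states).

```python
def add_geography(rows, country="United States", state="California"):
    """Return copies of the rows with explicit country and state fields added."""
    return [{**row, "country": country, "state": state} for row in rows]


# Hypothetical county rows; only the county name matters for this illustration.
counties = [{"county": "Alameda"}, {"county": "Orange"}]
tagged = add_geography(counties)
```

The function returns new dictionaries rather than modifying the input, so the original rows stay available for comparison.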

I added tooltips to both maps. When users hover over the county map, the tooltip displays the name and the median house value of the county. For the map that shows all the household data, I embedded another sheet in the tooltip. That sheet contains a Gantt chart comparing bedrooms per household with total rooms per household, so that users can get a quick view of room types. When users click on a block of households, represented by a dot, the tooltip appears with further information on the median house value and the housing median age; the smaller the age, the newer the housing.
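The two ratios in the tooltip chart are simple divisions. A minimal sketch follows, assuming the Kaggle column names `total_rooms`, `total_bedrooms`, and `households`. The sample row is made up so the arithmetic is easy to check by hand, and this is not necessarily how the Tableau calculated fields were written.

```python
def per_household(row):
    """Compute rooms and bedrooms per household for one block group."""
    households = row["households"]
    if not households:
        return None  # avoid dividing by zero for an empty block group
    return {
        "rooms_per_household": row["total_rooms"] / households,
        "bedrooms_per_household": row["total_bedrooms"] / households,
    }


# Made-up block group: 600 rooms and 150 bedrooms across 100 households.
example = {"total_rooms": 600, "total_bedrooms": 150, "households": 100}
ratios = per_household(example)
# ratios == {"rooms_per_household": 6.0, "bedrooms_per_household": 1.5}
```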

To display data by area, as inspired by Bangalore’s Residential Hotspots, I divided the data by ocean proximity, which has five categories: inland, <1 hour to ocean, near ocean, near bay, and island. The island category contains only five rows of data with extremely high values, which I judged to be outliers and removed from consideration. I made four histograms out of the remaining data. The columns are discrete median house value bins, and the rows are the count of records in each bin. I used 50k as the width of each bin.
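The binning behind these histograms can be sketched outside Tableau as follows. This is only an illustration: it assumes the Kaggle column names `ocean_proximity` and `median_house_value` and the category label `ISLAND`, and the rows are made up.

```python
from collections import Counter

BIN_WIDTH = 50_000  # width of each price bin, as used for the histograms


def bin_label(value, width=BIN_WIDTH):
    """Return the lower edge of the fixed-width bin that contains value."""
    return int(value // width) * width


def histogram_by_region(rows, excluded=("ISLAND",)):
    """Count rows per (region, price bin), skipping the excluded regions."""
    counts = {}
    for row in rows:
        region = row["ocean_proximity"]
        if region in excluded:
            continue  # treat the island rows as outliers
        bin_start = bin_label(row["median_house_value"])
        counts.setdefault(region, Counter())[bin_start] += 1
    return counts


# Made-up rows for illustration only.
rows = [
    {"ocean_proximity": "INLAND", "median_house_value": 80_000},
    {"ocean_proximity": "INLAND", "median_house_value": 95_000},
    {"ocean_proximity": "INLAND", "median_house_value": 120_000},
    {"ocean_proximity": "NEAR BAY", "median_house_value": 260_000},
    {"ocean_proximity": "ISLAND", "median_house_value": 450_000},
]
hist = histogram_by_region(rows)
# hist["INLAND"] counts two rows in the 50k bin and one in the 100k bin.
```

Each bin is identified by its lower edge, so the key 50000 stands for the range from 50,000 up to (but not including) 100,000.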

Through the bar charts, users can explore the population distribution across price ranges within each area. I also highlighted the bar with the largest population and marked its percentage relative to the other price ranges in that region. I highlighted the first and third charts in red and the second and fourth in blue. The color assignments are consistent with my maps. I also added explanatory text under the titles for users to read.
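The percentage marked on the highlighted bar is the largest bin's count divided by the region's total. A small sketch of that calculation, with made-up counts and a hypothetical helper name `largest_bin_share`:

```python
def largest_bin_share(bin_counts):
    """Return (bin, share) for the fullest bin; share is a fraction of the total."""
    total = sum(bin_counts.values())
    if total == 0:
        return None  # no data, so no bin to highlight
    top_bin = max(bin_counts, key=bin_counts.get)
    return top_bin, bin_counts[top_bin] / total


# Made-up counts per price bin (keyed by the bin's lower edge) for one region.
inland = {50_000: 30, 100_000: 50, 150_000: 20}
top_bin, share = largest_bin_share(inland)
label = f"{share:.0%}"  # formatted as a whole-number percentage
```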

I added another Gantt view with a single axis showing counts of households by value, from which users should be able to see the mode of the house value. For example, the right extreme shows that there are 6082 houses valued at $118,800, while few people live in the high-priced houses at the very left of the chart. The chart also responds to actions in the main map. Originally, I had also displayed the average population and the average median income, which updated when users clicked dots on the main map. I displayed the income in units of K (thousands), first through a calculation and then through formatting.
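Both ideas in this paragraph, finding the mode and showing income in K, can be sketched briefly. The helper names are hypothetical and the sample values are made up. The sketch also assumes that the dataset stores `median_income` in tens of thousands of dollars, which is how this dataset is commonly described but should be verified against the source.

```python
from collections import Counter


def most_common_value(values):
    """Return (value, count) for the most frequent value, i.e. the mode."""
    return Counter(values).most_common(1)[0]


def income_in_k(median_income, unit=10_000):
    """Format an income stored in multiples of `unit` dollars as a 'K' label.

    Assumption: unit=10_000 means the raw value is in tens of thousands of dollars.
    """
    return f"${median_income * unit / 1000:.1f}K"


# Made-up house values; 118,800 appears twice, so it is the mode here.
values = [118_800, 95_000, 118_800, 210_000]
mode_value, mode_count = most_common_value(values)

income_label = income_in_k(3.5)  # a raw value of 3.5 becomes "$35.0K"
```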

UX Research

I recruited two participants for unmoderated user testing, using WeChat calls as the interviewing tool. Participant one is a 27-year-old woman who had lived and worked in Santa Clara, California, and had been planning to purchase a house there. Participant two is a 23-year-old man who had lived in New York, loves topics related to finance and investment, and is also concerned with house prices. During the sessions, users were asked to think out loud. The tasks were designed to evaluate the information structure, visual appearance, and interactivity of the chart. Each participant was given 5 tasks to complete, each followed by post-task questions:

  1. Try to find a section of the chart that interests you, and describe the information you find.
    • What did you see at first sight when exploring the map/chart?
    • Why does the information interest you?
  2. Explore the right side of the chart, and describe the information you find.
    • What do you think of the information?
    • Does the color stand out to you?
  3. Suppose you are planning to purchase a house in California. Try exploring the map in the middle.
    • Did you find the information you want?
    • What other factors do you care about when deciding to purchase a house?
    • Does the coloring make sense to you?
  4. Explore the left side of the chart, and describe the information you find.
    • Does the information matter to you?
    • Is there any information you feel is missing?
  5. Check the map in the bottom left corner, and describe the information you find.
    • Does the coloring make sense to you?
    • Does the information provided meet your expectations?

Findings

Both users paid attention to the right side of the chart first, which presents the regional information. In general, the information provided was less than expected, the visuals were easy to understand, and the interactivity of the map was a little hard to notice.

Based on the problems discovered, I made two major changes. First, I added the sentence “Try click it” to the light grey note. Second, I changed the “Average Median Income” displayed on the right to “Median House Value”. Due to time and scope restrictions, I did not make as many changes as the users suggested, such as adding commas to the numbers or replacing the Gantt chart with another chart type such as a bullet graph. I also set aside some reported problems; for example, one user reported that the color of the note sentence was too light, but he still found and read it successfully. In the future, more participants should be recruited to test the usability of the chart, in order to validate the existing problems and discover new ones. The final deliverable can be accessed from here.

How does the chart compare with housing prices in California today? In the chart, San Francisco and Los Angeles are the two locations with the highest housing prices, which is consistent with current housing prices. Moreover, the Central Coast region (21.3 percent), the San Francisco Bay Area (20.5 percent), and the Central Valley (18.6 percent) have the greatest year-over-year growth rates (noradarealestate.com, 2021). This visualization cannot show such data, which should be addressed in future steps.

Future Steps

In the post-task interviews, I asked what factors users care about when purchasing a house. They mentioned HOA fees and the type of housing, for example apartment, townhouse, condo, or others. They also care about land use rights. As a next step, more data can be brought in, such as data from 1990 to 2020. The combined data could give a clearer view of the rise and fall of prices, and thus help users find opportunities to invest their money. Besides maps, infographics can be a powerful tool for representing these changes as well. These are all worth trying to improve and finalize the visualization in the future.