How do different parameters affect real estate prices in Boston?


Visualization

Introduction

First-time buyers venturing into the home-buying process usually have many questions and concerns. Purchasing a property is a big financial commitment, and it involves many steps that can be confusing to consumers. Challenges can come up that make a purchase complicated and difficult. By learning as much as possible about the home-buying process and the factors affecting the price of a property, consumers can prepare themselves for the obstacles they may meet.

Buying a property is a major investment, both financial and emotional. Understanding a particular town and the factors that drive its real estate prices is crucial for anyone investing there. It gives the buyer a sense of control when making a decision that involves a large financial commitment, and being at peace with that decision matters.

Whether you’re buying or selling, the market value of a home is a top concern. As a buyer, you want to find a house that’s not only a great place to raise your family, but also a good investment. And, as a seller, you want to sell your current house for top dollar so you can maximize your profit on the transaction. But what determines the market value of a home? 

An investment

In markets where land and building prices are rising, real estate is often purchased as an investment, whether or not the owner intends to use the property. Often investment properties are rented out, but “flipping” involves quickly selling a property, sometimes taking advantage of arbitrage or quickly rising value, and sometimes after repairs are made that substantially raise the value of the property.

Luxury real estate is sometimes used as a way to store value, especially by wealthy foreigners, without any particular attempt to rent it out. Some luxury units in London and New York City have been used as a way for corrupt foreign government officials and businesspeople from countries without strong rule of law to launder money or to protect it from seizure.

Methodology

The data come from the Boston Standard Metropolitan Statistical Area (SMSA). The dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. Each record describes a Boston suburb or town. For the regression analysis, the median value of owner-occupied homes (in $1000s) is treated as the dependent variable, and the other indicators affecting the price of real estate are treated as independent variables: per capita crime rate by town; proportion of residential land zoned for lots over 25,000 sq.ft.; proportion of non-retail business acres per town; nitric oxides concentration (parts per 10 million); weighted distances to five Boston employment centers; average number of rooms per dwelling; proportion of owner-occupied units built prior to 1940; full-value property-tax rate per $10,000; pupil-teacher ratio by town; and % lower status of the population. There are 506 cases, each with 14 variables. The factors considered in this project are the following:

  1. CRIM: per capita crime rate by town.
    • It is the number of reported crimes per person in the town, that is, the number of crimes in the area divided by the town's total population. It is an important measure when deciding where to buy a house.
  2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
    • Most ZN values in the dataset are zero, which means that many suburbs have no residential land zoned for lots over 25,000 sq.ft.
  3. INDUS: proportion of non-retail business acres per town
    • Non-retail business means a business that derives less than ten percent (10%) of its total revenue from sales to the general public. It is a measure that may or may not have an effect on the prices of real estate.
  4. NOX: nitric oxides concentration (parts per 10 million)
    • Nitric oxides in the atmosphere come from nitrogen fertilizers, emissions from fossil fuel use, chemical manufacturing and wastewater treatment. Nitrogen dioxide in the air also reacts with water vapor to form nitric acid, one of the acids in acid rain. Its concentration in unpolluted air is around 10 parts per billion (ppb).
  5. DIS: weighted distances to five Boston employment centers
    • It indicates whether a property is close to an employment centre. A property close to an employment centre is expected to have a higher value, and living there also saves on commuting costs.
  6. RM: average number of rooms per dwelling
    • It is the average number of rooms per house. In the dataset the average lies between 6 and 7.
  7. AGE: proportion of owner-occupied units built prior to 1940
  8. TAX: full-value property-tax rate per $10,000
    • The property tax rate is the rate at which a property is taxed. In this dataset it is expressed as the tax per $10,000 of full property value.
  9. PTRATIO: pupil-teacher ratio by town
    • It is the number of pupils per teacher. A lower ratio suggests smaller classes, which helps people gauge the quality of education in the area.
  10. LSTAT: % lower status of the population
    • Lower status is defined as a function of the proportion of adults without some high school education and the proportion of male workers classified as laborers. It helps identify localities with a higher standard of living.
  11. MEDV: median value of owner-occupied homes in $1000s
    • It is the median estimated value of the homes in a town that are occupied by their owners.

The house price, given by the variable MEDV, is the target (dependent) variable, and the remaining variables are the features (independent variables) from which the value of a house can be predicted. An Excel sheet was used to set up the interactions between price and the factors affecting it.

The sheet is loaded into Tableau, where the visualisations are created on individual sheets and then compiled into a dashboard.

For the first visualisation, the median value of the property is placed on the Y axis and the other variables on the X axis, so the scatter plot shows the relationship between the price of the property and each variable. The bottom-left visualisation shows the relationship between crime rate and median property value: as the crime rate increases, the value of the property drops. The top-middle visualisation shows the average value of each variable for the price range set in the range control. The next visualisation compiles linear regression models of the different variables against median property value. The top-right visualisation shows the crime indicator for the selected price range, classed as high, medium or low. The last visualisation is a correlation matrix showing the correlation of every variable with every other.
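The split between the target and the features can be sketched in a few lines of Python. This is only an illustration, not the workflow used for the dashboard: it assumes the spreadsheet has been exported as CSV with the column names listed above, and the helper names (`load_rows`, `split_features_target`) and the three sample rows are there purely to make the sketch runnable.

```python
import csv
import io

# Illustrative rows in the dataset's column layout (assumption: a CSV export
# of the spreadsheet with these headers; only the columns listed above).
SAMPLE_CSV = """CRIM,ZN,INDUS,NOX,DIS,RM,AGE,TAX,PTRATIO,LSTAT,MEDV
0.00632,18.0,2.31,0.538,4.0900,6.575,65.2,296,15.3,4.98,24.0
0.02731,0.0,7.07,0.469,4.9671,6.421,78.9,242,17.8,9.14,21.6
0.02729,0.0,7.07,0.469,4.9671,7.185,61.1,242,17.8,4.03,34.7
"""

TARGET = "MEDV"  # dependent variable; every other column is a feature


def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def split_features_target(rows, target=TARGET):
    """Return (features, targets): feature dicts without the target column,
    and the list of target values in the same row order."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


rows = load_rows(SAMPLE_CSV)
features, targets = split_features_target(rows)
print(sorted(features[0]))  # feature names only, MEDV excluded
print(targets)              # MEDV values, one per town
```

Keeping the target name in one constant makes it easy to reuse the same split if a different column is later chosen as the dependent variable.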

Inspiration

This is a project I started last semester while designing an architecture project. For that I needed to understand the value of the site and, from that, of the project. The factors that affect the value of a property give an idea of the neighbourhood it is situated in and of the standard of living in that locality. While designing the project I came across a myriad of factors that affect the price of a property, but only some of the important ones are considered here.

Questions to focus on

  1. How does crime affect the price of real estate?
  2. How is the proportion of owner-occupied units built prior to 1940 related to the price of real estate?
  3. How does the concentration of nitric oxides in the air affect the price of real estate?
  4. Is the number of rooms related to the price of real estate?
  5. Is there any relationship between the proportion of lower-status population and a drop in real estate prices?
  6. How do prices fluctuate based on the crime rate in the area?
  7. Does the selected price range affect parameters such as average crime rate and pupil-teacher ratio?
  8. What are the key performance indicators?

Use Case

A person buying a home.

A real estate broker who is looking for a house for a client.

A person buying property for investment.

A real estate analyst who wants to estimate the price of a property.

The Product

The product is a dashboard that lets users change the parameters according to their needs and retrieve the corresponding information. It combines different KPIs that help a user decide on purchasing a property in a particular area.

https://public.tableau.com/app/profile/divyansha.arora/viz/realestatebook/Dashboard1?publish=yes

The first sheet in the dashboard shows the relationship of the median value of a property to the different variables in the dataset. The user can select a variable from the top and the graph changes accordingly. The median value is the dependent variable and all other variables are independent variables, so the user can switch between visualisations at will. Prices are shown as averages. Where there is zero crime, prices are very high. The pupil-teacher ratio is spread fairly evenly along the graph. This gives the user a dynamic way to gauge the effect of various parameters on the price of a property.

The second graph shows the relationship of the median value of a property to the crime rate. The highest median value in an area with a near-zero crime rate is 50, that is, $50,000 with MEDV measured in $1000s. The value keeps getting lower as the crime rate increases along the X axis. The rationale for a separate chart for crime is that it is a very important parameter when people choose where they want to live.

The top representation in the middle lets users select the price range that fits their budget, and the representation then shows the average value of each metric for that range. For instance, if we set the lower end to 15 and the higher end to 25, it gives the average values of all other metrics: the average age is 66.1, the average crime rate is 1.8, the average pupil-teacher ratio is 18.7, and so on.

The top-right corner of the dashboard shows the crime rate indicator from low to high. We can set the range on the median value bar and the pie chart changes accordingly. If we set the median value range to 15 to 25 and hover over the pie chart, the crime indicator shows the values for that range, and these change whenever the range on the bar is changed.
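The logic behind this range control can be sketched outside Tableau as a filter followed by an average per metric. The sketch below is an assumption about how such a control behaves, not the dashboard's actual implementation; the function name `averages_in_range` is hypothetical and the four records are made-up values used only to make the example runnable.

```python
from statistics import mean

# Hypothetical records; the values are made up for illustration only.
towns = [
    {"MEDV": 14.0, "AGE": 90.0, "CRIM": 5.0, "PTRATIO": 20.0},
    {"MEDV": 18.0, "AGE": 70.0, "CRIM": 2.0, "PTRATIO": 19.0},
    {"MEDV": 22.0, "AGE": 60.0, "CRIM": 1.0, "PTRATIO": 18.0},
    {"MEDV": 30.0, "AGE": 40.0, "CRIM": 0.1, "PTRATIO": 15.0},
]


def averages_in_range(rows, low, high, target="MEDV"):
    """Average every non-target metric over rows whose target value
    lies in the inclusive range [low, high]."""
    selected = [r for r in rows if low <= r[target] <= high]
    if not selected:
        return {}  # nothing falls inside the chosen price range
    metrics = [k for k in selected[0] if k != target]
    return {m: mean(r[m] for r in selected) for m in metrics}


# Same range as in the text: lower end 15, higher end 25.
summary = averages_in_range(towns, 15, 25)
print(summary)
```

Returning an empty dictionary for an empty selection avoids an error when the chosen range contains no towns.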
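A crime indicator of this kind can be sketched as a simple bucketing of CRIM followed by a count per bucket within the selected price range. The cut-offs below (`LOW_MAX`, `MEDIUM_MAX`) are assumptions chosen for illustration, since the thresholds used in the dashboard are not stated, and the records are made-up values.

```python
from collections import Counter

# Assumed cut-offs; the dashboard's actual thresholds are not stated.
LOW_MAX = 1.0     # CRIM below this counts as "low"
MEDIUM_MAX = 5.0  # CRIM below this (and at least LOW_MAX) counts as "medium"


def crime_level(crim):
    """Classify a per capita crime rate as low, medium or high."""
    if crim < LOW_MAX:
        return "low"
    if crim < MEDIUM_MAX:
        return "medium"
    return "high"


def crime_breakdown(rows, low, high):
    """Count towns per crime level among rows with MEDV in [low, high]."""
    selected = [r for r in rows if low <= r["MEDV"] <= high]
    return Counter(crime_level(r["CRIM"]) for r in selected)


# Hypothetical records; the values are made up for illustration only.
towns = [
    {"MEDV": 16.0, "CRIM": 8.0},
    {"MEDV": 20.0, "CRIM": 0.3},
    {"MEDV": 24.0, "CRIM": 2.5},
    {"MEDV": 35.0, "CRIM": 0.05},
]

breakdown = crime_breakdown(towns, 15, 25)
print(dict(breakdown))
```

The counts per level are what a pie chart would draw as slices; changing the price range changes which towns are counted.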

Correlation and Regression

The image in the middle shows the linear regression model of the relationship between the median value of homes and each of the other variables. Most of the relationships follow a roughly linear pattern.

Immediately to the right of that is the correlation matrix, showing the correlation of all the variables with each other. The correlation coefficient ranges from -1 to 1. A value close to 1 means a strong positive correlation between the two variables, a value close to -1 means a strong negative correlation, and a value near 0 means little linear relationship.
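For a single predictor, a trend line of this kind is an ordinary least-squares fit. The sketch below shows the standard closed-form calculation; it is not necessarily how Tableau computes its trend lines internally, the function name `fit_line` is hypothetical, and the room and price values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: average rooms per dwelling and median value (in $1000s).
rooms = [5.0, 6.0, 7.0, 8.0]
prices = [15.0, 20.0, 30.0, 35.0]

slope, intercept = fit_line(rooms, prices)
print(slope, intercept)
```

A positive slope means the median value rises with the predictor, and a negative slope means it falls.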
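A correlation matrix like this one can be sketched with the Pearson coefficient computed for every pair of columns. The function names (`pearson`, `correlation_matrix`) are hypothetical, and the column values are made up so that the example is small enough to check by hand.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of column a with column b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns for illustration only.
columns = {
    "RM": [5.0, 6.0, 7.0, 8.0],
    "LSTAT": [20.0, 15.0, 10.0, 5.0],
    "MEDV": [15.0, 20.0, 30.0, 35.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each variable correlates perfectly with itself, so the diagonal of the matrix is always 1, and the matrix is symmetric.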

  1. Dependent variable: medv, the median value of owner-occupied homes (in thousands of dollars).
  2. Structural variables indicating the house characteristics: rm (average number of rooms “in owner units”) and age (proportion of owner-occupied units built prior to 1940).
  3. Neighborhood variables: crim (crime rate), zn (proportion of residential land zoned for large lots), indus (proportion of non-retail business area), chas (whether the tract bounds the Charles River), tax (full-value property-tax rate, a proxy for the cost of public services in each community), ptratio (pupil-teacher ratio), black (the variable 1000(B − 0.63)^2, where B is the proportion of black population; low and high values of B increase housing prices) and lstat (percent of lower status of the population).
  4. Accessibility variables: dis (distances to five employment centers) and rad (accessibility to radial highways – larger index denotes better accessibility).
  5. Air pollution variable: nox, the annual concentration of nitrogen oxide (in parts per ten million).

The linear regression and correlation analysis of the influence of the relevant factors on MEDV gives the following insights:

  1. CRIM (per capita crime rate by town), NOX (nitric oxides concentration), DIS (weighted distances to five employment centers), TAX (full-value property-tax rate per $10,000), PTRATIO (pupil-teacher ratio by town) and LSTAT (% lower status of the population) have a negative effect on MEDV.
  2. INDUS (proportion of non-retail business acres per town) and AGE (proportion of owner-occupied units built prior to 1940) do not appear to influence MEDV.
  3. ZN (proportion of residential land zoned for lots over 25,000 sq.ft.), CHAS (Charles River dummy variable), RM (average number of rooms per dwelling) and RAD (index of accessibility to radial highways) influence MEDV positively.
  4. Prices increase roughly linearly as RM increases. There are a few outliers, and the data seem to be capped at 50. The correlation matrix shows that RM has a strong positive correlation with MEDV (0.7).
  5. Prices tend to decrease as LSTAT increases, although the relationship does not follow an exactly straight line. LSTAT has a strong negative correlation with MEDV (-0.74).
  6. The price decreases as the crime rate increases. The two have a negative relationship, which means that an increase in one is associated with a decrease in the other.
  7. Nitric oxides concentration also affects the price of a property. Although there are many outliers, overall the two share a moderately strong relationship.

Conclusion and Reflection

The same product can be created for any city. A structure like this helps the user make a sound decision when investing in a property. The variables taken into account are important indicators of the price of a property in any town or suburb.

The motivation behind this research is the way socio-economic structures influence prices in various cities. The contribution of this project is that it investigates the influence of various socio-economic factors on the median value of residential properties. We found that factors such as crime, distance to employment centres, taxes, the pupil-teacher ratio in neighbourhood public schools and property location affect property value, which helps us understand how the general public chooses a place of residence.