{"id":36339,"date":"2023-03-21T11:26:09","date_gmt":"2023-03-21T15:26:09","guid":{"rendered":"https:\/\/studentwork.prattsi.org\/infovis\/?p=36339"},"modified":"2023-03-21T18:50:06","modified_gmt":"2023-03-21T22:50:06","slug":"a-visualisation-of-housing-prices-in-california","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/visualization\/a-visualisation-of-housing-prices-in-california\/","title":{"rendered":"A visualisation of Housing Prices in California"},"content":{"rendered":"\n<p><strong>Introduction<\/strong><\/p>\n\n\n\n<p>This visualisation explores housing prices in the state of California. The dataset gives insight into household income, housing price, age of residents and location of the properties. It contains 20,000+ entries and is fairly tidy. The idea was to take proximity to the ocean, which marks so-called premium properties, and plot it against median income and median resident age. I also wanted to explore property prices as a function of ocean proximity, and affordability as suggested by median income.<\/p>\n\n\n\n<p><strong>Dataset<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/datasets\/camnugent\/california-housing-prices\">https:\/\/www.kaggle.com\/datasets\/camnugent\/california-housing-prices<\/a><\/p>\n\n\n\n<p>The dataset comes from Kaggle, an open-data platform. It was fairly tidy and didn\u2019t require much cleaning.<\/p>\n\n\n\n<p>The data covers the following parameters:<\/p>\n\n\n\n<p>1. 
longitude<\/p>\n\n\n\n<p>2. latitude<\/p>\n\n\n\n<p>3. housing_median_age<\/p>\n\n\n\n<p>4. total_rooms<\/p>\n\n\n\n<p>5. total_bedrooms<\/p>\n\n\n\n<p>6. population<\/p>\n\n\n\n<p>7. households<\/p>\n\n\n\n<p>8. median_income<\/p>\n\n\n\n<p>9. median_house_value<\/p>\n\n\n\n<p>10. ocean_proximity<\/p>\n\n\n\n<p>I also used OpenRefine to check the tidiness of the dataset.<\/p>\n\n\n\n<p><strong>Tools used<\/strong><\/p>\n\n\n\n<p>The tools used to make this visualisation were Excel, OpenRefine and R. The report was written and published using WordPress.<\/p>\n\n\n\n<p><strong>Process<\/strong><\/p>\n\n\n\n<p>Making this visualisation involved OpenRefine, Excel and R, each serving a separate purpose.<\/p>\n\n\n\n<p><em>Research<\/em><\/p>\n\n\n\n<p>The first phase of my process involved researching open-data sources that could provide this data. This also involved using Excel to go through the data manually and understand it, and then using OpenRefine to check whether any further cleaning was required.<\/p>\n\n\n\n<p><em>OpenRefine &#8211; Data Cleaning<\/em><\/p>\n\n\n\n<p>I used OpenRefine to check the data. It was already fairly clean, so I simply exported it as a CSV.<\/p>\n\n\n\n<p><em>R &#8211; Data Representation<\/em><\/p>\n\n\n\n<p>The visualisations were done in R. I loaded the <em>tidyverse<\/em> library and then used <em>read_csv<\/em> to import my cleaned CSV. I also added a few columns, such as housing price\/1000, to express the values in thousands of dollars ($K). I also used the <em>mutate<\/em> command to generate another column showing median income in millions of dollars.<\/p>\n\n\n\n<p><em>Visualisations &amp; Observations<\/em><\/p>\n\n\n\n<p>1. Geometric Point: I initially used geometric points to plot median household age vs median income, with the ocean proximity of the property mapped to colour, to understand what type of housing a certain segment of society was able to afford. 
The resulting plot was very scattered, so I used a logarithmic scale to compress the spread of the data points.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot08.png?resize=531%2C522&#038;ssl=1\" alt=\"\" class=\"wp-image-36344\" width=\"531\" height=\"522\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot08.png?w=531&amp;ssl=1 531w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot08.png?resize=300%2C295&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot08.png?resize=183%2C180&amp;ssl=1 183w\" sizes=\"auto, (max-width: 531px) 100vw, 531px\" \/><figcaption class=\"wp-element-caption\">Median income in millions for California vs median household age with colour representing proximity of house to the ocean<\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/rpubs.com\/snehganjoo\/1018160\">https:\/\/rpubs.com\/snehganjoo\/1018160<\/a><\/p>\n\n\n\n<p>2. Column Chart &#8211; The point plot above was difficult to read, so I plotted the same data as a column chart, which made it easier to understand. 
I also used a column chart to plot ocean proximity vs median income.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"531\" height=\"522\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot07.png?resize=531%2C522&#038;ssl=1\" alt=\"\" class=\"wp-image-36343\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot07.png?w=531&amp;ssl=1 531w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot07.png?resize=300%2C295&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot07.png?resize=183%2C180&amp;ssl=1 183w\" sizes=\"auto, (max-width: 531px) 100vw, 531px\" \/><figcaption class=\"wp-element-caption\">Median income in millions for California vs median household age with colour representing proximity of house to the ocean<\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/rpubs.com\/snehganjoo\/1018155\">https:\/\/rpubs.com\/snehganjoo\/1018155<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/rpubs.com\/snehganjoo\/1018158\">https:\/\/rpubs.com\/snehganjoo\/1018158<\/a><\/p>\n\n\n\n<p>3. 
Box Plot &#8211; Finally, I used a box plot to show the distribution of median income vs housing cost, with ocean proximity reinforcing the premium nature of the property.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"531\" height=\"522\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot06.png?resize=531%2C522&#038;ssl=1\" alt=\"\" class=\"wp-image-36342\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot06.png?w=531&amp;ssl=1 531w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot06.png?resize=300%2C295&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot06.png?resize=183%2C180&amp;ssl=1 183w\" sizes=\"auto, (max-width: 531px) 100vw, 531px\" \/><figcaption class=\"wp-element-caption\">Median income vs median house value for houses in California with property proximity to the ocean<\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/rpubs.com\/snehganjoo\/1018157\">https:\/\/rpubs.com\/snehganjoo\/1018157<\/a><\/p>\n\n\n\n<p><strong>Reflection and Critique<\/strong><\/p>\n\n\n\n<p><em>Limitations<\/em><\/p>\n\n\n\n<p>R has a learning curve, so I felt limited in my ability to write complex if\/else logic. I would also have liked to plot a geographic visualisation of these properties from the longitude and latitude data in the dataset, in relation to housing cost, but my knowledge of R was too limited to do so.<\/p>\n\n\n\n<p><em>Positives<\/em><\/p>\n\n\n\n<p>R is an interesting tool for visualising data and can lead to a lot of deep insights through analysis and visualisation. 
It has a bit of a learning curve, but its ability to perform complex operations and render visualisations makes it quite powerful.<\/p>\n\n\n\n<p><em>Peer Critique and Changes<\/em><\/p>\n\n\n\n<p>My final chart had some issues with the scale and was difficult to understand. I first tried using a newly added column, but that didn\u2019t work, so I ended up performing the division directly in the ggplot call to fix the scale.<\/p>\n\n\n\n<p><strong>Bibliography<\/strong><\/p>\n\n\n\n<p><em>California Housing Prices<\/em>. (n.d.). Retrieved March 21, 2023, from <a href=\"https:\/\/www.kaggle.com\/datasets\/camnugent\/california-housing-prices\">https:\/\/www.kaggle.com\/datasets\/camnugent\/california-housing-prices<\/a><\/p>\n\n\n\n<p><em>RPubs &#8211; Week 7<\/em>. (n.d.). Retrieved March 21, 2023, from <a href=\"https:\/\/rpubs.com\/jladams\/week_7\">https:\/\/rpubs.com\/jladams\/week_7<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This visualisation is an exploration of the housing prices in the state of California. The dataset gives an insight into household income, housing price, age of residents and location of the properties. The entire dataset is 20,000+ entries and is a fairly tidy dataset. 
The idea with this visualisation was to figure proximity&hellip;<\/p>\n","protected":false},"author":4041,"featured_media":36342,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[5,1786,94],"coauthors":[1843],"class_list":["post-36339","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-visualization","tag-information-visualization","tag-r","tag-visualization"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/03\/Rplot06.png?fit=531%2C522&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-9s7","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/4041"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=36339"}],"version-history":[{"count":3,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36339\/revisions"}],"predecessor-version":[{"id":36486,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36339\/revisions\/36486"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media\/36342"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/in
fovis\/wp-json\/wp\/v2\/media?parent=36339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=36339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=36339"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=36339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}