{"id":36236,"date":"2023-02-25T11:30:16","date_gmt":"2023-02-25T16:30:16","guid":{"rendered":"https:\/\/studentwork.prattsi.org\/infovis\/?p=36236"},"modified":"2023-02-28T18:46:27","modified_gmt":"2023-02-28T23:46:27","slug":"imdbs-top-movies-a-visualization","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/labs\/imdbs-top-movies-a-visualization\/","title":{"rendered":"IMDB&#8217;s Top Movies: A Visualization"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"250\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/unnamed.jpg?resize=512%2C250&#038;ssl=1\" alt=\"\" class=\"wp-image-36237\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/unnamed.jpg?w=512&amp;ssl=1 512w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/unnamed.jpg?resize=300%2C146&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/unnamed.jpg?resize=369%2C180&amp;ssl=1 369w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><figcaption class=\"wp-element-caption\">Image from Google Play<\/figcaption><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction + Visualization Inspiration<\/h2>\n\n\n\n<p>IMDB, or Internet Movie Database, is a database containing data about anything related to the film and entertainment industry. Specifically, it is most known for holding information and statistics about movies and the people involved (Lavery, 2017). 
In this lab I\u2019ll specifically be exploring the IMDB ratings of movies from the past several decades, visualizing them with other elements such as main genre and total gross earnings to have a better understanding of what successful movies looked like over the years.<\/p>\n\n\n\n<p>I was inspired by the <a href=\"https:\/\/towardsdatascience.com\/exploring-movie-data-with-interactive-visualizations-c22e8ce5f663\" target=\"_blank\" rel=\"noreferrer noopener\">visualizations from Kishan Panchal<\/a> that explore movie data and the patterns of movie profitability. He used a wide variety of visualizations that clearly convey key takeaways about the movie industry and its major players.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Datasets\/Tools<\/h2>\n\n\n\n<p>I found my dataset on <a href=\"https:\/\/www.kaggle.com\/datasets\/rakkesharv\/imdb-5000-movies-multiple-genres-dataset\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Kaggle<\/strong>, titled \u201c<em>IMDb 5000+ Movies &amp; Multiple Genres Dataset<\/em>\u201d<\/a> and chose it because its thousands of rows would provide plenty of data. I then opened the dataset in <strong>OpenRefine<\/strong> to check whether it needed to be cleaned up. It did: cells in some columns were not stored as numbers, and others needed clearer text. Once the data was clean, I loaded it into <strong>Tableau<\/strong> to create my visualizations, since its drag-and-drop interface let me explore the patterns and trends in the dataset in many different ways.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Methods + Process<\/h2>\n\n\n\n<p>When tidying the data in OpenRefine, I first cleaned up the Director(s) column, since cells with more than one director started with \u201cDirector:\u201d. 
Although I didn\u2019t end up using that column in my visualizations (splitting it up would have taken much more cleaning, since some cells list close to 10 directors and some movies have blank cells), I think it was good to figure out how to clean it up. When I showed Professor James the dataset I had so far, he noted that my \u201cTotal_Gross\u201d column was being read as a text\/categorical column instead of a numeric one. With his advice, I changed it to numeric using the \u201cTransform\u201d function and used a \u201cText Facet\u201d to replace the blank cells with a placeholder value by inputting \u201cif(isBlank(value.trim()), &#8220;Not informed Value&#8221;, value)\u201d. Once the data was cleaned up the way I wanted, I exported it as an <strong>.xlsx file<\/strong>.<\/p>\n\n\n\n<p>Before creating my visualizations, I consulted with Nathalie, who was my partner for this lab. I first explained that I wanted to focus mostly on numerical factors, such as ratings, when visualizing the data. She gave some great suggestions on how to visualize them and also suggested making comparisons between my charts with tree maps. Nathalie also gave some good advice on using Tableau, based on her own experience: check that each field is set to the correct data type, because when a dataset is uploaded into Tableau, it may show one type of data as another (e.g., numeric data is shown as text). 
With all of this advice in hand, I was able to create my visualizations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results + Findings<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/Dashboard-1-1-1024x994.png?resize=840%2C854&#038;ssl=1\" alt=\"\" class=\"wp-image-36240\" width=\"840\" height=\"854\" \/><figcaption class=\"wp-element-caption\">Figure 1: Dashboard of Visualizations <a href=\"https:\/\/public.tableau.com\/views\/ab2Visualizations\/Dashboard1?:language=en-US&amp;:display_count=n&amp;:origin=viz_share_link\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/public.tableau.com\/views\/ab2Visualizations\/Dashboard1?:language=en-US&amp;:display_count=n&amp;:origin=viz_share_link<\/a><\/figcaption><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Comparison of Average Ratings and Total Gross Over Time<\/h3>\n\n\n\n<p>This comparative line chart displays the trend over time for both average ratings and total gross, showing gaps, rises, and dips more clearly than other visualizations would.<\/p>\n\n\n\n<p class=\"has-text-align-left\"><strong>Findings:<\/strong> There is a gap in Total Gross between 1932 and 1934, even though ratings stay at roughly the same level, most likely due to WWII. There is a severe dip in 2020, again while ratings stay at roughly the same level, most likely due to the Covid-19 pandemic.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. Max. Total Gross in Millions Over Time<\/strong><\/h4>\n\n\n\n<p>This tree map shows the highest earnings by year, encoding each year\u2019s value with both color and tile size, from highest to lowest.<\/p>\n\n\n\n<p><strong>Findings:<\/strong> 2015 had the highest-grossing films. 
2021, the year after the pandemic began, had the third-highest gross, which I find surprising given how hard the pandemic hit the film industry.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. Average Ratings Highest to Lowest by Genre<\/strong><\/h4>\n\n\n\n<p>This tree map shows average ratings by genre in a similar fashion to the previous tree map.<\/p>\n\n\n\n<p><strong>Findings:<\/strong> The \u201cWestern\u201d genre tends to have the highest ratings, with \u201cFilm-Noir\u201d and \u201cBiography\u201d coming in close behind.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>4. Censorship Distribution Among Top Genres<\/strong><\/h4>\n\n\n\n<p>I used packed bubbles to show the most commonly seen censorship ratings within the genres.<\/p>\n\n\n\n<p><strong>Findings:<\/strong> The most common ratings across multiple genres are A (adults only) and UA (unrestricted, with parental guidance advised for children under 12), and \u201cAction\u201d is the most common genre.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>5. Genres By Total Gross in Millions<\/strong><\/h4>\n\n\n\n<p>A pie chart was a clear way to show each genre\u2019s share of the summed total gross earnings.<\/p>\n\n\n\n<p><strong>Findings:<\/strong> Action accounts for a disproportionately large share of earnings. My assumption is that franchises such as Marvel, Avatar, and Star Wars drive this.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>6. Average Runtime by Genre<\/strong><\/h4>\n\n\n\n<p>Using horizontal bars, I displayed average runtimes to show which genres run longer than others.<\/p>\n\n\n\n<p><strong>Findings:<\/strong> Musicals have the longest average runtime, possibly because songs eat up time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Reflections<\/h2>\n\n\n\n<p>I was mostly fascinated with the \u201cMain_Genre\u201d as a focal point for most of my graphs. 
It was interesting to see how easily I could switch between different types of charts and graphs, each of which could tell a different story despite using the same data. If I had the patience, I might have tried to clean up the \u201cDirector(s)\u201d and \u201cActors\u201d columns for more data visualization, but I\u2019m pretty happy overall with the findings I came up with.<\/p>\n\n\n\n<p>I liked using OpenRefine to tidy up my data. It took some getting used to: it\u2019s relatively simple if you know how to change certain data, but it can get complicated if you don\u2019t. With Tableau, I had fun changing up the visualizations as I figured out what I wanted to use, but it was frustrating that switching visualizations would sometimes remove data that I had dragged in. I also had trouble displaying the Total Gross in Millions despite cleaning it up the way I knew how, and had to make do with clarifying it in the labeling.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sources<\/h2>\n\n\n\n<p>Lavery, T. (2017, March 22). <em>What is internet movie database (imdb)?: Definition from TechTarget<\/em>. WhatIs.com. Retrieved February 22, 2023, from <a href=\"https:\/\/www.techtarget.com\/whatis\/definition\/Internet-Movie-Database-IMDb#:~:text=The%20Internet%20Movie%20Database%20(IMDb,and%20other%20film%20industry%20professionals.\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.techtarget.com\/whatis\/definition\/Internet-Movie-Database-IMDb#:~:text=The%20Internet%20Movie%20Database%20(IMDb,and%20other%20film%20industry%20professionals.<\/a><\/p>\n\n\n\n<p>Panchal, K. (2018, May 13). <em>Exploring movie data with interactive visualizations<\/em>. Medium. 
Retrieved February 22, 2023, from <a href=\"https:\/\/towardsdatascience.com\/exploring-movie-data-with-interactive-visualizations-c22e8ce5f663\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/towardsdatascience.com\/exploring-movie-data-with-interactive-visualizations-c22e8ce5f663<\/a><\/p>\n\n\n\n<p>G, R. A. (2022, October 29). <em>IMDB 5000+ movies &amp; multiple genres dataset<\/em>. Kaggle. Retrieved February 17, 2023, from <a href=\"https:\/\/www.kaggle.com\/datasets\/rakkesharv\/imdb-5000-movies-multiple-genres-dataset\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.kaggle.com\/datasets\/rakkesharv\/imdb-5000-movies-multiple-genres-dataset<\/a><\/p>\n\n\n\n<p>Google. (n.d.). <em>IMDB: Movies &amp; TV shows &#8211; apps on google play<\/em>. Google. Retrieved February 22, 2023, from <a href=\"https:\/\/play.google.com\/store\/apps\/details?id=com.imdb.mobile&amp;hl=en_US&amp;gl=US\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/play.google.com\/store\/apps\/details?id=com.imdb.mobile&amp;hl=en_US&amp;gl=US<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction + Visualization Inspiration IMDB, or Internet Movie Database, is a database containing data about anything related to the film and entertainment industry. Specifically, it is most known for holding information and statistics about movies and the people involved (Lavery, 2017). 
In this lab I\u2019ll specifically be exploring the IMDB ratings of movies from the&hellip;<\/p>\n","protected":false},"author":4045,"featured_media":36238,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[340,149],"tags":[],"coauthors":[1847],"class_list":["post-36236","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-charts","category-labs"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2023\/02\/unnamed-1.jpg?fit=512%2C250&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-9qs","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/4045"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=36236"}],"version-history":[{"count":5,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36236\/revisions"}],"predecessor-version":[{"id":36253,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/36236\/revisions\/36253"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media\/36238"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/m
edia?parent=36236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=36236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=36236"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=36236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}