IMDB’s Top Movies: A Visualization


Image from Google Play

Introduction + Visualization Inspiration

IMDB, or the Internet Movie Database, is a database of information related to the film and entertainment industry. It is best known for holding information and statistics about movies and the people involved in them (Lavery, 2017). In this lab, I explore the IMDB ratings of movies from the past several decades, visualizing them alongside other elements such as main genre and total gross earnings to better understand what successful movies have looked like over the years.

I was inspired by Kishan Panchal’s visualizations of movie data and the patterns behind movie profitability (Panchal, 2018). He used a variety of chart types that clearly convey key takeaways about the movie industry and its key players.

Datasets/Tools

I found my dataset, titled “IMDb 5000+ Movies & Multiple Genres Dataset,” on Kaggle and chose it because its thousands of rows would give me plenty of data to work with. I then opened the dataset in OpenRefine to check whether it needed cleaning. It did: some columns’ cells were not numeric, and others needed clearer text. Once it was cleaned, I loaded the dataset into Tableau to create my visualizations, where the drag-and-drop interface let me explore the patterns and trends in the dataset in many different ways.

Methods + Process

When tidying the data in OpenRefine, I first cleaned up the Director(s) column, since cells with more than one director started with “Director:”. I didn’t end up using that column in my visualizations, as it would have taken more cleaning to split the names apart (some cells had close to 10 directors, and some movies had blank cells), but I think it was good to figure out how to clean it up. When I showed Professor James the dataset I had so far, he noted that my “Total_Gross” column was being read as a text/categorical column instead of a numeric one. Following his advice, I converted it to numeric using the “Transform” function and handled the blank cells through a text facet, entering the expression if(isBlank(value.trim()), "Not informed Value", value), which replaces each blank cell with the label “Not informed Value”. Once the data was cleaned up the way I wanted, I exported it as an .xlsx file.
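The cleaning steps above could be approximated in Python as follows. This is a minimal sketch, not the OpenRefine workflow used in the lab. The function names, the sample rows, the comma separator between directors, and the "$28.34M" gross format are all assumptions made for illustration.

```python
def clean_directors(cell):
    """Drop a leading 'Director:' label and split the names on commas."""
    text = (cell or "").strip()
    if text.startswith("Director:"):
        text = text[len("Director:"):]
    # Assumption: multiple directors are separated by commas.
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_gross(cell, placeholder="Not informed Value"):
    """Return the gross as a float, or the placeholder when the cell is blank."""
    text = (cell or "").strip()
    if not text:
        return placeholder
    # Assumption: values look like '$28.34M' (optional '$', ',' and 'M').
    return float(text.replace("$", "").replace(",", "").rstrip("M"))


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"Director(s)": "Director:Jane Doe, John Roe", "Total_Gross": "$28.34M"},
    {"Director(s)": "Alex Poe", "Total_Gross": "  "},
]

cleaned = [
    {
        "directors": clean_directors(row["Director(s)"]),
        "gross": parse_gross(row["Total_Gross"]),
    }
    for row in rows
]
print(cleaned)
```

Mixing floats and a text placeholder in one column is what makes a tool read the column as text, so in practice it may be simpler to leave blanks empty and let the visualization tool treat them as missing.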

Before creating my visualizations, I consulted with Nathalie, my partner for this lab. I explained that I wanted to focus mostly on numerical factors, such as ratings, when visualizing the data. She gave some great suggestions on how to visualize them and also suggested making comparisons across my charts using tree maps. Having run into the issue herself, Nathalie also advised me to check that each field had the correct data type in Tableau, since on upload it may read one type of data as another (e.g., numeric data shown as text). With all of this advice, I was able to create my visualizations.

Results + Findings

1. Comparison of Average Ratings and Total Gross Over Time

This comparative line chart displays the trends over time for both average ratings and total gross, making gaps, rises, and dips easier to see than other types of visualization would.

Findings: There is a gap in Total Gross between 1932 and 1934, even though ratings stay at roughly the same level. This is possibly related to the Great Depression, as these years predate WWII (which began in 1939). There is a severe dip in 2020, again with ratings staying at roughly the same level, most likely due to the Covid-19 pandemic.
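The aggregation behind a chart like this one can be sketched in Python. This is a rough illustration under assumptions: the field names (year, rating, gross) and the sample values are made up, and Tableau computes its own aggregates internally. A year with ratings but no recorded gross comes out as None, which is one way a gap can appear on the gross line while the ratings line continues.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_year(movies):
    """Return {year: {'avg_rating': ..., 'total_gross': ...}} sorted by year."""
    ratings = defaultdict(list)
    gross = defaultdict(float)
    for movie in movies:
        ratings[movie["year"]].append(movie["rating"])
        if movie["gross"] is not None:
            gross[movie["year"]] += movie["gross"]
    return {
        year: {
            "avg_rating": round(mean(ratings[year]), 2),
            # .get() leaves years without any gross value as None (a gap).
            "total_gross": gross.get(year),
        }
        for year in sorted(ratings)
    }


# Made-up records for illustration only (gross in millions).
movies = [
    {"year": 2019, "rating": 7.0, "gross": 100.0},
    {"year": 2019, "rating": 8.0, "gross": 50.0},
    {"year": 2020, "rating": 7.4, "gross": None},
]

summary = summarize_by_year(movies)
print(summary)
```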

2. Max. Total Gross in Millions Over Time

This tree map shows the highest earnings for each year, encoding them from highest to lowest with both color and size.

Findings: 2015 had the highest-grossing films. 2021 comes in third, which I find surprising for a year so soon after the start of the pandemic, given how hard the pandemic hit the film industry.
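The ranking shown in the tree map amounts to taking the maximum gross within each year and then sorting the years by that maximum. A minimal Python sketch of that idea follows. The function name and the (year, gross) pairs are hypothetical and do not come from the dataset.

```python
def top_years_by_max_gross(movies, n=3):
    """Return the n (year, max_gross) pairs with the largest maximum gross."""
    best = {}
    for year, gross in movies:
        # Skip missing values; keep only the largest gross seen per year.
        if gross is not None and gross > best.get(year, float("-inf")):
            best[year] = gross
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up (year, gross in millions) pairs for illustration only.
sample = [
    (2001, 300.0),
    (2001, 120.0),
    (2002, 250.0),
    (2003, 90.0),
    (2004, 275.0),
    (2004, None),
]

ranking = top_years_by_max_gross(sample)
print(ranking)
```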

3. Average Ratings Highest to Lowest by Genre

This tree map shows average ratings by genre in a similar fashion to the previous tree map.

Findings: The “Western” genre tends to have the highest ratings, with “Film-Noir” and “Biography” close behind.

4. Censorship Distribution Among Top Genres

I used packed bubbles to show the most commonly seen censorship ratings within the genres.

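Findings: The most common ratings across multiple genres are A (restricted to adults) and UA (unrestricted public exhibition with parental guidance for children, as used in India’s film certification system), with “Action” being the genre where they appear most often.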
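The bubble sizes in a chart like this correspond to how often each genre and rating pair occurs. A small Python sketch of that count is below. The function name and the sample pairs are assumptions for illustration, not values from the dataset.

```python
from collections import Counter


def rating_counts(movies):
    """Count how often each (genre, censorship rating) pair appears."""
    return Counter((genre, censor) for genre, censor in movies)


# Made-up (genre, rating) pairs for illustration only.
sample = [
    ("Action", "UA"),
    ("Action", "UA"),
    ("Action", "A"),
    ("Drama", "A"),
    ("Comedy", "UA"),
]

counts = rating_counts(sample)
# The most frequent pair would be the largest bubble.
print(counts.most_common(1))
```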

5. Genres By Total Gross in Millions

A pie chart was a clear way to show each genre’s share of the summed total gross earnings.

Findings: Action has a disproportionately high share of earnings. My assumption is that this is driven by franchises such as Marvel, Avatar, and Star Wars.
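Each slice of the pie is a genre’s summed gross divided by the grand total. The arithmetic can be sketched in Python as follows. The function name and the figures are made up for illustration and are not the dataset’s values.

```python
def gross_share_by_genre(movies):
    """Return each genre's percentage of the summed total gross."""
    totals = {}
    for genre, gross in movies:
        totals[genre] = totals.get(genre, 0.0) + gross
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        genre: round(100 * total / grand_total, 1)
        for genre, total in totals.items()
    }


# Made-up (genre, gross in millions) pairs for illustration only.
sample = [
    ("Action", 400.0),
    ("Action", 200.0),
    ("Comedy", 150.0),
    ("Drama", 250.0),
]

shares = gross_share_by_genre(sample)
print(shares)
```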

6. Average Runtime by Genre

Using horizontal bars, I displayed average runtimes to show which genres run longer than others.

Findings: Musicals have the longest average runtime, possibly because songs eat up time.

Reflections

I was most fascinated by “Main_Genre,” which became the focal point for most of my graphs. It was interesting to see how easily I could switch between different types of charts and graphs, each telling a different story despite using the same data. If I had the patience, I might have tried to clean up the “Director(s)” and “Actors” columns for more visualizations, but I’m pretty happy overall with the findings I came up with.

I liked using OpenRefine to tidy up my data. It took some getting used to: it’s relatively simple once you know how to make a particular change, but it can get complicated if you don’t. With Tableau, I had fun switching between visualizations as I figured out what I wanted to use, but it was frustrating that changing the visualization type would sometimes remove fields I had already dragged in. I also had trouble displaying the Total Gross in millions despite cleaning it up as best I knew how, and had to make do with clarifying the unit in the labels.
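One way around the millions problem is to convert the values before loading them into a visualization tool. The Python sketch below shows the conversion and a display label. It assumes the raw values are whole dollar amounts, which may not match how this dataset stores Total_Gross, and the function names are hypothetical.

```python
def to_millions(value):
    """Convert a raw dollar amount to millions of dollars."""
    return value / 1_000_000


def format_millions(value):
    """Format a raw dollar amount as a label such as '$123.5M'."""
    return f"${to_millions(value):,.1f}M"


# Made-up amounts for illustration only.
print(to_millions(2_500_000))
print(format_millions(123_456_789))
```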

Sources

Lavery, T. (2017, March 22). What is Internet Movie Database (IMDb)? Definition from TechTarget. WhatIs.com. Retrieved February 22, 2023, from https://www.techtarget.com/whatis/definition/Internet-Movie-Database-IMDb

Panchal, K. (2018, May 13). Exploring movie data with interactive visualizations. Medium. Retrieved February 22, 2023, from https://towardsdatascience.com/exploring-movie-data-with-interactive-visualizations-c22e8ce5f663

G, R. A. (2022, October 29). IMDB 5000+ movies & multiple genres dataset. Kaggle. Retrieved February 17, 2023, from https://www.kaggle.com/datasets/rakkesharv/imdb-5000-movies-multiple-genres-dataset

Google. (n.d.). IMDb: Movies & TV shows – Apps on Google Play. Google. Retrieved February 22, 2023, from https://play.google.com/store/apps/details?id=com.imdb.mobile&hl=en_US&gl=US