Exploring the Relationship Between IMDb Ratings and Streaming Service Films



This lab uses Tableau to tell a data visualization story.

Introduction and Inspiration

More than ten years ago, television reached its peak. The year was 2010, and cable providers were seeking more and more ways to sell bundled packages and raise prices. Little did they know, the industry was about to change: Netflix’s streaming service, introduced in 2007, offered consumers an alternative with one low monthly rate, no commercials, and unlimited access to hundreds of movies and TV shows.

The pandemic in 2020 caused streaming subscriptions to surge. Netflix doubled its expected number of new subscribers in the first three months of the year. However, with the pandemic subsiding and individuals looking for ways to regain a semblance of normalcy, cancellations have risen as consumers face rising prices and questions about which content provider is worth the monthly bill.

With this in mind, streaming services have been forced to make decisions about the type of content that will entice individuals to remain subscribers. The following process and dashboard therefore provide a glimpse into the types of content that receive the highest ratings on the popular review site IMDb.


Methods, Processes, and Materials 

Before starting, I needed to find a dataset that supported this investigation. For this, I consulted Kaggle, a crowdsourced platform for hosting datasets and challenges. The chosen dataset lists 9,515 movies hosted across the streaming services Netflix, Hulu, Amazon Prime, and Disney+. Each entry also provides the movie’s year, the country in which it was produced, its IMDb rating, its Rotten Tomatoes score, and its target age group.

To start formulating questions and interpreting the data, I consulted other data visualizations to identify which types of interpretations people might be interested in and familiar with.

One thing that really interested me across these visualizations was the data on the genres that each streaming service caters to. Having observed this, I made a point of investigating it within my own dataset. However, one thing I didn’t think these visualizations effectively communicated was the relationship between these genres and audience popularity.


The Process

  1. Cleaning the Data

The first challenge I faced with this dataset was the way the streaming service data was structured. Instead of having one column naming the streaming service provider, the data had one column per streaming service and used a 0 or 1 to indicate whether the movie is available on that service. To change this so that streaming services could be compared, I loaded the data into OpenRefine, an application for data cleanup.
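The reshaping described above can be sketched outside OpenRefine as well. The Python below is only a minimal illustration of the idea, not the steps actually performed in OpenRefine. The column names (`Title`, `IMDb`, and the four service columns) and the sample rows are assumptions made for the example and may not match the Kaggle file.

```python
# Assumed names of the 0/1 service columns; the real file may label them differently.
SERVICES = ["Netflix", "Hulu", "Prime Video", "Disney+"]

def unpivot_services(rows, services=SERVICES):
    """Turn one wide row per movie into one long row per (movie, service) pair."""
    long_rows = []
    for row in rows:
        # Keep every column that is not a service flag.
        base = {k: v for k, v in row.items() if k not in services}
        for service in services:
            # Flags may arrive as strings when read from CSV, so compare as text.
            if str(row.get(service, "0")).strip() == "1":
                long_rows.append({**base, "Service": service})
    return long_rows

# Made-up sample rows for illustration only.
sample = [
    {"Title": "Movie A", "IMDb": 7.5, "Netflix": 1, "Hulu": 0, "Prime Video": 1, "Disney+": 0},
    {"Title": "Movie B", "IMDb": 6.0, "Netflix": 0, "Hulu": 1, "Prime Video": 0, "Disney+": 0},
]

long_rows = unpivot_services(sample)
for r in long_rows:
    print(r["Title"], r["Service"], r["IMDb"])
```

A movie available on two services appears twice in the output, once per service, which is what allows a single "Service" field to be used for comparison.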

  2. Interpreting the Data

Once the data was clean, I brought it into Tableau and created some preliminary data visualizations. During this process, I began to understand how easy it is to create visualizations that support false claims. Tackling this problem was a challenge at first, especially when dealing with quantitative data. Tableau often summed this data (for instance, summing all IMDb ratings), which produced visualizations that didn’t correctly represent the data.
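To see why a summed rating can mislead, consider the small Python sketch below. The service names and ratings are invented for illustration and do not come from the dataset. A service with many mediocre films ends up with a larger total than a service with fewer, better-rated films, while the average tells the opposite story.

```python
from collections import defaultdict
from statistics import mean

# Invented (service, rating) pairs for illustration only.
rows = [
    ("Service A", 6.0), ("Service A", 6.0), ("Service A", 6.0), ("Service A", 6.0),
    ("Service B", 8.0), ("Service B", 8.0),
]

# Group the ratings by service.
ratings = defaultdict(list)
for service, rating in rows:
    ratings[service].append(rating)

# A sum rewards catalog size; an average reflects typical quality.
summed = {s: sum(v) for s, v in ratings.items()}
averaged = {s: mean(v) for s, v in ratings.items()}

print(summed)    # {'Service A': 24.0, 'Service B': 16.0}
print(averaged)  # {'Service A': 6.0, 'Service B': 8.0}
```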

  3. Creating a Story

After learning more about Tableau, it became easier to create visualizations that communicated the story I wanted to tell. My first breakthrough came once I learned when to visualize qualitative data as a dimension and when to take the average of the data instead. During the process, I came up with five unique visualizations, each teaching me a different facet of Tableau and of how to create effective visualizations.

  4. Testing the Dashboard

Once all five visualizations were created, I tested the dashboard with a user to see which improvements could be made. From this round of testing, I observed that the user was confused by some of my titles and needed more explanation for some of the visualizations. In response, I renamed some titles and added captions to the visualizations that needed further clarification.

Tableau Dashboard


Results & Interpretation

These visualizations each explore a different factor that might push an IMDb rating higher or lower: genre, the country where the movie was produced, runtime, director, and the average IMDb rating of all movies on a particular streaming service. The visualizations share a similar palette, since color is linked to IMDb rating in 3 of the 5 visualizations. At first, I used a red/green diverging palette, but I then switched to a color-blind-friendly palette in which red represents low IMDb scores and blue represents higher ones.

The visualization that I feel communicates the most information is the one titled “Average IMDb Rating by Genre”. In this visualization, I looked at which movie genres are present on which streaming services and then colored these genres based on their average IMDb ratings. The result allows users to draw insights about which streaming services have the most popular films in each genre, which genres are least or most popular on average, and which genres are missing from a streaming service.
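The aggregation behind a chart like this can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Tableau configuration itself: the field names (`Service`, `Genres`, `IMDb`), the comma-separated genre format, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def average_by_service_and_genre(rows):
    """Average IMDb rating for each (service, genre) pair that has rated films."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, film count]
    for row in rows:
        if row["IMDb"] is None:
            continue  # skip films without a rating
        # Assumes a film's genres are stored as one comma-separated string.
        for genre in row["Genres"].split(","):
            key = (row["Service"], genre.strip())
            totals[key][0] += row["IMDb"]
            totals[key][1] += 1
    return {key: round(total / count, 2) for key, (total, count) in totals.items()}

# Made-up rows in long format (one row per film per service).
sample = [
    {"Service": "Netflix", "Genres": "Comedy,Drama", "IMDb": 7.0},
    {"Service": "Netflix", "Genres": "Comedy", "IMDb": 5.0},
    {"Service": "Hulu", "Genres": "Drama", "IMDb": 8.0},
    {"Service": "Hulu", "Genres": "Horror", "IMDb": None},
]

averages = average_by_service_and_genre(sample)
for (service, genre), avg in sorted(averages.items()):
    print(service, genre, avg)
```

Pairs that never appear in the result, such as Hulu and Comedy in this sample, correspond to the gaps in the chart where a genre is missing from a service.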

A visualization that uses a similar process is titled “IMDb Rating of Movies Produced by Country”. However, because of its information density, this visualization requires a bit more interaction between the user and the graph. The user can scroll over the visualization to see which countries produce films that have higher or lower ratings.


Reflections 

Although I identified interesting insights, my original goal of creating a dashboard that helps users identify the differences in content between streaming services proved more difficult than I expected with this dataset. Transforming the data in OpenRefine was a step in this direction, but further work would be needed to fully realize this goal.

In addition, conducting one user test made me realize the value of user testing. In the future, I would like to set aside time to test my dashboards with more users to gather more informed insights.