Netflix Movies Data Visualizations



INTRODUCTION

I have always had a deep interest in and love for filmmaking. Going to the movies is something I enjoy, and I love the idea of bringing a group of strangers into a room and connecting them through the same experience. Movies allow us to travel to different worlds and explore emotions that we do not experience on a daily basis; they make us dream and learn new things.

When Netflix started to become more popular some years ago, I must admit I was not sold on the idea. As I said, I love going to the movies and seeing them at a movie theater. However, in 2020, when the pandemic started and we had to quarantine, Netflix became a source of fun, distraction, and somehow an escape from what the world was going through. I now see the benefits it has; most importantly, it allows users to rewatch movies that they love.

Using Netflix can also be frustrating, as users (myself included) can feel saturated and overwhelmed by everything the platform has to offer. Due to the platform’s algorithm, users are often not exposed to new genres, actors or movies from different countries.

While thinking of a project for this class, I started to think about everything that sparked my curiosity, and Netflix certainly does. I decided to work with a data set based on Netflix’s content.

My goals for the project were to break down Netflix content and understand the size of what is available to users compared to what they are usually exposed to when accessing the platform, to learn how the size of the movie industry has changed in recent years due to the popularity of platforms like Netflix, and to learn what users’ preferences are.

TOOLS

  1. Kaggle – Website where I acquired the data sets, which were free to download.
  2. Excel – Software used to access and clean the selected data sets.
  3. Tableau – Software used to create visualizations with the data sets.
  4. Tableau Public – Platform used to publish the project so users could access it.
  5. Zoom – Program used to run user testing and observe users as they walked through the project.

DATA

The data set files were acquired from the website Kaggle. The data consisted of two files: 1) all Netflix movies and shows, and 2) all the actors that appear in those movies and shows (broken down by show/movie). As the file was too large, I decided to focus only on the movies rather than including shows too. By doing this, I reduced the size of the file and it became easier for my computer to manage.

In order to keep only the information I wanted, I had to do some data cleaning in Excel. It consisted of deleting all shows and fixing grammatical errors.
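The same cleaning step could also be scripted instead of done by hand. The following is only a minimal sketch in plain Python, not the Excel process I used: the sample rows are made up, the "type" column with "MOVIE"/"SHOW" values is an assumption about the file layout, and trimming stray spaces stands in for the manual text fixes.

```python
import csv
import io

# Made-up sample standing in for the Kaggle titles file. The "type" column
# and its "MOVIE"/"SHOW" values are assumptions about the file layout.
SAMPLE = """id,title,type,release_year
tm1,Movie A,MOVIE,2010
ts1,Show A,SHOW,2019
tm2, Movie B ,MOVIE,2021
"""

def keep_movies(csv_text):
    """Drop every row that is not a movie and trim stray spaces in titles."""
    reader = csv.DictReader(io.StringIO(csv_text))
    movies = []
    for row in reader:
        if row["type"].strip().upper() != "MOVIE":
            continue  # skip shows
        row["title"] = row["title"].strip()
        movies.append(row)
    return movies

movies = keep_movies(SAMPLE)
print([m["title"] for m in movies])
```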

As mentioned above, the data consisted of two files, but unfortunately the data sets did not have a connecting point that would let me use both of them in Tableau. I did a VLOOKUP in Excel that allowed me to create a connecting point between both files, using the Movie ID to connect them.
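For readers who prefer code to spreadsheets, a lookup of this kind can be sketched in a few lines of Python. This is an illustration of the idea, not the Excel formula I used; the rows and the column names ("id", "title", "name", "character") are hypothetical.

```python
# Hypothetical rows standing in for the two Kaggle files; the column
# names are assumptions made for this sketch.
titles = [
    {"id": "tm1", "title": "Movie A", "release_year": 2010},
    {"id": "tm2", "title": "Movie B", "release_year": 2021},
]
credits = [
    {"id": "tm1", "name": "Actor One", "character": "Lead"},
    {"id": "tm1", "name": "Actor Two", "character": "Villain"},
    {"id": "tm3", "name": "Actor Three", "character": "Extra"},
]

def attach_titles(credits, titles, key="id"):
    """Mimic a VLOOKUP: for each credit row, look up its movie by ID."""
    lookup = {row[key]: row for row in titles}
    joined = []
    for credit in credits:
        movie = lookup.get(credit[key])
        if movie is None:
            continue  # no matching movie, e.g. a show that was deleted
        joined.append({
            **credit,
            "title": movie["title"],
            "release_year": movie["release_year"],
        })
    return joined

joined = attach_titles(credits, titles)
print(len(joined), "credit rows matched a movie")
```

Rows whose ID has no match are dropped here; in a spreadsheet the equivalent cells would show a lookup error instead.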

The data set for movie titles consisted of all the movies available on Netflix, with the movie title, movie ID, release year, genres, average score of the movie, and number of votes per movie. The data set for actors consisted of a list of all the actors that appeared in each movie, along with the character they played. This list also included directors.

DESIGN DECISION

My initial intent for the project was to create visualizations using the software Gephi. I wanted to focus on the actors rather than the movies, and create visualizations of where the actors intersect. As I started to use Gephi, I came across some issues that discouraged me from continuing with it, so I had to re-evaluate the purpose of my project. I decided to move to Tableau instead, where I would mainly create graphs that users could easily read and interact with.

CREATING VISUALIZATIONS

What I wanted to achieve was for users to easily visualize Netflix’s data and be able to navigate the interactive visualizations without complications. I wanted the graphs to be simple and clear. 

I uploaded both of the data sets I wanted to work with to Tableau and started to apply the appropriate filters. I had a clear vision of some of the information I wanted to represent in graphs, like the total number of movies and the breakdown of movies per year; however, I was not sure how to present them.

Graph # 1 – Movies Released By Decade

The first challenge I encountered was grouping the total movies per decade. Initially it made sense to use the “Grouping” feature; however, the information was not being presented in a clear format. So I did some research and created calculated fields to group the decades. This allowed me to easily group movies by decade, which made the totals easier to visualize than having only the total movies per individual year.
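The logic behind a decade grouping is simple integer arithmetic on the release year. The sketch below shows that idea in Python; it is not the Tableau calculated field itself, and the release years are made-up sample values.

```python
from collections import Counter

def decade_label(year):
    """Map a release year to its decade, e.g. 2015 -> '2010-2019'."""
    start = (year // 10) * 10
    return f"{start}-{start + 9}"

# Made-up release years, only to exercise the function.
release_years = [1958, 2010, 2015, 2019, 2020, 2022]
movies_per_decade = Counter(decade_label(y) for y in release_years)

for decade, count in sorted(movies_per_decade.items()):
    print(decade, count)
```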

Users are able to click on each decade to view more details. This causes the rest of the graphs to adapt to the selection and show only information relevant to the selected decade.

Figure 1.1 – Graph of movies released per decade and breakdown of calculated fields used

Netflix has movies from 7 different decades, ranging from 1950 to 2022. The decade with the most movies is 2010-2019, with a total of 2205 movies. However, the decade starting in 2020 already has a total of 1062 movies as of today. If releases continue at this pace, by 2029 this decade will be by far the one with the most movies released. This is because movies are no longer released only at movie theaters: streaming companies like Netflix have created their own production companies, or buy movies from smaller studios to release as exclusives on the platform.

Graph # 2 – Movies Released By Year

As I had already created a graph where movies could be seen by decade, I wanted to create one that showed movie releases by year to make it more specific. This bar graph consists of 66 bars, as some years, like 1957, do not have any movies available in Netflix’s database.
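As a sanity check on that count: if the data does run from 1950 to 2022, that is 73 calendar years, so 66 bars would leave 7 years with no movies. A small Python sketch of how such gaps could be listed, using made-up years rather than the real data set:

```python
def years_without_movies(release_years, first, last):
    """Return the years in [first, last] that have no movie at all."""
    present = set(release_years)
    return [year for year in range(first, last + 1) if year not in present]

# Made-up sample, not the real release years.
sample_years = [1954, 1956, 1958, 1958, 1959]
gaps = years_without_movies(sample_years, 1954, 1959)
print(gaps)
```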

Users are able to click on each year to view more details. This causes the rest of the graphs to adapt to the selection and show only information relevant to the selected year.

Figure 1.2 – Graph of movies released per year

Graph # 3 – Total Viewer Votes

This graph consists of the total votes received per movie. 

Netflix has since removed this feature from its platform, but users were previously able to give a score to the movie they watched. However, this can be deceiving, as the total votes recorded do not reflect the total number of times the movie has been watched, only the number of times a score was given. Viewers were not required to provide a score.

This graph is sorted from the most votes to the fewest. The movie with the most votes is “Inception” from 2010, with a total of 2,268,288 votes. It was not possible to single out one least-voted movie, as there are plenty for which no votes were submitted.

Figure 1.3 – Graph of total votes submitted by Netflix users

Graph # 4 – Movie Scores

This graph reflects the movie score based on the votes received from viewers. The score is the average of the submitted votes and should fall between 0 and 10.

I did encounter an issue when two movies had the same name, for example when remakes occurred. Instead of showing separate rows, the scores of both movies were being added together, giving a total score of 15.1. I had to add release year to the graph in order to split the rows.
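The effect is easy to reproduce outside Tableau. In the Python sketch below, grouping only by title merges two same-named movies into one total, while grouping by title and release year keeps them apart. The titles and scores are invented for illustration; they are not the actual movies from the data set.

```python
from collections import defaultdict

# Invented rows: two movies share a title but differ in release year.
rows = [
    {"title": "Remake Example", "release_year": 1984, "score": 7.5},
    {"title": "Remake Example", "release_year": 2015, "score": 7.6},
    {"title": "Other Movie", "release_year": 2001, "score": 6.0},
]

def total_score(rows, keys):
    """Sum the score of all rows that share the same values for `keys`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["score"]
    return dict(totals)

# Grouping by title alone adds the two scores together (above 10).
by_title = total_score(rows, ["title"])
# Adding release year to the grouping gives one row per movie again.
by_title_and_year = total_score(rows, ["title", "release_year"])

print(by_title)
print(by_title_and_year)
```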

Users are able to click on each movie score to view more details. This causes the rest of the graphs to adapt to the selection and show only information relevant to the selected movie, such as release year, number of votes and genre.

Figure 1.4 – Graph of average Movie Score based on votes submitted by users

Graph # 5 – Movies by Genres

The graph for genres was the one that I found the most complex to build.

Initially I ruled out building this graph in Tableau, as the way the genres are presented would not allow me to sort them properly. Each movie is assigned not just one genre but all of the ones that apply; for example, a movie can be Comedy, Romance and European.

My initial idea was to create bar graphs, but it was not possible due to the way the data is organized in column E, “Genres”. I later learned that I could have split the genres using a formula.

Figure 1.5 – Screenshot of part of original data set used to create data visualizations. Image is to point out how “Genres” were organized.

For the past couple of weeks I had been researching vacuums to buy one for my personal use. There are so many models on the market that I became fond of the “compare model” feature on many websites. It allowed me to understand the features that each model had and anything relevant to know about it.

Seeing the “compare model” charts gave me the idea of doing something similar with the genres of the movies, as I could not break them down. I started researching and was able to learn about calculated fields, so I proceeded to create one calculated field per genre.

The first step I took was to clean the genre data in Excel and get all the possible genre options. Then I moved to Tableau to create the 18 calculated fields so the graph could properly display all the genres that each movie was classified under.
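The same two steps (collect every possible genre, then build one yes/no column per genre) can be sketched in Python. This is an illustration of the approach rather than the Tableau calculated fields themselves. It assumes the Genres cell is a bracketed, comma-separated list such as "['comedy', 'romance']"; the sample movies are made up.

```python
def parse_genres(cell):
    """Split one Genres cell into a list of clean, lowercase genre names."""
    parts = cell.strip("[]").split(",")
    return [p.strip(" '\"").lower() for p in parts if p.strip(" '\"")]

def genre_flags(movies):
    """Return all genres found plus one True/False flag per genre per movie."""
    all_genres = sorted({g for m in movies for g in parse_genres(m["genres"])})
    table = []
    for m in movies:
        present = set(parse_genres(m["genres"]))
        flags = {g: (g in present) for g in all_genres}
        table.append({"title": m["title"], **flags})
    return all_genres, table

# Made-up movies; the format of the "genres" text is an assumption.
movies = [
    {"title": "Movie A", "genres": "['comedy', 'romance', 'european']"},
    {"title": "Movie B", "genres": "['drama']"},
    {"title": "Movie C", "genres": "[]"},
]

all_genres, table = genre_flags(movies)
print(all_genres)
for row in table:
    print(row)
```

Each flag column plays the role of one calculated field, which is what makes the "compare model" style layout possible.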

Figure 1.6 – Calculated fields created to be used in Genres graph


The final result was what I expected: a graph where I could see the genres per movie in a clear way.

Figure 1.7 – Final outcome of Genres graph.

Graph # 6 – Actors per Movie

For this graph I used the data corresponding to the actors file.

The original idea for this graph was to make it in Gephi and visualize all the overlaps between actors; however, it was not possible. I proceeded to make a graph in Tableau that showed the actors per movie. It is still an interactive graph: when moving into the second dashboard, actors can be filtered per decade or year and only the relevant movie actors show up.

Figure 1.8 – Graph of actors per movie

UX RESEARCH

In order to understand whether what I was planning to do with the data was the best option, I ran user testing with two avid streaming platform users. I wanted to first have a conversation with them before asking them to do the user test on the dashboard I had created, so the user testing consisted of two sessions.

The first session consisted of me explaining what my data was about and what I had planned to do with it. I presented the ideas I had for visualizations, such as dividing movies by genre, a graph of the most popular ones, and a total count of all the movies. As both of them are avid users of Netflix, I wanted to understand where their curiosity lay with regard to the platform’s content.

The second session consisted of both users doing a user test on the dashboard I had created. One of the users had the session over Zoom and the other one in person. With this user testing what I wanted to achieve was:

FINDINGS

The main findings from conducting the user testing are: 

  1. Actors graph can be deleted. 

Both participants recommended removing the actors graph. The information is not presented in a way that is truly informative, as many of the actors are not well known. I was also advised to go deeper into the data cleaning, as there are some typos and symbols that should be fixed or removed from the file so they do not carry over to the graph.

  2. Votes Graph organized in descending order

In this graph, votes were organized in descending order, but originally they were grouped by year. The graph started out with the year that has the highest number of votes, 2020, and then moved to 2007, the year with the second largest number of votes. This made it hard for users to realize that the first movies they were seeing were not really the ones with the largest number of votes.

After the user testing I tried to fix this, and was able to make the appropriate changes. Instead of having movies arranged in descending order only within each year, I made it into a ranking across all movies:

Figure 1.9 – Comparison of original Votes graph and updated version, reflecting the changes made; the new version has “Top 10”

  3. Score Graph organized in descending order

The same situation occurred here as in the votes graph. Movies were organized in descending order, but within each year rather than in an overall score ranking.

This confused users, as the years jump around drastically; for example, the current number 1 is 1979 and the second one is 2011. Seen in graph format, this does not make sense until it is analyzed in more detail.

I was able to make the appropriate changes to the file and present the top 10 movies with the best score:

Figure 2.0 – Comparison of original Movie Scores graph and updated version, reflecting the changes made; the new version has a total “Top 10”
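Both fixes come down to replacing a per-year ordering with a single ranking across all movies. The Python sketch below contrasts the two orderings on made-up data; it is meant to illustrate the difference, not to reproduce the Tableau sort settings, and the function names are my own.

```python
from collections import defaultdict

# Made-up movies, only to show how the two orderings differ.
movies = [
    {"title": "A", "release_year": 2020, "votes": 500},
    {"title": "B", "release_year": 2020, "votes": 400},
    {"title": "C", "release_year": 2007, "votes": 800},
    {"title": "D", "release_year": 2010, "votes": 300},
]

def order_by_year_then_votes(movies):
    """Original ordering: years ranked by total votes, movies ranked within each year."""
    year_totals = defaultdict(int)
    for m in movies:
        year_totals[m["release_year"]] += m["votes"]
    return sorted(
        movies,
        key=lambda m: (-year_totals[m["release_year"]], -m["votes"]),
    )

def top_n(movies, n=10, key="votes"):
    """Updated ordering: one ranking across all movies, cut to the top n."""
    return sorted(movies, key=lambda m: m[key], reverse=True)[:n]

print([m["title"] for m in order_by_year_then_votes(movies)])
print([m["title"] for m in top_n(movies, n=2)])
```

In the first ordering, movie C sits below A and B even though it has the most votes, because its year has fewer votes in total. That is the confusion the test participants pointed out.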

FUTURE STEPS

Working on this project was quite enjoyable. I got to develop my skills using Tableau and challenged myself to build more advanced graphs than the ones I had previously done, so I had to learn new actions to apply in the program. Prior to this class I had never used this software, so seeing that I got a good grasp of the program is an accomplishment I take pride in.

To improve this project, I would spend more time on the actors file, as I would like to explore more graphs with this information. I was not able to achieve the version of the graph I had envisioned.

I would also like to explore other options for the genres data. I did like what I achieved; however, the graph is very long, so I would like to find a simpler way to present it.

Overall I liked the outcome of the project. It reflected what I initially had in mind, and I am very happy with the interactions I was able to apply to the graphs.

Figure 2.1 – Final version of Tableau Dashboard

Link to Tableau Dashboard: https://public.tableau.com/app/profile/maria.menendez4999/viz/NetflixMoviesBreakdown-Final/NETFLIX?publish=yes