The Development of the Movie Industry in Recent Decades



Introduction

The year 2020 was a very bad year for the movie industry. Many productions were delayed and many films postponed their release dates. Worse, many cinemas closed permanently for lack of audiences. Although COVID clearly hurt the movie industry, the tremendous success of Netflix and other streaming services made people realize that these difficulties are only temporary. There is no doubt that the world will need movies more than ever when the pandemic ends. Therefore, I want to analyze world cinema data to better understand why and how movies succeed.

Inspiration and Sources

1. Information is Beautiful

The first movie dataset I saw was Hollywood Insider on the website “Information is Beautiful”. I spent quite some time playing with it. Although the author chose to present mainly financial data in the chart, I still learned a lot from how they designed its colors and interactions.

2. Kaggle

The dataset I used for this project contains almost 7,000 movie records from 1986 to 2016. It has 15 categories covering both quantitative and historical dimensions, which is more than enough for my research.

Here is the link to the dataset detail page.

Process and Tools

1. OpenRefine

With such a large dataset, it is necessary to review the critical string (text) categories. I downloaded a CSV file of the original dataset from Kaggle.com and imported it into OpenRefine.

After reviewing several categories, I found that the most significant problem with the string data is duplication: the same entity appears under slightly different names. For example, as the image shows, “Columbia Pictures Corporation” and “Columbia Pictures” are quite possibly the same company. To confirm this, I selected some movies from Columbia Pictures and looked them up on IMDb. The results confirmed my guess, so I merged the two values using the facet function.
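The merge above was done by hand with a facet. As a rough illustration of how such duplicates could be spotted automatically, here is a minimal Python sketch in the spirit of OpenRefine's key-collision clustering. It is not OpenRefine's own fingerprint method: the suffix list `SUFFIXES` and the helpers `name_key` and `merge_variants` are hypothetical, and any proposed merge should still be verified, for example against IMDb as described above.

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: an illustrative list of corporate suffix words to ignore
# when comparing company names. Not part of the original workflow.
SUFFIXES = {"corporation", "corp", "inc", "ltd", "company", "co"}


def name_key(name: str) -> str:
    """Comparison key: lowercase, punctuation removed, suffix words dropped."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def merge_variants(names: list[str]) -> list[str]:
    """Replace each name with the most frequent spelling among the
    variants that share its key (a candidate merge, to be checked by hand)."""
    counts = Counter(names)
    groups: dict[str, list[str]] = defaultdict(list)
    for variant in counts:
        groups[name_key(variant)].append(variant)

    canonical = {}
    for variants in groups.values():
        # Most frequent spelling wins; ties go to the alphabetically first one.
        best = min(variants, key=lambda v: (-counts[v], v))
        for v in variants:
            canonical[v] = best
    return [canonical[n] for n in names]
```

For example, `merge_variants(["Columbia Pictures Corporation", "Columbia Pictures", "Columbia Pictures Corporation"])` maps all three entries to "Columbia Pictures Corporation", the more frequent spelling. Dropping suffix words is a deliberately loose rule, which is why the result should be treated as a list of suggestions rather than an automatic fix.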

After finishing with OpenRefine, I exported the dataset as an Excel file and moved on to Tableau.

2. Tableau

Although I had already cleaned the dataset in OpenRefine, I checked and changed the data types of several fields before creating any sheets in Tableau. For example, I changed the year from a string to a date. It is also important to check that all numerical fields are typed correctly. This step is essential because every chart reads these data types.
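In Tableau these type changes are made in the interface, so the original workflow involves no code. For anyone preparing the data outside Tableau, the same check can be sketched in Python with only the standard library. The column names `year`, `budget` and `gross` and the helpers `to_number` and `convert_row` are assumptions for illustration. The sketch turns the year into a date (January 1 of that year) and the money columns into numbers, and it reports every value that fails to convert instead of silently dropping it.

```python
import csv
import io
from datetime import date

# ASSUMPTION: illustrative column names; the real dataset may differ.
NUMERIC_FIELDS = ("budget", "gross")


def to_number(text: str):
    """Parse strings such as '1,500,000' or '$2,000'; return None if not numeric."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def convert_row(row: dict) -> tuple[dict, list[str]]:
    """Return a typed copy of a CSV row plus the names of fields that failed."""
    typed = dict(row)
    problems = []
    try:
        # Year string -> date, mirroring the string-to-date change in Tableau.
        typed["year"] = date(int(row["year"]), 1, 1)
    except (KeyError, TypeError, ValueError):
        typed["year"] = None
        problems.append("year")
    for field in NUMERIC_FIELDS:
        value = to_number(row.get(field) or "")
        typed[field] = value
        if value is None:
            problems.append(field)
    return typed, problems


if __name__ == "__main__":
    sample = 'name,year,budget,gross\nMovie A,1986,"1,500,000",2000000\n'
    for raw in csv.DictReader(io.StringIO(sample)):
        print(convert_row(raw))
```

Collecting the failing field names, rather than stopping at the first error, makes it easy to see at a glance which columns (such as a budget stored as text) still need attention.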

Dashboard 1:

Based on my research goal, I created my first dashboard to answer one simple question: who controlled the movie industry over these three decades?

I used four types of charts in this dashboard: tree-maps, packed bubbles, bar charts, and a world map. I believed these could best present the dataset.

Although it is presented last, the Global Movie World Map was actually the first chart I created. Using my dataset, I compared a color-coded (choropleth) map with the current dot map, and the dot map clearly presents the data more directly. A choropleth map paints larger areas in different colors, which may look more attractive. In this case, however, its accuracy is badly affected by an outlier: the movie count of the U.S.A. Because the color scale has to stretch to cover that one extreme value, all the other countries end up with similar colors. The dot map, on the other hand, makes the differences between values easier to see.
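The outlier effect can be checked with a little arithmetic. The Python sketch below assumes a simple linear color scale split into five classes and uses hypothetical movie counts that are not taken from the dataset; Tableau's actual color ramp may differ. The names `color_class` and `classify` are illustrative.

```python
# HYPOTHETICAL movie counts per country, for illustration only.
COUNTS = {"USA": 5000, "UK": 600, "France": 300, "Canada": 150,
          "Germany": 120, "Japan": 80}


def color_class(value: float, lo: float, hi: float, n_classes: int = 5) -> int:
    """Index of the color class (0 = lightest) for value on a linear scale."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    return min(int(fraction * n_classes), n_classes - 1)


def classify(counts: dict[str, float], n_classes: int = 5) -> dict[str, int]:
    """Assign every country a color class using the min and max of the data."""
    lo, hi = min(counts.values()), max(counts.values())
    return {k: color_class(v, lo, hi, n_classes) for k, v in counts.items()}


if __name__ == "__main__":
    print(classify(COUNTS))
    print(classify({k: v for k, v in COUNTS.items() if k != "USA"}))
```

With these numbers, the U.S. lands in the darkest class (4) while the other five countries all fall into the lightest class (0), because the single U.S. value stretches the scale. Once the outlier is removed, the remaining countries spread across three classes (UK 4, France 2, the rest 0).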

For the Top 20 charts, the strengths of each chart type are also clear. The bar chart presents the ranking of actors in the most straightforward way. Although the top movie companies and the top directors are similar kinds of data, tree-maps are a better fit for the companies. For one thing, the straight, cut edges of a tree-map give a more formal feel, which suits the companies category. For another, each director has a distinct style, so a more colorful presentation suits the directors. In addition, Tableau does more than create static charts: by interacting with the world map, we can filter the Top 20 datasets by country.
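In the dashboard, the map acts as a filter for the Top 20 charts. The underlying logic (filter by country, then aggregate, then rank) can be sketched in a few lines of Python. The record fields `country`, `director` and `gross`, the default ranking measure, and the function name `top_n` are assumptions for illustration, not the dashboard's actual configuration.

```python
from collections import defaultdict
from typing import Optional


def top_n(records: list[dict], key_field: str, value_field: str = "gross",
          n: int = 20, country: Optional[str] = None) -> list[tuple[str, float]]:
    """Sum value_field per key_field (e.g. director), optionally keeping only
    one country, and return the n largest totals in descending order."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        if country is not None and r["country"] != country:
            continue  # the equivalent of clicking a country on the map
        totals[r[key_field]] += r[value_field]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Passing `country=None` corresponds to the unfiltered view; passing a country name corresponds to selecting that country on the map. The same function serves actors, directors, or companies by changing `key_field`.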

In short, with the help of Tableau, I concluded that the United States has dominated the movie industry over the past decades. It produces the most movies and has the most profitable directors and actors. Moreover, all the major production companies in the dataset are American.

Dashboard 2:

Owing to its very strict quarantine policy, the Chinese movie industry has recovered quickly from the impact of COVID since last summer. In fact, the box office reached a new high during the Spring Festival holiday. According to The Hollywood Reporter, “comedy ‘Hi, Mom’ has emerged as China’s 2021 Lunar New Year champion, riding rave word of mouth to a running total of $619 million and counting.” I therefore wanted to see what this dataset reveals about the Chinese movie market, so I created a second dashboard to trace its development.

In this dashboard, I compare several measures across mainland China, Hong Kong, and Taiwan. My first step was to separate the regions using colors and columns. Then, instead of combining different measures in one chart, I kept each chart simple so that the whole dashboard is easier to read and compare. Unfortunately, the sample size is relatively small, so the data reflect reality only to some degree. Some results are plausible: the line chart shows that budgets in mainland China never stop growing. Others are not: a total gross of 0 million for Taiwan is impossible. Judging from the movie ratings and gross in the mainland China and Hong Kong markets, the United States still dominates the movie industry. This result is consistent with reality and with what we learned from Dashboard 1. However, it is also impossible that Hollywood earns nothing from the Taiwan market. Thus, even though these charts faithfully reflect the data, the results are not fully reliable.
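The implausible zero for Taiwan suggests a check worth running before trusting any regional total. The Python sketch below counts records per region and flags small samples and regions with no recorded gross. It assumes that a gross of 0 means "not recorded" and that fewer than 30 movies (`MIN_RECORDS`) is too few to trust; both rules, the field names `country` and `gross`, and the function `region_summary` are hypothetical.

```python
from collections import defaultdict

# ASSUMPTIONS: a gross of 0 means "not recorded", and fewer than
# MIN_RECORDS movies is too small a sample. Both are illustrative choices.
MIN_RECORDS = 30


def region_summary(records: list[dict]) -> dict[str, dict]:
    """Per-country movie count, count of usable gross values, total gross,
    and a list of warnings about data quality."""
    summary: dict[str, dict] = defaultdict(
        lambda: {"movies": 0, "with_gross": 0, "total_gross": 0.0})
    for r in records:
        s = summary[r["country"]]
        s["movies"] += 1
        if r["gross"] > 0:  # treat 0 as missing, not as real revenue
            s["with_gross"] += 1
            s["total_gross"] += r["gross"]
    for s in summary.values():
        warnings = []
        if s["movies"] < MIN_RECORDS:
            warnings.append("small sample")
        if s["with_gross"] == 0:
            warnings.append("no gross recorded")
        s["warnings"] = warnings
    return dict(summary)
```

A region flagged with "no gross recorded" would then be shown as missing data rather than as 0 million, which avoids the misleading Taiwan figure.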

Reflections

In this lab, I created several types of charts and diagrams from the dataset I chose, and I learned how to use OpenRefine and Tableau to clean, visualize, and reflect on information. In the future, with more up-to-date data, further visualizations could focus on what the American movie industry should do to maintain its dominant position and on how the Chinese movie market is challenging that hegemony. To achieve this, I believe data such as box-office figures (domestic and overseas) are crucial.

As for the tools, both OpenRefine and Tableau are easy to start with, but there is much more to learn. In OpenRefine, the most challenging part for me was learning to code, so I would like to know what I should learn first. Meanwhile, the logic behind Tableau is sometimes hard to follow. For instance, I do not know why I had to manually change the budget from a string to a number, although Tableau tries to address these kinds of problems with extensive how-to videos. Overall, this was a very good way to learn both programs, and the whole process was enjoyable.