How were films shot in New York City?



Film Permits in New York City from April 2018 to April 2019

INTRODUCTION

New York City is a shooting location for many industries, such as commercials, television, and film. In this project, I cleaned the data with OpenRefine and visualized it with Tableau Public to show the proportion of each category, the trends over one year, and the geographic distribution across New York City. The data is quantitative, because it records how often permits are issued over time, and also geographic, because each permit is tied to one or more zip codes.

MATERIALS, SOFTWARE, AND DATASETS

Raw Data

I found the data on NYC OpenData, a free portal of public data published by New York City agencies and other partners. The dataset covers film permits in New York City; permits are generally required when asserting the exclusive use of city property, such as a sidewalk, a street, or a park. The raw data sheet includes these columns: Event ID, Event Type, Start Date Time, End Date Time, Entered On, Event Agency, Parking Held, Borough, Community Boards, Police Precincts, Category, SubCategory Name, Country, and ZipCodes. In the export, I selected data from April 2018 to April 2019, which yields around 1,800 rows.

OpenRefine

OpenRefine is a tool for cleaning data, transforming it from one format into another, and extending it with web services and external data. In this project, I used OpenRefine to delete unneeded columns from the raw data sheet in order to simplify the categories. After cleaning the data, I exported two data packages: Film_Permits_Category and Film_Permits_Zip-code.

Tableau Public

Tableau Public is a tool that allows anyone to connect to a spreadsheet or file and create interactive data visualizations. In this project, I used Tableau Public to create five visualizations: Percentage of Shooting Categories, Annual Amount of TV SubCategories, Annual TV Shooting Frequency, Annual TV Episodic Frequency, and Shooting Geographic Map.

PROCESS

1. Browse data on NYC OpenData: I started by browsing the most popular datasets on NYC OpenData and found the Film Permits dataset interesting. It was created on July 27, 2015, and last updated on July 6, 2019, so I considered it complete enough to study how films were shot in New York City over a full year.

2. Download data from NYC OpenData: The original dataset has 42,982 rows. I filtered it to the most recent year, April 2018 to April 2019, which left around 1,800 rows.

3. Clean data in OpenRefine: I used OpenRefine to check whether any names were repeated (for example, spelled slightly differently) within each column, and I deleted redundant columns, such as Event ID. I also created a new column that combines Category and SubCategory into a single value with the GREL expression cells["col1"].value + " " + cells["col2"].value, where col1 and col2 stand for the Category and SubCategory Name columns. After cleaning the data, I exported it into two separate data packages: Film_Permits_Category and Film_Permits_Zip-code.

4. Import data into Tableau Public: I imported the two data packages separately. For Film_Permits_Category, I told the story hierarchically, analyzing the shoots from Category down to SubCategory; for Film_Permits_Zip-code, I made a geographic map based on zip code.

5. Integrate the charts into one dashboard: For Film_Permits_Category, I combined the four charts into a single dashboard so that together they tell one story.

6. Save the files separately.
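Steps 2 and 3 were done in the NYC OpenData export and in OpenRefine. As a rough Python equivalent, the filtering and cleaning could be sketched as below. This is not the workflow used in the project; it assumes a CSV export whose headers match the column names listed above and timestamps written as month/day/year with a 12-hour clock. The helper names (filter_by_start_date, variant_groups, clean_row) and the merged column name Category_SubCategory are hypothetical, and variant_groups is only a much simpler stand-in for OpenRefine's clustering feature, not its algorithm.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumption: timestamps look like "04/15/2018 07:00:00 AM"; change
# DATE_FORMAT if the export uses a different layout.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def filter_by_start_date(rows, start, end, column="Start Date Time"):
    """Keep rows whose start timestamp falls in the half-open range [start, end)."""
    kept = []
    for row in rows:
        started = datetime.strptime(row[column], DATE_FORMAT)
        if start <= started < end:
            kept.append(row)
    return kept


def variant_groups(values):
    """Group spellings that differ only in case or surrounding whitespace.

    A much simpler stand-in for OpenRefine's clustering feature.
    """
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().lower()].add(value)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def clean_row(row, drop=("Event ID",), merged="Category_SubCategory"):
    """Drop unneeded columns, trim whitespace, and add a merged category column.

    Mirrors cells["col1"].value + " " + cells["col2"].value, assuming
    col1 = "Category" and col2 = "SubCategory Name".
    """
    cleaned = {key: value.strip() for key, value in row.items() if key not in drop}
    cleaned[merged] = cleaned["Category"] + " " + cleaned["SubCategory Name"]
    return cleaned


# Toy rows for illustration only; they are not taken from the real dataset.
sample = io.StringIO(
    "Event ID,Start Date Time,Category,SubCategory Name\n"
    "1,03/30/2018 08:00:00 AM,Television,Episodic series\n"
    "2,10/02/2018 06:00:00 AM,Television ,Episodic series\n"
    "3,04/30/2019 11:00:00 PM,Film,Feature\n"
)
rows = list(csv.DictReader(sample))
# April 2018 through the end of April 2019 (the end bound is exclusive).
recent = filter_by_start_date(rows, datetime(2018, 4, 1), datetime(2019, 5, 1))
cleaned = [clean_row(row) for row in recent]
print(variant_groups(row["Category"] for row in rows))
print([row["Category_SubCategory"] for row in cleaned])  # ['Television Episodic series', 'Film Feature']
```

Using a half-open date range makes "through April 2019" unambiguous: any permit starting on or after May 1, 2019 is excluded, while late-evening permits on April 30 are kept.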

RESULTS

Film_Permits_Category

With the Category data package, I wanted to answer two questions: which category and subcategory account for most film permits, and when is the busiest time for them? I created four charts that answer these questions hierarchically, moving from category to subcategory.

1. What are the major category and subcategory of film permits?

To study the proportion of each category of film permits, I presented the result as a pie chart. It shows that Television accounts for a far larger share than any other category.
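A pie chart encodes each category's share of the total as a slice angle. The sketch below shows that arithmetic on a list of category values; the function names are hypothetical, and the toy input is for illustration only, not the project's real counts (Tableau computes these shares itself).

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share of all permits as a percentage, largest first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.most_common()}


def pie_angles(shares):
    """Convert percentage shares into pie-slice angles in degrees."""
    return {category: share * 360 / 100 for category, share in shares.items()}


# Toy input for illustration; real values would come from the Category column.
shares = category_shares(["Television", "Television", "Television", "Film"])
print(shares)              # {'Television': 75.0, 'Film': 25.0}
print(pie_angles(shares))  # {'Television': 270.0, 'Film': 90.0}
```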

I then used a bar chart to show the proportion of each Television subcategory per month, to find which kind of Television shoot is most common. The chart shows that Television-Episodic outnumbers the other subcategories in almost every month.

Proportion of Categories & Proportion of TV SubCategories
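The monthly bar chart boils down to counting each Television subcategory per calendar month and seeing which one leads. A minimal sketch, assuming rows shaped like the CSV export (column names as listed in the raw data description) and the same assumed timestamp format; the function names and the toy rows, including the "Talk Show" value, are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed timestamp layout of the export; adjust if it differs.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def subcategory_counts_by_month(rows, category="Television"):
    """Count each subcategory of one category per calendar month ("YYYY-MM")."""
    by_month = defaultdict(Counter)
    for row in rows:
        if row["Category"] != category:
            continue
        started = datetime.strptime(row["Start Date Time"], DATE_FORMAT)
        by_month[started.strftime("%Y-%m")][row["SubCategory Name"]] += 1
    return dict(sorted(by_month.items()))


def top_subcategory(by_month):
    """Pick the most frequent subcategory in each month."""
    return {month: counts.most_common(1)[0][0] for month, counts in by_month.items()}


rows = [  # toy rows for illustration only
    {"Start Date Time": "10/01/2018 07:00:00 AM", "Category": "Television", "SubCategory Name": "Episodic series"},
    {"Start Date Time": "10/03/2018 07:00:00 AM", "Category": "Television", "SubCategory Name": "Episodic series"},
    {"Start Date Time": "10/05/2018 07:00:00 AM", "Category": "Television", "SubCategory Name": "Talk Show"},
    {"Start Date Time": "11/02/2018 07:00:00 AM", "Category": "Film", "SubCategory Name": "Feature"},
]
print(top_subcategory(subcategory_counts_by_month(rows)))  # {'2018-10': 'Episodic series'}
```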

2. When is the busiest time for film permits?

Building on the category and subcategory analysis, I used line charts to show how the monthly frequency of Television and Television-Episodic permits changes and when each is busiest. The two trends are similar, and both reach their highest point in October. The data for April 2019 is incomplete, so the apparent decline in that month is not meaningful.

Frequency of TV and TV Episodic Shooting Annually
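Finding the busiest month is a per-month count followed by a maximum, with the incomplete April 2019 excluded so its partial total cannot distort the comparison. A minimal sketch under the same assumed timestamp format; the function names and toy timestamps are hypothetical and do not reproduce the project's actual figures.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp layout


def monthly_counts(timestamps):
    """Count permits per calendar month ("YYYY-MM"), in chronological order."""
    months = (datetime.strptime(t, DATE_FORMAT).strftime("%Y-%m") for t in timestamps)
    return dict(sorted(Counter(months).items()))


def peak_month(counts, exclude=()):
    """Return the busiest month, skipping months known to be incomplete."""
    candidates = {month: n for month, n in counts.items() if month not in exclude}
    return max(candidates, key=candidates.get)


timestamps = [  # toy timestamps for illustration only
    "09/12/2018 06:00:00 AM",
    "10/01/2018 06:00:00 AM",
    "10/20/2018 06:00:00 AM",
    "04/02/2019 06:00:00 AM",
]
counts = monthly_counts(timestamps)
print(counts)                                   # {'2018-09': 1, '2018-10': 2, '2019-04': 1}
print(peak_month(counts, exclude={"2019-04"}))  # 2018-10
```

Because "YYYY-MM" strings sort lexicographically in calendar order, a plain sort is enough to put the months in line-chart order.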

Film_Permits_Zip-code

Which borough has the most film permits?

To study how each shooting category is distributed across the boroughs, I used Borough, Category, and Zip-code as the variables for the map visualization. I also arranged the units of the chart by count and distribution density, from top to bottom and from left to right.

Film Permits Locality
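Ordering the map's units by amount amounts to counting permits per borough and per zip code and sorting by count. A minimal sketch, assuming (this is an assumption, not something stated above) that a single ZipCodes cell may list several comma-separated codes, in which case one permit counts toward every zip it lists. The function names and toy rows are hypothetical.

```python
from collections import Counter


def permits_per_borough(rows):
    """Count permits per borough, most frequent first."""
    return Counter(row["Borough"] for row in rows).most_common()


def permits_per_zip(rows, column="ZipCodes"):
    """Count permits per zip code; a cell may hold several comma-separated codes."""
    counts = Counter()
    for row in rows:
        for code in row[column].split(","):
            code = code.strip()
            if code:  # skip empty cells and trailing commas
                counts[code] += 1
    return counts.most_common()


rows = [  # toy rows for illustration only
    {"Borough": "Brooklyn", "ZipCodes": "11222, 11211"},
    {"Borough": "Brooklyn", "ZipCodes": "11222"},
    {"Borough": "Manhattan", "ZipCodes": "10001"},
]
print(permits_per_borough(rows))  # [('Brooklyn', 2), ('Manhattan', 1)]
print(permits_per_zip(rows))      # [('11222', 2), ('11211', 1), ('10001', 1)]
```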

DESIGN CRITIQUE & REFLECTION

In the Film_Permits_Category dashboard, I displayed four charts to present the analysis. However, it uses too many colors, and some of them look alike: for example, Made for TV/mini-series and Magazine Show have similar shades of green in the subcategory proportion chart. This may make the dashboard hard to read for users with color blindness.

In addition, the four charts tell the same story, so each object should keep the same color across all of them. For example, Television-Episodic is light blue in the bar chart but orange in the last line chart, and this difference can confuse the audience. Moreover, using more than ten colors overloads the display, so I think the smaller categories could share a single gray to reduce the visual burden and highlight the most valuable information.
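Both fixes can be expressed as one shared color map: give the few largest categories fixed, distinct colors from a colorblind-friendly palette, send everything else to gray, and reuse the same mapping in every chart. The sketch below uses the Okabe-Ito palette, which is commonly recommended for color-vision deficiencies; the gray value, the top_n cutoff, the function name, and the toy counts are all assumptions for illustration. The resulting colors would still have to be applied to each chart by hand.

```python
from collections import Counter

# Okabe-Ito palette, commonly recommended as colorblind-friendly.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]
GRAY = "#BBBBBB"  # assumed neutral color shared by all minor categories


def build_color_map(counts, top_n=4, palette=OKABE_ITO, other=GRAY):
    """Give the top_n categories distinct colors and every other category gray.

    Reusing the returned dict in every chart keeps each category's color consistent.
    """
    if top_n > len(palette):
        raise ValueError("top_n exceeds the number of palette colors")
    ranked = [category for category, _ in Counter(counts).most_common()]
    return {category: palette[i] if i < top_n else other
            for i, category in enumerate(ranked)}


# Toy counts for illustration only.
colors = build_color_map(
    {"Episodic series": 50, "Made for TV/mini-series": 20, "Magazine Show": 3}, top_n=2
)
print(colors)
# {'Episodic series': '#E69F00', 'Made for TV/mini-series': '#56B4E9', 'Magazine Show': '#BBBBBB'}
```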