Hollywood Film Composer and Producer Network Visualization


Visualization

Introduction

This project studies the network of 40 film-score composers and 62 producers who produced a minimum of five movies in Hollywood from 1964 to 1976. According to Wikipedia, this period, known as New Hollywood, saw major changes in how films told their stories. Film directors brought new techniques from Europe, which marked the beginning of a rebound for American cinema, and a new generation of films went on to succeed at the box office. As a moviegoer, I am interested in understanding the social network of these composers and producers. The goal of this project was to clearly distinguish the groups of people who worked together and to identify the central composers or producers who were connected to many others.

 

Inspiration

Because this dataset captures relationships between pairs of people, I looked for social network visualizations made with Gephi to inspire my own design. I found three visualizations that each show a network of people. The first represents a social network of movie actors, in which the link between two actors reflects how many movies they worked on together. Its topic is very close to that of my dataset. I especially like how the names of actors who were connected to more people, such as Tom Hanks, are displayed in a larger font size. Also, actors who usually worked in the same group share one color.

Image source: https://tbgraph.wordpress.com/2017/04/01/neo4j-to-gephi/

The second visualization that inspired me shows the social network of Twitter users. It has a similar layout to the first one, but all names use the same font size, and only node size shows connectivity. People connected within a group are clustered in one color, with each group in a different color. This convinced me to use distinct color classes to identify the groups that work closely together.

Image source: https://ludicanalytics.wordpress.com/tag/daniel-deronda/

The third visualization depicts the social network of Facebook users. Although we cannot tell who has the most connections, we can still see the users who connect different groups. One thing I like about this graph is that photos were imported to add visual interest. However, the photos cover the original nodes, which is probably why their sizes cannot be seen.

Image source: http://kateto.net/2014/04/facebook-data-collection-and-photo-network-visualization-with-gephi-and-r/

Materials and Methods

The dataset for this lab was sourced from CASOS and downloaded as an XML file (source). I imported the XML file into OpenRefine to produce two CSV files, one for nodes and one for edges. The node file contains 102 nodes: 40 composers and 62 producers, each paired with a numeric ID. The edge file has a ‘source’ column and a ‘target’ column, each holding the ID of one person, so every row forms a connection between two people. I set the type to ‘undirected’ for all edges. A ‘weight’ column records how many times the two people worked together.
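To make the two-file layout concrete, here is a minimal sketch that writes node and edge tables in the column layout Gephi's spreadsheet import reads (Id and Label for nodes; Source, Target, Type, and Weight for edges). It assumes the XML has already been parsed into an ID-to-name mapping and a list of weighted pairs; the names `people` and `collaborations` and all values below are hypothetical placeholders, not data from the CASOS file.

```python
import csv
import io

# Hypothetical sample records standing in for the parsed CASOS XML:
# each person gets a numeric ID, and each pair records how many films
# the two people worked on together.
people = {1: "Composer A", 2: "Producer B", 3: "Producer C"}
collaborations = [(1, 2, 3), (1, 3, 1)]  # (source ID, target ID, film count)


def nodes_csv(people):
    """Return a node table with Id and Label columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    for pid, name in sorted(people.items()):
        writer.writerow([pid, name])
    return buf.getvalue()


def edges_csv(collaborations):
    """Return an edge table; every edge is Undirected and weighted by film count."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in collaborations:
        writer.writerow([source, target, "Undirected", weight])
    return buf.getvalue()


print(nodes_csv(people))
print(edges_csv(collaborations))
```

Writing the type explicitly on every row mirrors the step of marking all edges as undirected, so the importer does not have to guess.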

Once the files were cleaned up, I imported the node file and the edge file into Gephi to create the visualization. I ran several statistics so that the data could be analyzed properly, including average degree, network diameter, and graph density. I chose ForceAtlas 2 as the layout. Then I set the minimum and maximum degree values based on the dataset and adjusted the modularity resolution to detect communities. Once the communities were formed, I applied a color palette so that each community was assigned its own color. The final graph is shown below.
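The statistics named above have standard definitions for an undirected graph with N nodes and E edges: average degree is 2E/N, density is 2E/(N(N−1)), and diameter is the longest shortest path between two reachable nodes. The sketch below computes them on a small hypothetical network, together with the weighted modularity Q of a given partition, where a `resolution` parameter scales the expected-edge term. It only scores a partition that is already known; it is not Gephi's community detection algorithm, and Gephi's reported figures may differ in details such as how unreachable pairs are handled.

```python
from collections import deque

# Hypothetical toy network (not the CASOS data): node IDs and weighted
# undirected edges (u, v, number of shared films). Node 6 is isolated.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2, 3), (1, 3, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2)]


def adjacency(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_degree(nodes, edges):
    # Each undirected edge adds 1 to the degree of both endpoints.
    return 2 * len(edges) / len(nodes)


def density(nodes, edges):
    # Share of all possible undirected pairs that are actually connected.
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))


def bfs_distances(adj, start):
    # Hop counts from `start` to every node reachable from it.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def diameter(nodes, edges):
    # Longest shortest path over reachable pairs; unreachable pairs are skipped.
    adj = adjacency(nodes, edges)
    return max(max(bfs_distances(adj, n).values()) for n in nodes)


def modularity(nodes, edges, community, resolution=1.0):
    # Weighted Q = (1/2m) * sum_ij [A_ij - resolution * k_i k_j / 2m] * delta(c_i, c_j),
    # rewritten per community as: inside_weight / m - resolution * sum_c K_c^2 / (2m)^2.
    m = sum(w for _, _, w in edges)
    strength = {n: 0 for n in nodes}
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    inside = sum(w for u, v, w in edges if community[u] == community[v])
    totals = {}
    for n in nodes:
        totals[community[n]] = totals.get(community[n], 0) + strength[n]
    return inside / m - resolution * sum(t * t for t in totals.values()) / (2 * m) ** 2


partition = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2}  # hypothetical community labels
print(average_degree(nodes, edges), density(nodes, edges), diameter(nodes, edges))
print(modularity(nodes, edges, partition))
```

Raising `resolution` penalizes large communities more strongly, which is why a higher setting tends to produce more, smaller groups.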

 

Results and Discussion

The graph created in Gephi is a typical social network visualization, with the entire network organized into modular communities. The color clusters show that the composers and producers fall into 10 groups, and people in the same group worked with each other most frequently. Each person is shown as a node, and node size reflects connectivity: the larger the node, the more people that person worked with. These well-connected people also played an important role in linking different groups, which kept the whole network connected. The composer Jerry Goldsmith has the largest node, which means he worked with the most producers. Lalo Schifrin and Elmer Bernstein were also composers who worked with many producers.

Each line between two nodes connects a producer and a composer, and its thickness indicates how often they worked together: the thicker the line, the more films the producer and the composer made together. Thicker lines are easier to find within a color group, which suggests that people worked more often with members of their own group than with people outside it.

Another interesting detail is a single node isolated from the rest of the network on the left side of the graph. It represents the composer Richard LaSalle, who did not work with any producer in this dataset between 1964 and 1976. Looking back at the dataset, this row has a degree of 0, which confirms that the node is isolated.
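The observations above (who has the largest node, which node is isolated, and whether lines are heavier inside color groups) can also be checked numerically from the edge table. A minimal sketch on hypothetical toy data, assuming node size was ranked by degree and that each node already carries a community label from the modularity step:

```python
# Hypothetical toy data (not the CASOS network): weighted undirected edges
# (person, person, films made together) and each node's community label.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B", 3), ("A", "C", 2), ("A", "D", 1), ("E", "D", 2)]
community = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 2}

# Degree counts distinct collaborators; strength sums the edge weights.
degree = {n: 0 for n in nodes}
strength = {n: 0 for n in nodes}
for u, v, w in edges:
    for n in (u, v):
        degree[n] += 1
        strength[n] += w

# Assuming node size was ranked by degree, the first entry is the largest node.
ranking = sorted(nodes, key=lambda n: degree[n], reverse=True)

# A degree of 0 marks an isolated node.
isolated = [n for n in nodes if degree[n] == 0]

# Average line weight inside communities versus across them.
inside = [w for u, v, w in edges if community[u] == community[v]]
across = [w for u, v, w in edges if community[u] != community[v]]
mean_inside = sum(inside) / len(inside) if inside else 0.0
mean_across = sum(across) / len(across) if across else 0.0

print("ranking by degree:", ranking)
print("strength:", strength)
print("isolated:", isolated)
print(f"mean weight inside groups: {mean_inside:.2f}, across groups: {mean_across:.2f}")
```

Comparing average weights gives a number behind the visual impression that thick lines cluster inside groups, though on real data it is only descriptive, not a significance test.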

 

Future Directions

While creating the graph, I wondered how to tell whether a person was a composer or a producer. After checking the original dataset, I found that this attribute was not recorded, so the role of each person cannot be determined from the data. If possible, I would like to dig further into the data and assign a role to each person. That would let me create a visualization that better illustrates each person's function and how composers and producers worked with each other.
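Because every line in the graph joins a producer and a composer, the network should be bipartite, which suggests one possible way to recover roles: two-color each connected component with a breadth-first search, then name the two sides using one person already known to be a composer. This is a sketch under that assumption; `infer_roles`, `known_composers`, and the toy edges are hypothetical. Components with no known composer, including isolated nodes, stay labeled "unknown", and the function raises an error if a component turns out not to be bipartite.

```python
from collections import deque

# Hypothetical toy data: undirected edges that, as in the write-up, always join
# a composer to a producer, plus one person assumed to be a known composer.
nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "D"), ("G", "H")]
known_composers = {"A"}


def infer_roles(nodes, edges, known_composers):
    """Label each node 'composer', 'producer', or 'unknown' by bipartite 2-coloring."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    role = {}
    for start in nodes:
        if start in role:
            continue
        # Breadth-first search over one connected component, alternating sides 0/1.
        side = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in side:
                    side[nxt] = 1 - side[cur]
                    queue.append(nxt)
                elif side[nxt] == side[cur]:
                    raise ValueError(f"component containing {start!r} is not bipartite")
        # Use a known composer in this component, if any, to name the two sides.
        anchors = [n for n in side if n in known_composers]
        for n, s in side.items():
            if not anchors:
                role[n] = "unknown"
            elif s == side[anchors[0]]:
                role[n] = "composer"
            else:
                role[n] = "producer"
    return role


roles = infer_roles(nodes, edges, known_composers)
print(roles)
```

On the real network, seeding the search with composers named in the write-up, such as Jerry Goldsmith, would label his whole connected component, while an isolated node like Richard LaSalle's would still need outside information.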