American Experience Documentary Crew Network

Introduction

I wanted to visualize the connections between the cast and crew of WGBH Boston’s American Experience documentaries to explore how interconnected the staff were across the twenty-six feature-length projects released from 2015 to 2017. WGBH outsources the production of its documentaries to other companies, many (but not all) of which are based in New York City. Using individual cast and crew members as nodes and shared project experience as edges, I expected to see a high degree of connectivity among the overseeing WGBH staff, a small amount of connectivity among crew in senior positions such as writers, directors, and producers, and much less connectivity, if any at all, among researchers, voice actors, production assistants, interns, and other smaller roles. By looking at the network of cast and crew members, I hoped to get a better picture of which groups of people are or are not returning to work on American Experience productions.

Inspiration

I started with this sample visualization of a LinkedIn network from Socilab.

I thought that the groupings of people who were connected on LinkedIn were similar enough in concept to the groupings of people who worked together on American Experience projects. I liked that the visualization could be faceted and that it worked with a large network.

Next I looked at an even larger network in the form of this visualization of patents by Apple and Google from Periscopic Studios.

Each node here is an inventor and the edges are co-authorship credits on grants. I thought this could be even more relevant since it has more centralized hubs of creators, which I suspected would occur in my visualization. What’s more, it shows how I might compare different documentary production companies going forward.

Finally, I found this visualization from Katie Franklin, Simon Elvery and Ben Spraggon showing Star Wars character interactions over time. Because I had looked at a similar chart for the Tableau lab before realizing it required network connections, I thought that this lab on networks would be a good time to explore a similar visualization of crew teams over time. I was wrong about how Gephi worked, and I underestimated how complex even a basic visualization would be! However, the network concepts I learned and the data manipulation I did could still be used to build this kind of display in different visualization software.

Materials

I copied the cast and crew lists from all films from 2015 to 2017 listed on the American Experience website into a Google Sheet. I then transformed that data in Sheets to have six columns: Rank, Name, Credit, Air date, Film, and Production Company, with each row representing an individual cast or crew member. I cleaned the data, removing archival source credits and separating names when they were listed on the same line. I then worked only with the name columns to transform my data into an edge table; the other attributes would be incorporated later in the process as descriptive facets.
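As a minimal sketch of the cleaned table described above, here is how the six-column layout could be grouped into one set of names per film. It is written in Python rather than Sheets, and the sample rows, the `EXCLUDED_CREDITS` label, and the `crews_by_film` helper are hypothetical placeholders, not values from the real data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the six column names from the sheet.
SAMPLE = """Rank,Name,Credit,Air date,Film,Production Company
1,Ana Ruiz,Director,2015-01-06,Film A,Example Films
2,Ben Cole,Editor,2015-01-06,Film A,Example Films
3,Example Archive,Archival Materials,2015-01-06,Film A,Example Films
1,Ana Ruiz,Producer,2016-02-09,Film B,Sample Pictures
2,Dee Ems,Narrator,2016-02-09,Film B,Sample Pictures
"""

# Assumed label for archival source credits; the real sheet may use other wording.
EXCLUDED_CREDITS = {"Archival Materials"}


def crews_by_film(rows):
    """Group names by film, skipping excluded credits and blank names."""
    crews = defaultdict(set)
    for row in rows:
        if row["Credit"].strip() in EXCLUDED_CREDITS:
            continue
        name = row["Name"].strip()
        if name:
            # A set keeps each person once per film, even with several credits.
            crews[row["Film"].strip()].add(name)
    return dict(crews)


crews = crews_by_film(csv.DictReader(io.StringIO(SAMPLE)))
```

Storing each film's crew as a set means a person credited twice on the same film is still counted once for that film.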

Methods

I downloaded a CSV with the names for each production in a separate column of one sheet. I then worked in R to transform the data into an edge table. This was difficult; after much troubleshooting, it turned out that R had not been reading my values as text strings. After discovering this, it became apparent that my transformations had created excessive and impossible edges between people who had not worked together. I removed all edges that appeared only once, which appeared to remove all of the impossible edges. Other pairs still had impossibly high weights, however, which meant that even though the remaining edges appeared correct, their weights would have to be treated as suspect until the problem was resolved.
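For comparison, here is one way to build a co-credit edge table. This is a sketch in Python, not the R code used for this project, and the film titles, names, and `build_edge_table` helper are made up for illustration. Each pair of people who share a film gets one edge, and the edge weight is the number of films they share.

```python
from collections import Counter
from itertools import combinations


def build_edge_table(crews_by_film):
    """Return {(name_a, name_b): weight}, where weight = number of shared films."""
    weights = Counter()
    for names in crews_by_film.values():
        # Strip whitespace, drop blanks, and de-duplicate so that a person
        # listed twice on one film cannot inflate a weight.
        crew = sorted({n.strip() for n in names if n and n.strip()})
        # Every unordered pair within one film is exactly one co-credit.
        for a, b in combinations(crew, 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical input: one list of names per film, including messy entries.
crews = {
    "Film A": ["Ana Ruiz", "Ben Cole", "Cy Dorn"],
    "Film B": ["Ana Ruiz", "Ben Cole", "Dee Ems", "Ben Cole "],
    "Film C": ["Ana Ruiz", "Eli Fox", ""],
}
edges = build_edge_table(crews)

# Rows in the Source/Target/Weight shape that Gephi reads as an edge table.
rows = [("Source", "Target", "Weight")] + [
    (a, b, w) for (a, b), w in sorted(edges.items())
]
```

Built this way, no weight can exceed the number of films, and no edge can link two people who never share a film. That gives a quick sanity check for the kind of impossible edges and weights described above.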

I imported the edge table into Gephi, selecting the option to automatically create a nodes table from my data. I then applied the ForceAtlas2 layout algorithm to the data and watched as my teeming black square of nodes shaped itself into a tightly connected, spiky circle. Dissatisfied with the appearance of this layout, I then switched to the Yifan Hu Multilevel layout algorithm.

Under Node>Attributes I altered the size of the nodes based on their weight, setting the maximum weight to twenty-six (the number of films) and the minimum to one. I also added a simple teal color to the edges so that the already-black nodes would stand out against the very full backdrop of edges.
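The sizing step can be thought of as a linear mapping from weight to node size. The sketch below only illustrates that idea in Python and is not Gephi's own code; the `scale_size` helper and the pixel sizes of 10 and 60 are assumed values, while the bounds of 1 and 26 come from the paragraph above.

```python
def scale_size(value, lo=1, hi=26, min_size=10.0, max_size=60.0):
    """Map a weight in [lo, hi] linearly onto [min_size, max_size].

    Weights outside the range are clamped to the nearest bound, so an
    impossibly high weight cannot produce an oversized node.
    """
    value = max(lo, min(hi, value))
    fraction = (value - lo) / (hi - lo)
    return min_size + fraction * (max_size - min_size)


smallest = scale_size(1)   # a weight of 1 gets the minimum size
largest = scale_size(26)   # a weight of 26 (every film) gets the maximum size
clamped = scale_size(40)   # anything above 26 is treated as 26
```

Clamping hides suspect weights rather than fixing them, so it is worth checking the largest raw weight before sizing anything.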

Results

While I have reason to mistrust my edge table and therefore the visualization itself, the initial visualization conformed to some of my expectations, indicating that I may not have been too far off in my hypothesis. Judging by node size and the initial layout produced by the algorithm, there are three immediately noticeable groups. The first is crew that are highly connected. Though their weighted size visually implies that there are many of them, there is actually only a small number of these individuals, as hypothesized. The other two groups are much less connected than the first. Visually, it is difficult to determine which of those two groups is larger. The least connected cluster is centralized around a core group of highly connected crew members, perhaps indicating that among the typically consistent WGBH staff, there are also positions that rehire frequently. The second group, slightly more connected but still relatively unconnected, clusters around the outside of the highly connected individuals; I would guess that this is the crew that is rehired semi-frequently to lower positions in documentary productions.

I think that the large circular arrangement, rather than hubs related to individual productions, is a consequence of edges in the original data set that should not have existed.

Future directions

I have a lot of troubleshooting to do to make sure that the edge table I made in R is correct. I also need to go back to my data and clean it further; despite my best efforts, some archival sources slipped through, as well as some production houses that I would rather exclude. I may also exclude the WGBH crew members from the dataset. Because most of them are connected to every single project, it is not really necessary to see whether or not they are working with others; I know that they always are.
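One possible way to exclude the always-present staff is to drop anyone credited on nearly every film before building edges. This is a Python sketch under assumptions: the `drop_ubiquitous` helper, the 0.9 threshold, and the sample names are all hypothetical, and the right cutoff would need to be checked against the real credits.

```python
from collections import Counter


def drop_ubiquitous(crews_by_film, max_share=0.9):
    """Remove people credited on at least max_share of all films.

    Returns (filtered_crews, removed_names).
    """
    n_films = len(crews_by_film)
    # Count each person once per film, even if they are listed twice.
    film_counts = Counter(
        name for crew in crews_by_film.values() for name in set(crew)
    )
    removed = {
        name for name, count in film_counts.items()
        if count / n_films >= max_share
    }
    filtered = {
        film: set(crew) - removed for film, crew in crews_by_film.items()
    }
    return filtered, removed


# Hypothetical input: "Staff One" stands in for an overseeing staff member.
crews = {
    "Film A": {"Staff One", "Ana Ruiz", "Ben Cole"},
    "Film B": {"Staff One", "Ana Ruiz", "Dee Ems"},
    "Film C": {"Staff One", "Eli Fox"},
}
filtered, removed = drop_ubiquitous(crews)
```

Filtering by share of films rather than by a hand-made list of names keeps the rule reproducible, though a list of known staff would be more precise.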

I would also like to add the other descriptive facets to future visualizations, perhaps coloring nodes based on the credit received for a role and adding hover-over labels with name, credit, and films to each node. I would also like to experiment with reducing gravity to spread the tight cluster out, and then try different clustering algorithms in Gephi to see what patterns form. In particular, I want to see whether clusters would form around crews that worked on the same production, more like some of the inspiration visualizations. I think that the lack of production team clusters was possibly due to unnecessary edges being made when I first created the edge table in R.

Information Visualization

Student work at the School of Information, Pratt Institute
