I selected data on film locations in San Francisco for my data visualization in Gephi. I’m not a movie buff, but I do know San Francisco well and was interested in seeing which locations were memorialized in film. For reference, we explored visualizations from Gephi directly and from other sites. The two that stood out to me were the Les Misérables network and the Shakespeare character network, both of which build networks out of narrative. The Shakespeare visualization in particular effectively showed the interconnectedness (or lack thereof) throughout Shakespeare’s works by parsing multiple storylines.
Other references that served as inspiration, directly and indirectly, are:
https://www.oldnyc.org/#724321f-b – This visualization connects old photographs of NYC to the mapped location. It’s a playful way to connect art, history and mapping.
https://www.washingtonpost.com/graphics/national/eclipse/?utm_term=.7e6ea5e97386 – I also enjoyed this visualization depicting the number of solar eclipses you can experience in your lifetime. It shows not only the number of eclipses you can experience over time through an interactive globe, but also the different paths these eclipses will take.
With the dataset I had, I wanted to display the location networks shared between films. Instead of using a film title as a node and its corresponding film locations as its edges, my visualization would show main or primary film locations as nodes, with supporting locations as edges. This would show overlapping locations in San Francisco cinema and, hopefully, offer a way to explore a film through its locations: its atmosphere, message, and/or subject matter, for example. It would also speak to the locations themselves and how they define SF as social or historical landmarks.
I downloaded a dataset as a .csv from the SFgov site. I manually removed the following columns: Release Year, Fun Facts, Production Company, Distributor, Writer, and all Actor columns. This left only the Title and Location columns. Since there weren’t too many blank locations, I removed them rather than entering them as null in OpenRefine. I prepared the rest of the data in OpenRefine by cleaning up what remained (trailing spaces, etc.) and then transposed the locations from rows to columns. This created additional blank cells, which I then removed. Depending on the film, or its popularity, a significant number of locations appeared multiple times. In order for this data to make sense in Gephi, I had to bring it into RStudio first and define each edge. This dataset has 254 edges total, which was manageable, as I could thankfully click into each line to edit and add them.
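For comparison, the cleanup and edge-definition steps could also be scripted rather than done by hand. The following is only a minimal Python sketch, not the OpenRefine/RStudio workflow I actually used. It assumes that the first location listed for a film counts as its "main" location and every other location for that film is a supporting one; that rule, the function names, and the sample rows (including the placeholder titles "Film A" and "Film B") are all hypothetical. The output uses Source and Target column headers, which Gephi's spreadsheet import recognizes for edge tables.

```python
import csv
import io
from collections import OrderedDict


def build_edges(rows):
    """Turn (title, location) pairs into (main location, supporting location) pairs.

    Assumption for illustration: the first location listed for a film is its
    main location; every other distinct location is a supporting one.
    """
    by_film = OrderedDict()
    for title, location in rows:
        title, location = title.strip(), location.strip()  # trim stray spaces
        if not title or not location:
            continue  # skip rows with a blank title or location
        locations = by_film.setdefault(title, [])
        if location not in locations:  # ignore repeats within one film
            locations.append(location)

    edges = []
    for locations in by_film.values():
        main, supporting = locations[0], locations[1:]
        for other in supporting:
            edges.append((main, other))
    return edges


def edges_to_csv(edges):
    """Render the edges as CSV text with Source,Target headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buf.getvalue()


# Hypothetical sample rows standing in for the Title and Location columns.
sample = [
    ("Film A", "City Hall "),
    ("Film A", "Golden Gate Bridge"),
    ("Film A", ""),
    ("Film B", "Golden Gate Bridge"),
    ("Film B", "Coit Tower"),
]

edges = build_edges(sample)
print(edges_to_csv(edges))
```

A different rule for choosing the main location (for example, the location that appears most often across all films) would only change how `main` is picked inside `build_edges`.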
When it came time to add the edges to Gephi, I quickly learned that I need to explore this program a lot more. Two issues I ran into were adding node labels to identify each main location and controlling the size of the network in the Overview once the layout had finished running. I copied the Id column to the Label column to create my labels, which produced a dense, unreadable series of clusters. My next step is to work out how to filter for the most popular locations, limiting the number of labeled locations in order to declutter the network. I used a Gephi tutorial to help with questions as they came up (my version is slightly different, so some attempts worked and others didn’t). I’m most interested to see which locations and films the three outlying node clusters represent. For future exploration of this project, it would be grounding to map the network properly using lat/long coordinates to see the relationship between these locations in SF.
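One possible way to limit the labels, sketched here in Python as an assumption rather than something I have tried in Gephi itself, is to count how many edges touch each location and keep a label only for the top few. The function name, the `top_n` cutoff, and the sample edges are hypothetical. The result is a node table with Id and Label columns, where less-connected locations get an empty label.

```python
from collections import Counter


def node_labels(edges, top_n):
    """Return (id, label) rows, labeling only the top_n best-connected nodes."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # each edge counts once for both of its ends
        degree[target] += 1
    keep = {node for node, _ in degree.most_common(top_n)}
    # Every node keeps its Id; only the most connected ones keep a Label.
    return [(node, node if node in keep else "") for node in sorted(degree)]


# Hypothetical edges between locations, for illustration only.
sample_edges = [
    ("Golden Gate Bridge", "City Hall"),
    ("Golden Gate Bridge", "Coit Tower"),
    ("Golden Gate Bridge", "Alcatraz Island"),
    ("City Hall", "Coit Tower"),
]

labels = node_labels(sample_edges, top_n=1)
for node_id, label in labels:
    print(f"{node_id},{label}")
```

Here only Golden Gate Bridge, which touches three edges, would keep its label; raising `top_n` labels more locations at the cost of more clutter.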