INTRODUCTION
For the final project for Information Visualization, I wanted to create a network visualization from a dataset I would create myself. I struggled greatly with the Gephi network visualization lab, due to some ill-timed absences, and wanted to revisit the technology, learn it better, and learn how to construct datasets for network visualizations.
The topic I chose is one that has been familiar to me for many years: the Whitney Museum of American Art’s Biennial exhibitions. I have worked at the Whitney for almost six years, and because of the nature of my work in the Research Resources department, I am often researching past Biennial exhibitions. I have also worked on the last three Biennials installed during my time at the Museum: the 2017, 2019, and current 2022 Biennials. Because of the unique structure of the Whitney Biennial, and its longstanding significance to both the Whitney as an institution and the broader art world, I came up with the following research questions for my final project to explore:
- Is the network of exhibited Whitney Biennial artists a very interconnected network?
- Has the interconnectedness of the Whitney Museum’s network of Biennial artists changed significantly since the Museum’s move to its current downtown location in the Meatpacking District?
The first question came about because in my research of past Biennials, I often run across the same name multiple times, and sometimes even run across names of artists that have exhibited in recent Biennials. But alongside that, the Biennials also introduce me to a wealth of new emerging or previously unrecognized artists, which I feel is at the core of the Biennial’s goals. Because of these warring experiences, I wasn’t sure if the resulting network would be richly connected, or if it would be more sparsely connected and far flung.
The second question arose because there has been a big change in visitorship since the Whitney’s move to its current downtown location, which occurred in 2015. Previously located uptown, on Museum Mile, the Whitney relied heavily on its membership base, and many visitors were local to NYC. Now, at the downtown location, tourists, especially international tourists, play a huge role in Museum visitorship. While many New Yorkers still attend, the base of visitors is now tourists, and the visitors are getting younger as well. In response to these shifting demographics and shifting institutional priorities, there is a much larger focus on emerging artists, which has been reflected in recent exhibitions. I wondered how much of that shift was perceivable in the Whitney’s history of Biennial exhibitions, or if the Biennials have consistently shown new emerging artists in every edition.
INSPIRATIONS AND RESEARCH
My main inspiration for pursuing this project was the knowledge that MoMA makes a lot of their collection and exhibition data publicly available for digital humanities research, a fact that I learned through my Digital Preservation & Curation course. This is something that the Whitney Museum does not do, and which I find to be very interesting.
Though these datasets are not overtly advertised by the MoMA on many channels, they are available if you know where to look, and have been extensively studied and visualized in different ways.
MoMA offers a roundup of uses of the Collection Data dataset, and the Exhibition and Staff Histories dataset has informed many publications, such as this New York Magazine article by Jerry Saltz, Where are All the Women?
Other members of this class have worked with this dataset, and seeing their work made me think about how network visualizations might be able to apply to the Whitney’s exhibition history, and more specifically, the Biennial’s history.
After determining that this would be the focus of my final project, I embarked on additional research for examples specific to network visualizations, featuring data on Museum exhibition histories and collections.
I discovered Albert-László Barabási’s exhibition at the Ludwig Museum in Budapest, Hidden Patterns, dedicated to 25 years of work by Barabási, who is a researcher and scientist who studies the development of network visualization through both 2-D and 3-D visualizations and models. For the past five years Barabási has been using network visualizations and data to study the art world, by studying “a massive number of data points about the exhibition history of half a million artists in galleries and museums worldwide that had been collected by Magnus Resch, an entrepreneur who studies the art market” (Hencz, 2021).
I also located an article titled MuseumViz – Towards Visualizing Online Museum Collections, written by researchers at the Department of Computer Science and Engineering, Indian Institute of Technology Tirupati, India, which suggests network visualization of Museum collections as a tool to help viewers of online collections navigate the holdings of an institution.
Finally, I came across Dr. Matthew Lincoln’s network visualization of the Smithsonian American Art Museum’s collection, built from collections data scraped from the Smithsonian’s online collection. Using Gephi, Lincoln created a network that included all of the Smithsonian’s collection artworks created between 1900 and 1935, using keyword tags to link them. Each node represents an artwork, and each edge connects works that share five or more keyword tags (Lincoln, 2013). Color is used in the visualization to represent “communities,” through the use of modularity classes.
After conducting this research, I felt confident that my final project on the Whitney Biennials was a timely research topic, and may make a case for making even more data on the Whitney’s collection and exhibition history publicly accessible to foster additional research in the digital humanities sector.
CREATING THE DATASET
DATA SOURCE: Whitney Biennial Portal
TOOLS: OpenRefine and R
Because there was no existing network dataset for past Biennial exhibitions and exhibited Biennial artists, I knew that I would have to create the dataset from scratch for my final project. Though I am an employee at the Whitney Museum, I decided to create the dataset from only information that has been made publicly available, so that it can be more easily replicated and the core data revisited and studied by my peers. Additionally, in advance of the 2019 Biennial, the Museum’s Digital Media department created a large Biennial web feature, to better publicly record the history of the Biennial exhibition, and this web feature included artist lists for all Annual and Biennial exhibitions in the Whitney’s history, starting in 1932. These artist lists would be the basis for my dataset.
Though the Biennial feature includes exhibitions from 1932-2022, I decided to limit my dataset to exhibitions from 1973-2022. This is the period during which the Biennial format, in which an exhibition is organized every two years (barring circumstances such as the Museum’s relocation or a global pandemic), became standardized, and so I feel it is more representative of the Biennial exhibition. Prior to 1973, there was a mix of Annuals and Biennials, and the exhibitions were often split up by medium, with Contemporary Painting shown in one exhibition and Sculptures, Drawings, and Prints shown separately in others. Since the current Biennials are inclusive of all artistic mediums, I felt these pre-1973 Biennials and Annuals might obscure the more relevant recent data.
Because I do not know how to scrape data using automated processes, and the artist lists were not formatted consistently across all exhibition pages from 1973-2022, I did the data collection manually. First, I copied and pasted each artist list into a Google Sheets document with three columns: Source (the exhibitions, which would define the Edges in the resulting visualization), Target (the artists included in each exhibition, which would become the Nodes), and Type, which was undirected for all. The first tricky (and time-consuming) issue was that the artist lists were not just lists of artist names; they also included additional information, such as how many works by that artist are now in the Permanent Collection, the artist’s birth and death dates, and the artist’s birthplace. All of that information needed to be removed so that the Target column contained only artist names, so I manually deleted any data that was not an artist name.
Once I had finished this process, I had the document Final_EdgeList, which contained all the artists for the 24 Biennial exhibitions that occurred between 1973 and 2022. The formatting was still a bit of a mess, though, and it was certainly not yet ready to be transformed into a network dataset.
Next, I loaded Final_EdgeList into OpenRefine, a tool used to automate the processing and editing of large quantities of data. To prepare the data for R, where I would generate the full dataset, I needed to split the artist list across columns in each exhibition row, so that each artist name had its own cell. To do so, I used the following GREL formulas:
For cells with double line breaks: value.replace(/\s\s/, ", ")
For cells with single line breaks: value.replace(/\n/, ", ")
For artists with “and”s (in the case of multiple artists, or collectives), to isolate individual participants: value.replace(" and", ",")
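As a rough illustration of what these three formulas do when applied in order, here is a minimal Python sketch. It is not the OpenRefine workflow itself: the function names and the placeholder artist names are hypothetical, and it assumes each cell holds one exhibition's pasted artist list.

```python
import re


def normalize_artist_cell(value: str) -> str:
    """Mirror the three GREL replacements, applied in the same order."""
    value = re.sub(r"\s\s", ", ", value)  # double line breaks (any two whitespace characters)
    value = re.sub(r"\n", ", ", value)    # remaining single line breaks
    value = value.replace(" and", ",")    # separate collaborators joined by "and"
    return value


def split_artists(value: str) -> list[str]:
    """Split the normalized cell on commas and drop empty entries."""
    return [name.strip() for name in normalize_artist_cell(value).split(",") if name.strip()]


# Placeholder names, not real data from the Biennial lists.
sample_cell = "Artist A\n\nArtist B\nArtist C and Artist D"
artists = split_artists(sample_cell)
print(artists)
```

One caveat worth noting: the last replacement is a plain substring match, so any name that happens to contain " and" would also be split and would need a manual check.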
I also needed to delete the “Source” column that listed the exhibition titles, the “Type” column, and the header row. I then exported it as a CSV, the result being this document, Final-EdgeList-forR.csv.
To transform this into the dataset, I used Professor Chris Alen Sula’s directions for Preparing Network Data in R, which resulted in the document Final-EdgeList-postR.csv.
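The R directions themselves are not reproduced here. As a hedged sketch of the general idea, in Python rather than R, the step can be thought of as turning each exhibition row into one undirected edge for every pair of artists who share that row. That pairing rule is my assumption about what the conversion does, and the function name and placeholder data below are hypothetical.

```python
import csv
import io
from itertools import combinations


def rows_to_edges(rows):
    """Yield one (source, target) pair for every two names that share a row."""
    for row in rows:
        # Strip whitespace, drop empty cells, and de-duplicate names within a row.
        names = sorted({cell.strip() for cell in row if cell.strip()})
        for source, target in combinations(names, 2):
            yield source, target


# Placeholder rows: one row per exhibition, one artist per cell.
sample_csv = "Artist A,Artist B,Artist C\nArtist B,Artist D\n"
edges = list(rows_to_edges(csv.reader(io.StringIO(sample_csv))))

for source, target in edges:
    print(f"{source},{target}")
```

Because every pair within an exhibition is linked, a single Biennial with many artists contributes a large number of edges, which is one reason the resulting network is dense around each exhibition.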
With this as my finished edge list, I was ready to plug the data into Gephi and create my visualization.
CREATING THE VISUALIZATION
TOOLS: Gephi and Photoshop CC
After opening the dataset in Gephi, I went to the Data Laboratory tab, and copied the ID data into the Label column, so that I would be able to label my nodes with artist names. I also calculated the Average Degree, Modularity, and Graph Density. The Average Degree was 135.318, the Graph Density 0.078, and the Modularity 0.663, with 13 communities no matter the resolution tested.
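These two statistics are related. For an undirected network with n nodes and m edges, the average degree is 2m / n and the density is 2m / (n(n - 1)), so the density equals the average degree divided by (n - 1). The short Python check below assumes the node count is the 1,743 artists reported in my findings; the function name is hypothetical, and Gephi's exact node count is not restated here.

```python
def graph_density(average_degree: float, node_count: int) -> float:
    """Density of an undirected simple graph: average degree / (n - 1)."""
    return average_degree / (node_count - 1)


# Assumption: 1,743 nodes, one per artist in the 1973-2022 dataset.
density = graph_density(135.318, 1743)
print(round(density, 3))  # 0.078
```

Under that assumption the reported Average Degree and Graph Density agree with each other.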
After testing some options, I decided on using the Force Atlas 2 layout, as it created the clearest clustering from the data. At the recommendation of Prof. Sula, I added the Modularity Classes by color to the layout using Partition, to better segment the data, and get a better idea what was going on. I used the Generate Palette function, with a goal of getting a variety of brighter pastels that were easily distinguishable from one another, as I already knew I wanted to use a black background for my final visualization. I used the Pastel preset, and repeatedly generated color palettes until I was happy with the color variety. My earliest iteration of the design, v.1, was the result:
While this was a good start, the labels were completely illegible, and the look was not as polished as I had hoped. So, next I set the node labels to scale proportionally to node size, set the font to Helvetica Neue (the closest option to the Whitney’s in-house font used by our design team, which is Haas Grot Text 55 Roman), made the font white, and added a 5px label outline (opacity 70%) in the color of the node’s modularity class, so that it would be clear which class the artist was grouped into. The node border was switched to white, and also set to 5px.
After making these adjustments, the labels were still far too closely clustered because the nodes were quite close together. To make it more legible, I ran the Force Atlas 2 layout again, increasing the scaling to 20, and when that still didn’t give enough space for the labels to be legible, I ran the Label Adjust briefly – not enough to eliminate all overlap, but enough to eliminate some. The result was v.3 of the visualization, and with this version I was ready to begin creating the poster in Photoshop, as my goal was to create a final product of a printed poster that included the visualization and some didactic text:
I created a 36” x 36” poster after doing some sketches on paper, deciding on this scale because I needed enough detail for the visualization’s labels to be largely legible, at least for the higher-degree nodes. For all the text I continued to use Helvetica Neue, to mirror the style of the Whitney’s own current branding. I applied labels to some of the more distinct clusters, which represent Biennial exhibitions whose artists were less connected to the rest of the artist network: the 1975, 2000, 2006, 2014, 2017, and 2019 Biennials. I color-coded those labels to match the modularity class of the nodes. I then drafted didactic text, setting up the premise of what a Whitney Biennial exhibition is, the span of the data for the visualization, and the size of the network.
I then decided to add a detail view of the two largest modularity classes, 6 (purple, 16.58%) and 0 (chartreuse, 13.37%), to take a better look at the middle of the main visualization, which seemed to represent a core of often co-exhibited artists. To do this, I went back to Gephi, applied a Partition filter with modularity classes 0 & 6 selected, and exported the result, adding it to the bottom half of the poster in Photoshop.
At this point, I realized it would be very easy to go back to my original data and extract information on which artists were most-exhibited in the Whitney Biennials, as this information was not something I could immediately glean from Gephi. To do so, I went back to Final_EdgeList in OpenRefine and applied the same three initial GREL formulas, but did not delete any columns or rows. I then split the data, using commas as the delimiter, so that each artist name was in an individual cell but all in the same column, rather than splitting the names across one row. Then, I used the OpenRefine Text Facet tool and sorted by count (highest to lowest). I copied and pasted the results of this Text Facet into a Google Sheet, available here, in which the count of how many exhibitions each artist has appeared in follows the artist’s name.
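The Text Facet count can be approximated outside OpenRefine as well. The following is a minimal Python sketch of the same tally, assuming the data is available as a mapping from exhibition year to a list of artist names; the function name and the placeholder data are hypothetical.

```python
from collections import Counter


def count_appearances(exhibitions: dict[str, list[str]]) -> Counter:
    """Count how many exhibitions each artist appears in."""
    counts = Counter()
    for artists in exhibitions.values():
        counts.update(set(artists))  # count each artist at most once per exhibition
    return counts


# Placeholder data, not the real Biennial lists.
sample = {
    "1973": ["Artist A", "Artist B"],
    "1975": ["Artist A", "Artist C"],
    "1977": ["Artist A", "Artist B"],
}
counts = count_appearances(sample)

# Sort from most to least exhibited, as with the facet's sort by count.
for artist, n in counts.most_common():
    print(artist, n)
```

A threshold filter such as `[a for a, n in counts.items() if n >= 6]` would then give a list like the one of most-exhibited artists.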
From this, I created a list of the Top 8 artists who have appeared in the most Whitney Biennials, that is, all artists who have been in 6+ Biennial exhibitions. I added this list to the didactic information in my Photoshop poster, color-coding the artist names to match the modularity class of their nodes in my visualization, for ease of locating them.
I was also able to extract more information from this, which led me to draft additional didactic information for the bottom half of my poster, to accompany the detailed view of Modularity Classes 0 & 6.
The result was this working prototype, which I decided to feature for my user testing and in-class presentation:
USER TESTING
For my user testing of this project, I decided to do a usability evaluation of my poster prototype, to determine whether it stood alone as a didactic visualization for people not studying information science, but who are familiar with art, art history, and social research. The format I chose for this was a questionnaire with some interview questions. Here is my template form that was used:
Questions 1-5 were modeled off of questions from the User Experience Questionnaire, which I enjoyed because they had standardized answers (rating on a scale, I chose a scale of 1-5), that would allow me to easily average and compare answers. To supplement this format, for questions 6-9, I asked short answer questions, which would amount to an asynchronous interview for participants, to share in their own words their experience using my visualization.
I had five user testing participants: 1 practicing artist who studied fine arts and art history, 2 commercial designers who studied art history and illustration, and 2 university professors who perform social research and publish in the field of Political Science. I provided to each participant: an introductory primer to network visualizations which we studied in this class (to provide neutral contextual information to help them understand the form my visualization takes), a user questionnaire on GoogleDocs, and a PNG export of my poster prototype.
The average scores for my first 5 questions were as follows:
- On a scale of 1 (Easy to Learn) to 5 (Difficult to Learn), how Difficult to Learn do you find this visualization design? 2.75
- On a scale of 1 (Inefficient) to 5 (Efficient), how Efficient do you find this visualization design? 4.5
- On a scale of 1 (Confusing) to 5 (Clear), how Clear do you find this visualization design? 3.75
- On a scale of 1 (Not Interesting) to 5 (Interesting), how Interesting do you find this visualization design? 4.75
- On a scale of 1 (Conventional) to 5 (Inventive), how Inventive do you find this visualization design? 4.75
Overall, these answers reassured me that the form was interesting to viewers, leaned more clear than confusing, and felt like an efficient design. The score of 2.75 for the Easy/Difficult to Learn question was also useful to know: as an average it sits in the middle of the road, with some participants finding the visualization easier to learn and others finding it more difficult.
For the short answer questions, participants discussed learning more about the Whitney’s history of Biennials, and often mentioned the top-exhibited artists list, and the ability to view it at a micro or macro level, by zooming in/out of the poster. The top point of critique and confusion was a lack of clarity as to what the color coding of the visualization meant or communicated.
To address the points that came up across most participants, I added additional didactic text to my poster, explaining the color grouping by modularity class more explicitly, and calling out why some exhibitions that appear as distinct clusters within the larger visualization were labeled.
My in-class presentation served as the final format for user testing, where I got the great suggestion to cut down on the labels, as not all labels were legible, leading to frustration. It was recommended to only apply labels to the 52 artists that have been shown in more than 3 Whitney Biennials, allowing for the “Top 8” names to be found more easily, and ensuring that all labels that appear on the visualization are fully legible. I implemented this edit, and also included didactic text that makes it clear why some nodes are labeled while others are not.
The result of these edits is my final poster design:
FINDINGS
My main finding from this project derives from some of the statistics I included in my didactic poster text: Out of the 1,743 artists who were exhibited in Whitney Biennials between 1973-2022, only 328 artists were shown in more than one Biennial. That means only 18.82% of artists in Whitney Biennials are re-exhibited. Going further, only 52 artists have been shown in more than 3 Biennials, and only 8 artists have been shown in more than 5.
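The percentage above is simple arithmetic on the two counts, and the same calculation gives the share of artists shown only once. A short Python check, using only the figures already stated (the function name is hypothetical):

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(part / whole * 100, 2)


total_artists = 1743   # artists exhibited in Biennials, 1973-2022
repeat_artists = 328   # artists shown in more than one Biennial

repeat_share = percent(repeat_artists, total_artists)
single_share = percent(total_artists - repeat_artists, total_artists)

print(f"Shown more than once: {repeat_share}%")  # 18.82%
print(f"Shown only once: {single_share}%")       # 81.18%
```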
This means that by and large, the artists appearing in Whitney Biennials are new to the Whitney Biennial artist network, which is what I would expect from an exhibition format intended to exhibit contemporary art of the moment. Because many of the off-shoot clusters that I could identify as distinct exhibitions and labeled in my poster occurred from 2000 onward, I also think that the Biennials are only becoming more distinct, with fewer re-exhibitions over time. This aligns with my personal experience, as the latest exhibitions have included very young emerging artists, and a number of artists whose work had largely not been exhibited or heavily “discovered” by the contemporary art market. There are outliers that are re-exhibited, but those re-exhibitions are now largely spaced out over the years, such as the example of Charles Ray, who has been in six Biennials: 1989, 1993, 1995, 1997, 2010, and the current 2022 Biennial.
A very interesting fact that I drew out during this project is that the Whitney’s re-exhibition of an artist in many Biennial exhibitions does not always correlate with large holdings of that artist’s work in the Whitney’s permanent collection. All eight artists on the “Top 8” list are in the Whitney’s collection, but Bill Viola, the most-exhibited Biennial artist between 1973 and 2022, has only one artwork in the permanent collection. Going down the rest of the list, Mike Kelley has 27 WMAA collection artworks, Charles Ray has 7, Elizabeth Murray has 49, Gary Hill has 6, John Baldessari has 40, Richard Serra has 24, and Robert Gober has 49. The fact that institutional buy-in for an artist does not always translate across exhibiting and acquiring is an interesting one, worthy of further research.
FURTHER RESEARCH
Now that I have created this final project, I am inspired to continue my research on this data, and potentially expand it to include the annuals dating back to 1932 to see if there are even more marked changes between the current Biennial format and the earliest iterations, as I suspect there are.
I intended, at an earlier stage of this project, to also get my full final network visualization hosted online using the Gephi plug-in SigmaJS, so that I could create a version of the visualization with all labels intact and legible, in case someone did want to go and look at every single name represented on the visualization. I still hope to complete this work and get a version uploaded through GitHub at a later date.
Overall, this project has inspired me to speak with my colleagues about making the data of the Whitney’s collection and exhibition history more openly available in accessible formats for digital humanities research, as some of the roadblocks I encountered in creating this dataset could have been avoided by more uniform data made available, and the Whitney’s collection could be researched in new ways if this was better supported. Moving towards a more open data approach, as MoMA has done, would be a great step by the Whitney to support and encourage this type of analysis of the Whitney’s history and collection.
SOURCES USED
Hencz, A. (2021). When Network Science Meets the Art Museum. Artland Magazine. https://magazine.artland.com/when-network-science-meets-the-art-museum/
Lincoln, M. D. (2013, November 12). Networks of the Smithsonian American Art Museum. Matthew Lincoln, PhD. https://matthewlincoln.net/2013/11/12/networks-of-the-smithsonian-american-art-museum.html
Museum of Modern Art. (2016, December 2). Exhibitions—Dataset by MoMA. Data.World. https://data.world/moma/exhibitions
Museum of Modern Art. (2022). The Museum of Modern Art (MoMA) Collection. MoMA. https://github.com/MuseumofModernArt/collection (Original work published 2015)
Romeo, F. (2015, October 27). Here’s a roundup of how people have used MoMA’s data so far. Medium. https://medium.com/@foe/here-s-a-roundup-of-how-people-have-used-our-data-so-far-80862e4ce220
Saltz, J. (2007, November 15). Where Are All the Women Artists at MoMA? New York Magazine. https://nymag.com/arts/art/features/40979/
Vagavolu, D., Venigalla, A. S. M., & Chimalakonda, S. (2021). MuseumViz—Towards Visualizing Online Museum Collections. ArXiv, 2106(11897). https://doi.org/10.48550/ARXIV.2106.11897
Whitney Museum of American Art. (n.d.). The Whitney Biennial. Whitney Museum of American Art. Retrieved May 3, 2022, from https://whitney.org/exhibitions/the-biennial