Introduction
Much of the analysis of available art museum data revolves around the collection and the artist, which is arguably the very crux of why art museums exist. People (artists) create objects (the art found in the collection), and that in turn informs how institutions choose to share their data. In other words, much of the data being shared pivots around those two categories. But I was curious to understand how the “profile” of a museum could be inferred from the museum’s exhibitions themselves. After all, exhibitions are the museum’s face to the physical public. The choices made here can reveal an institution’s priorities and also reflect historical milestones in the world of art (and society).
I researched three notable art museums, The Metropolitan Museum of Art, The Art Institute of Chicago, and the Museum of Modern Art (in New York), to see what type of data they made available. The Met launched its Open Access initiative in 2017, making artworks (images and data) in the public domain available to anyone, and offers a Collection API (application programming interface) as part of that initiative. The data is available as a .csv download on GitHub or directly through the API. The Art Institute of Chicago also has an open API with different endpoints to search, such as the collection, artists, artworks, and exhibitions. However, it is not prominently displayed on the site or in the documentation (i.e. you have to dig for it).
Lastly, I looked at the Museum of Modern Art (MoMA). They do not have a public API, but they do offer their collection/artist and exhibition data as .csv downloads. The exhibition dataset piqued my interest because it was a dedicated effort by a project team from the MoMA Archives, drawing on “22,000 folders of exhibition records dating from 1929 to 1989 from its registrar and curatorial departments”. Also of interest, this dataset contained approximately 5,900 artists “not represented in MoMA’s current permanent collection of artworks.” I decided to explore the data from MoMA for this project.
Inspiration
While researching network analyses of exhibition data for inspiration, I came across a project, “LinkedArt: exploring Network Analysis in Art History”, that used the same dataset: MoMA exhibitions from 1929 to 1989 on GitHub (fig. 1). The authors/researchers, Sophia Renz and Vanessa Tissen, took the approach of a bi-modal network to capture the timeframe, identification, and frequency of the co-exhibition of artists. Their resulting visualization conveyed their analytic choices through color (which indicated nationality), labels for the artists with the highest “betweenness centrality”, and node size set by degree (of co-exhibited artists). Their visual choices served as a major influence on how I would consider the data.
Another network graph that served as inspiration, mostly for the delicate nature of the visualization, was that of ARTIST-venues Networks (fig. 2):
This graph centered on one particular artist, Chris Newman, and the various galleries, museums, and for-profit or non-profit venues where they exhibited. I appreciated the limited yet vivid color palette and the whitespace that held the dainty tendrils of the connections/edges. The graph itself is interactive, which is something I would like to pursue in future network projects (as time affords). It also provided a new perspective on how to communicate the experiences and works of an artist beyond the confines of a CV, which helps situate the idea of a network as utilitarian in some respect.
Lastly, I took inspiration (but did not necessarily enact it in my project) from another interpretation of MoMA data. This interactive “network diagram” (fig. 3) visualized the connections between artists who were “represented in Inventing Abstraction”, a MoMA exhibition that ran from Dec 23, 2012 to Apr 15, 2013. It highlighted how art and data can co-exist symbiotically. The way this more recent network graph was depicted harkened back to an original diagram created by Alfred H. Barr, Jr. (fig. 4) for the 1936 exhibition “Cubism and Abstract Art”, creating a self-referencing visualization loop of art and data. Ultimately, this served as inspiration for next steps (if I am able to extract stylistic attributes of art movements and apply them to clusters of exhibitions of artwork from the respective stylistic periods).
fig. 3
fig. 4
I ultimately decided to create a force-directed network to visualize the MoMA exhibition data to understand the relationships of the exhibitions themselves through co-exhibiting artists, specifically focusing on the dimension of gender, which was not touched upon in the examples of exhibition network visualizations I had found.
Materials | Data Preparation | Method
MoMA’s exhibition dataset was downloaded from their GitHub as a .csv (comma-separated values) file. The data comprised approximately 34,558 line items (one row per constituent listed for an exhibition) with the following variables (not all values were populated):
ExhibitionID
ExhibitionNumber
ExhibitionTitle
ExhibitionCitationDate
ExhibitionBeginDate
ExhibitionEndDate
ExhibitionSortOrder
ExhibitionURL
ExhibitionRole
ExhibitionRoleinPressRelease
ConstituentID
ConstituentType
DisplayName
AlphaSort
FirstName
MiddleName
LastName
Suffix
Institution
Nationality
ConstituentBeginDate
ConstituentEndDate
ArtistBio
Gender
VIAFID
WikidataID
ULANID
ConstituentURL
Before a network graph could be created in Gephi, the open source graph visualization tool, the data had to be cleaned in OpenRefine (an open source tool for cleaning “messy data”) and Excel to separate artist exhibitions by gender and remove blanks. Additional processing to prepare the data for transformation into a network graph was done in the programming language Python.
Essentially, the Python code aggregated the data into two .csv files for each gender. One file was the node list (the unique IDs for the exhibitions = ‘ExhibitionNumber’ and their titles = ‘ExhibitionTitle’) and the other was the edge list (source, target, type, and weight). Each row of the edge list linked an exhibition A node (source) to an exhibition B node (target) in which the same artist exhibited, so that shared, co-exhibited artists formed that particular pathway (edge). This was repeated for every artist until all the pairs were accounted for. The files were then loaded into Gephi, where the software compiled the data as a node table and an edge table, which enabled statistical analysis and visualization of the connections between the nodes (with the edges as lines connecting them).
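The aggregation described above can be sketched roughly as follows. This is not the original project code: the function names and the tiny sample rows are made up for illustration, and the `Id`/`Label` and `Source`/`Target`/`Type`/`Weight` headers are assumed to match what Gephi's spreadsheet import expects. The sketch treats two exhibitions as connected when at least one artist appears in both, and uses the number of shared artists as the edge weight.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def build_node_and_edge_lists(rows):
    """Return (nodes, edges) for an exhibition co-exhibition network.

    rows: iterable of dicts with ExhibitionNumber, ExhibitionTitle and
    ConstituentID keys (column names from the MoMA exhibition CSV).
    """
    titles = {}                         # ExhibitionNumber -> ExhibitionTitle
    shows_by_artist = defaultdict(set)  # ConstituentID -> exhibition numbers
    for row in rows:
        number = row["ExhibitionNumber"]
        titles[number] = row["ExhibitionTitle"]
        shows_by_artist[row["ConstituentID"]].add(number)

    # Each artist adds one unit of weight to every pair of exhibitions
    # that artist appeared in.
    weights = Counter()
    for shows in shows_by_artist.values():
        for source, target in combinations(sorted(shows), 2):
            weights[(source, target)] += 1

    nodes = [{"Id": n, "Label": t} for n, t in sorted(titles.items())]
    edges = [
        {"Source": s, "Target": t, "Type": "Undirected", "Weight": w}
        for (s, t), w in sorted(weights.items())
    ]
    return nodes, edges


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample rows, not real MoMA data.
sample = [
    {"ExhibitionNumber": "1", "ExhibitionTitle": "Show A", "ConstituentID": "10"},
    {"ExhibitionNumber": "2", "ExhibitionTitle": "Show B", "ConstituentID": "10"},
    {"ExhibitionNumber": "1", "ExhibitionTitle": "Show A", "ConstituentID": "11"},
    {"ExhibitionNumber": "2", "ExhibitionTitle": "Show B", "ConstituentID": "11"},
    {"ExhibitionNumber": "3", "ExhibitionTitle": "Show C", "ConstituentID": "11"},
]
nodes, edges = build_node_and_edge_lists(sample)
node_csv = to_csv(nodes, ["Id", "Label"])
edge_csv = to_csv(edges, ["Source", "Target", "Type", "Weight"])
```

In the sample, exhibitions 1 and 2 share two artists, so their edge carries a weight of 2, while the pairs involving exhibition 3 share only one. Running the function once on the rows for female artists and once on the rows for male artists would yield the two node/edge file pairs.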
Ready For Viz (and re-do’s)
At this point, the node and edge spreadsheets were imported into Gephi. However, this is where I received the following warning:
The warning indicated that there were multiple exhibitions with the id = “No#”. In reality, MoMA had identified those exhibitions with “No#” even though they were different exhibitions. The metadata for each entry did record its occurrence, but the only identifier that told them apart was the text string of the exhibition title, not the ‘ExhibitionNumber’. Although different artists were tagged to those exhibitions, the effort to clean, disambiguate, and process the data to differentiate the “No#” exhibitions (~338 exhibitions) outweighed the benefit. So, they were removed from each of the male and female edge and node files.
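One way to drop those records before rebuilding the node and edge files is sketched below. The helper names and sample rows are hypothetical, and the sketch assumes the placeholder appears in the ‘ExhibitionNumber’ column exactly as the string “No#”.

```python
def drop_unnumbered(rows, placeholder="No#"):
    """Keep only rows whose ExhibitionNumber is not the shared placeholder."""
    return [
        row for row in rows
        if row["ExhibitionNumber"].strip() != placeholder
    ]


def unnumbered_titles(rows, placeholder="No#"):
    """Collect the distinct titles hiding behind the placeholder id."""
    return {
        row["ExhibitionTitle"] for row in rows
        if row["ExhibitionNumber"].strip() == placeholder
    }


# Made-up sample rows, not real MoMA data.
sample = [
    {"ExhibitionNumber": "1", "ExhibitionTitle": "Show A"},
    {"ExhibitionNumber": "No#", "ExhibitionTitle": "Untitled Display X"},
    {"ExhibitionNumber": "No#", "ExhibitionTitle": "Untitled Display Y"},
    {"ExhibitionNumber": "2", "ExhibitionTitle": "Show B"},
]
kept = drop_unnumbered(sample)
dropped_titles = unnumbered_titles(sample)
```

Counting the distinct titles first gives a sense of how many exhibitions are lost by the removal.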
Time to re-do the data…
All of the data processing steps above were then completed again to account for the removal of those exhibitions. With the new sets of data files loaded into Gephi, the time came to visualize and analyze the connections (for real).
Ready For Viz…for real
For both undirected network graphs, I mainly looked at the following statistics to characterize their main attributes and compare them: degree (the number of connections a node has), modularity (how strongly the network divides into communities of densely connected nodes), graph density (the ratio of the edges present in the network to the maximum number of edges possible), and network diameter (the longest of the shortest paths between any pair of nodes).
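To make three of these definitions concrete, here is a minimal plain-Python sketch of degree, density, and diameter for a small undirected graph. Gephi computed the actual figures for this project; the function names and the four-node sample graph are illustrative only, and modularity is left out because community detection is considerably more involved. The sketch assumes a simple graph (no self-loops or duplicate edges) and measures the diameter only over pairs of nodes that can reach each other.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected)."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def degrees(adjacency):
    """Degree = number of connections each node has."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def density(adjacency):
    """Edges present divided by the maximum possible, n * (n - 1) / 2."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edge_count = sum(len(nb) for nb in adjacency.values()) // 2
    return 2 * edge_count / (n * (n - 1))


def eccentricity(adjacency, start):
    """Longest shortest-path distance from start, found by breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def diameter(adjacency):
    """Greatest shortest-path length between any two connected nodes."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Illustrative graph: four exhibitions and four edges.
sample_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
graph = build_adjacency(sample_edges)
node_degrees = degrees(graph)
graph_density = density(graph)
graph_diameter = diameter(graph)
```

Here node C has the highest degree (3), four of the six possible edges are present, and no pair of nodes is more than two steps apart.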
The statistics also informed the design choices:
+ Degree = node size – the greater the degree, the larger the node size
+ Modularity class (communities within the network) = color. The color palette was limited to about 6 dynamic and bright colors, with the remaining classes defaulting to a neutral grey. This choice enabled key comparisons among the most significant communities of the network while pushing less important ones into the background.
+ Edge colors = I chose to keep the edge colors the same as their respective nodes in order to see how the communities related to each other from a quick overview of the graph. Where the nodes were sparsely arranged, I was able to glean insights by the colored edge intersections and how they trailed from each node.
Gephi has pre-programmed layouts that call upon different algorithms to set the shape of the graph. For each graph I chose “ForceAtlas 2”, a continuous algorithm in which nodes repel each other and edges pull their connected nodes together.
Some additional adjustments were made to each graph through “prevent overlap” and the “gravity”/”scaling” layout options to create a balanced layout with better node visibility (this especially applied to the male artist graph).
Results
The Statistics:
Female: 737 nodes | 11,745 edges
Male: 1,484 nodes | 118,065 edges
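As a rough check on how differently connected the two networks are, the density and average degree can be worked out directly from the node and edge counts above. This sketch assumes both graphs are simple undirected graphs; the figures Gephi reports may differ slightly if its edge handling differs.

```python
def undirected_density(node_count, edge_count):
    """Share of all possible node pairs that are actually connected."""
    return 2 * edge_count / (node_count * (node_count - 1))


def average_degree(node_count, edge_count):
    """Mean number of connections per node (each edge touches two nodes)."""
    return 2 * edge_count / node_count


# Node and edge counts as reported in the statistics above.
reported = {"Female": (737, 11_745), "Male": (1_484, 118_065)}

summary = {
    label: {
        "density": undirected_density(nodes, edges),
        "average_degree": average_degree(nodes, edges),
    }
    for label, (nodes, edges) in reported.items()
}

for label, stats in summary.items():
    print(
        f"{label}: density {stats['density']:.3f}, "
        f"average degree {stats['average_degree']:.1f}"
    )
```

Under those assumptions the female network has a density of about 0.043 and an average degree of about 32, while the male network has a density of about 0.107 and an average degree of about 159.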
Male Artists
The male artist co-exhibition network was a monolith (fig. 6). The overwhelming connectivity of the communities (modularity classes) in pink and lime green takes the focus immediately. The pink node exhibitions were collection-wide shows or surveys of the art of a certain century or stylistic period (e.g. “Art of the Twenties” or “Painting and Sculpture from the Museum Collection”). Certain mediums were also referenced in the titles: “paintings”, “sculpture”, and “prints” repeat quite frequently. The lime green nodes tended to relate to print mediums, as inferred from exhibition titles such as “Master Prints from the Collection” and “Selections from the Permanent Collection: Prints and Illustrated Books”. The orange node exhibits served as intermediaries between the concentration of connected pink and lime green nodes and the peninsulas of brown and blue extending out from the center to the left. Those orange node exhibits had “America” or “American” in common in their titles: “Three Centuries of American Art” and “American Drawings and Watercolors: A Selection from the Collection”.
It was also interesting to discover what made up the peninsulas of nodes. The blue node exhibits had a bit more variability in their titles, but shared themes around “design” or “architecture”. The brown node exhibits tended to relate to photography/photographers: “Alfred Stieglitz Exhibition: His Collection” and “Photographs from the Museum Collection”. It was also interesting to see that the lime green tendrils hanging below the concentrated cluster consisted of titles that dealt with “television” and “video”: “Video Art: A History”, “Video and Ritual”, and “The Arts for Television”, for example.
Female Artists
Compared to the network graph of the male co-exhibiting artists, the network of female co-exhibiting artists was very delicate and sparse, with greater separation between the modularity class communities (fig. 7). The communities were far less connected, with just a couple of gateway nodes connecting the “center” to the arms of nodes and wispy edges. One community to point out was the pink modularity class, which mostly comprised collection-wide shows or surveys of the art of a certain century or stylistic period (just as in the male artist network). Some of those exhibition titles included “Three Centuries of American Art” and “Art of the Twenties”. Those exhibits served as a main gateway to the lime green node community, whose exhibitions all had “photography”/“photographs” in their titles or involved female photographers: “12 Photographers” and “Steichen Gallery Reinstallation” (which had notable photographers associated with it, such as Diane Arbus, Dorothea Lange, and Berenice Abbott, to name a few). The blue node community consisted of titles in which “acquisitions” occurs frequently.
An interesting similarity was that the tendrils hanging below the female co-exhibiting artist network, like those of the male network, related to “television” and “video”: “The Arts for Television” and “Selections from the Video Study Collection: 1968–1987”.
The grey and light pink clusters that shot out from the center of the graph, an anomaly that caught my eye, were exhibitions with “building”/”architecture” and “film” related words in their title. They hover in orbit around the center, anchored only by the highest degree node in the graph: “Three Centuries of American Art”.
The two network graphs differed in their density, in the gravity that pulled the nodes together, and in average degree. In other words, in the male co-exhibiting artist network any one exhibition shared artists with many more other exhibitions, even when the subject matter of the exhibitions differed. There was some exception with the “peninsula” node communities, but they still had many edges that passed through more central node exhibitions. The female co-exhibiting artist network differed in that regard, with a lack of central node clustering and fewer edges concentrated on any one node.
Ultimately, I can infer the following narrative about MoMA’s exhibition choices: male artists have been co-exhibited many more times than female artists. When female artists were co-exhibited, the exhibitions tended to be localized to certain stylistic themes or mediums. Far less cross-exhibition of artists appears to have happened than in the male artist network, which was more tightly knit.
Reflection
In thinking about the evolution of this project, two immediate thoughts come to mind. First, I want to create an interactive network graph where the viewer can hover over the nodes to reveal the title and other data about the exhibition, like date, location, medium (if available), and organizer. This current static iteration did not include labels, mostly for aesthetic reasons: the labels of the exhibitions tended to be very long and obscured the visualization. The interactive graph of the artist Chris Newman mentioned above is a great influence in moving in that direction. Additionally, I would like to add filtering functionality to allow for greater viewer-driven discovery.
Second, I would like to broaden the museum context by incorporating other exhibition datasets via other museums’ APIs, data dumps on GitHub, and, more interestingly, linked open data resources like Wikidata. This could provide a more global comparison of how other museums are making choices about who they exhibit (especially through the dimension of gender).
There were limitations of the dataset to mention: the total number of records for women attached to exhibitions in the 1929 to 1989 period was n=2,490, with about 684 unique female artists. By contrast, the male artist stats were n=23,006, with about 4,350 unique male artists. There were n=8,724 artist records that did not have a gender listed, and they were disregarded. It’s unclear whether any effort will be dedicated to ascertaining the artists’ preferred genders (filling in the missing values and/or updating gender beyond the binary). This somewhat limits an explicit representation and provides only a general direction for the gender breakdown.
As a postscript, I have a connection to the project team that produced the MoMA exhibition data set. What is available to the public has the date range of 1929 to 1989. The data set I have been working on extends to the year 2000. I would be interested in pursuing a comparison of the years 1981-1990 to 1991-2000. Especially given the rise in the ratio of female artists exhibited in the 1980s, this could provide a better idea of how the institution is making decisions about gender diversity among exhibited artists.
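Tallies like these can be reproduced with a short pass over the rows, sketched below. The function name and sample rows are invented for illustration, and the sketch assumes the ‘Gender’ column holds plain strings such as “Female” and “Male”, with an empty value when no gender is listed.

```python
from collections import defaultdict


def tally_by_gender(rows):
    """Return {gender: (record_count, unique_artist_count)}."""
    record_counts = defaultdict(int)
    artists = defaultdict(set)
    for row in rows:
        # Group empty or missing values under a single "Unlisted" label.
        gender = (row.get("Gender") or "").strip() or "Unlisted"
        record_counts[gender] += 1
        artists[gender].add(row["ConstituentID"])
    return {g: (record_counts[g], len(artists[g])) for g in record_counts}


# Made-up sample rows, not real MoMA data.
sample = [
    {"ConstituentID": "10", "Gender": "Female"},
    {"ConstituentID": "10", "Gender": "Female"},
    {"ConstituentID": "11", "Gender": "Male"},
    {"ConstituentID": "12", "Gender": "Male"},
    {"ConstituentID": "13", "Gender": ""},
]
tallies = tally_by_gender(sample)
```

The first number in each pair corresponds to the n= record counts quoted above, and the second to the count of unique artists.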
References
+ Sophia Renz and Vanessa Tissen. (2021, February 24). LinkedArt: Exploring network analysis in art history [Billet]. Digital Humanities Lab. https://dhlab.hypotheses.org/1867
+ GEPHI – Introduction to Network Analysis and Visualization. (n.d.). Martin Grandjean. http://www.martingrandjean.ch/gephi-introduction/
+ Inspiration: Interactive Network Graph from MoMA. (2013, April 12). https://sandrarendgen.wordpress.com/2013/04/12/defining-art/
+ MoMA | Inventing Abstraction | Connections. (n.d.). https://www.moma.org/interactives/exhibitions/2012/inventingabstraction/?page=connections
+ Possible Outcomes – Network Analysis + Digital Art History. (n.d.). https://sites.haa.pitt.edu/na-dah/possible-outcomes/
+ *Thank you to Tk Cram for .py code tips!