Is there a relationship between null values and gender in a museum collection’s metadata?


Tate Modern in London, United Kingdom

INTRODUCTION

Within the art world, the conversation about gender and inequality is an active one. It predominantly centers on the push for equal representation and recognition of women artists and artists who identify as nonbinary within museum spaces. However, there may be another important concern related to this topic: equal representation of these artists within a museum collection’s metadata.

Every museum categorizes its collection digitally through metadata, a set of data that describes and gives information about other data. Research databases and search engines rely on this information to help users find material on a specific subject. It is common for metadata to have null values, but in an art world with a long history of gender-based inequality, it is worth asking whether null values are more likely to occur for one gender than for others in a museum’s metadata. If so, both the impact of this missing information and ways to fill these gaps need to be explored.

INSPIRATION

For this lab’s content, the feminist activist artists the Guerrilla Girls inspired my exploration of these questions about metadata in a museum collection, especially the gender information recorded for its artists. This activist group is devoted to fighting sexism and racism in the art world through thought-provoking propaganda and organizing strategies. They challenge the artistic canon’s promotion of predominantly white male artists and ask the art world to become more aware of its patriarchal leanings.

As for the lab’s visualization, the New York Times article “The Donors Powering the Campaign of Bernie Sanders” inspired the overall design. Its visualizations use a variety of chart structures, quantitative comparisons, and contrasting colors to emphasize the differences between categories. Additionally, the article has little written content; the visualizations do “most of the talking” as a reader considers the financial differences between the 2020 Democratic candidates, excluding Joe Biden.

MATERIALS

1. Awesome Public Datasets: Hosted on GitHub, this is a list of high-quality, topic-centric public data sources, collected and tidied from blogs, answers, and user responses. There were a variety of datasets to choose from, and I chose to explore the Tate Modern’s dataset about its artists. The Tate Modern is a leader in the art world for its world-class collection and its ongoing pursuit to push artistic as well as cultural boundaries.

List of other museums’ datasets from the Awesome Public Datasets’ GitHub account

2. OpenRefine: This is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

OpenRefine with my manipulated dataset

3. Tableau Public: A free platform for publicly sharing and exploring data visualizations online. It is easy to use, and anyone can learn to upload data and design visualizations through its support materials and how-to guides.

Tableau Public’s website

PROCESS

Step #1: Choose a dataset that fits the lab’s requirements.

This dataset provided the information I needed to explore my research questions, and it fit all the requirements listed in the lab’s instructions: 3532 rows, more than one quantitative dimension, more than one categorical dimension, and historical data.

Step #2: Transform and manipulate the data to structure my visualizations.

Luckily, this dataset was cleaned well by the Tate Modern staff, so it didn’t require much cleanup. However, I spent some time transforming the data in OpenRefine to isolate specific categories, such as gender in the context of birthplace, and to extract the numeric birth year from the “born in” phrase. Additionally, I created several spreadsheets to filter fields, such as years, and to find the numeric ratios between genders for certain categories.
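
For readers who prefer scripting to OpenRefine, the birth-year extraction described above could look roughly like the Python sketch below. It is not the transformation I actually ran; the function name, the regular expression, and the sample strings are all assumptions made up for illustration.

```python
import re
from typing import Optional

# Matches the first four-digit year between 1000 and 2099 in a text field.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_birth_year(text: Optional[str]) -> Optional[int]:
    """Return the first four-digit year found in text, or None if there is none."""
    if not text:
        return None  # empty string or missing value stays null
    match = YEAR_PATTERN.search(text)
    return int(match.group(1)) if match else None


# Hypothetical sample values, not rows copied from the Tate dataset.
samples = ["born in 1930", "1887", "", None]
birth_years = [extract_birth_year(s) for s in samples]
print(birth_years)
```

Returning None rather than 0 or an empty string keeps missing birth years distinguishable from real ones, which matters when the nulls themselves are what is being counted.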

Example of manipulated spreadsheet from Tate Modern’s artist dataset

Step #3: Build out several visualizations and dashboards via Tableau. 

Using Tableau’s video tutorials, I was able to import my manipulated datasets via Excel spreadsheets and then create a few visualizations through trial and error. I focused mostly on the visualization structures that most clearly represented the stark differences between the genders’ null values, such as Birth versus Death Years.

Screenshot of my Tableau workspace with all the visualizations

RESULTS

As represented below, the final product for this visualization focused on contrasting color, data visualizations with triptych structures, and the visual impact of quantitative outcomes based on specific data points. Each design choice had an interpretative decision behind it:

– As mentioned above, I wanted to use contrasting colors in a way similar to the NYTimes article to capture the viewer’s attention and then clearly distinguish between the three gender categories based on their numerical differences. Playing on the socialized cultural norms of the average American viewer, I chose blue for male, pink for female, and grey for unknown gender because these are the colors commonly associated with each category, and the solid hues emphasize the volume differences between them.

– A triptych structure was the guiding principle, since there were three gender categories listed in the dataset (male, female, null/unknown) and a side-by-side representation of each category’s counts seemed more effective at highlighting the stark differences.

– While the dataset didn’t have a wide variety of data points (only 7 per artist), its true value was its volume: it lists approximately 3532 artists. That meant I could use count and distinct count to narrate the numeric differences between these genders, then highlight what these differences looked like for a specific data point.
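
The count-based comparison behind these charts can also be sketched outside Tableau. The Python below is a minimal illustration under stated assumptions: the field names `gender` and `birth_year` and the five sample rows are hypothetical, not taken from the Tate dataset, and the function is mine rather than anything Tableau provides.

```python
from collections import defaultdict


def null_rate_by_gender(rows, field):
    """Return {gender: (missing, total, missing / total)} for one field."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        # Treat an empty or absent gender as its own "Unknown" category.
        gender = row.get("gender") or "Unknown"
        totals[gender] += 1
        value = row.get(field)
        # Count None and blank strings as null values.
        if value is None or str(value).strip() == "":
            missing[gender] += 1
    return {g: (missing[g], totals[g], missing[g] / totals[g]) for g in totals}


# Hypothetical rows for illustration only.
rows = [
    {"gender": "Male", "birth_year": 1930},
    {"gender": "Male", "birth_year": None},
    {"gender": "Female", "birth_year": ""},
    {"gender": "Female", "birth_year": 1950},
    {"gender": "", "birth_year": None},
]
summary = null_rate_by_gender(rows, "birth_year")
for gender, (n_missing, n_total, rate) in summary.items():
    print(f"{gender}: {n_missing} of {n_total} missing ({rate:.0%})")
```

Reporting the rate alongside the raw counts is a deliberate choice: if one gender category is far larger than the others, raw null counts alone would exaggerate or hide the difference.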

REFLECTION

While these visualizations are solid explorations of the functionality of Tableau and OpenRefine, there are several areas for improvement that I would like to explore in the future. The first is the lack of data diversity: I would have preferred to find one or two additional datasets to highlight other categorical differences related to an artist’s gender and, hopefully, to provide numeric context for why one gender would have more missing metadata than another. The second is design: my color usage and variety of visualization structures were good, but these visualizations read like drafts, not a final product. I would have spent more time experimenting with Tableau’s other design capabilities, such as gradient colors, different fonts, or more statistical calculations to represent the quantitative ratios between the genders. The third is background context and research references to support the insights. The field of metadata has a great deal of scholarship, and other studies have probably explored patterns in null values in relation to gender. If I had more time, I would have read a few of these studies or articles to provide more substantial support for the hypotheses I note in the lab’s introduction.