Looking at MoMA’s Collection

Introduction

For my final project, I wanted to work with a cultural heritage dataset, specifically something related to the arts. I chose the MoMA collection, which I found on GitHub. There are two main datasets, one of artists and one of artworks; I chose to work with the latter. I had actually downloaded the dataset last fall when I was taking LIS664 (Programming for Cultural Heritage) and found that this earlier version was simpler and easier to work with for this project. The current dataset available on GitHub has additional metadata, including the gender of the artist, and separates out some information, such as nationality and the beginning and ending dates of works. The additional information would be valuable, but within the time I had, I felt my efforts would be better put to use with the simpler dataset that didn’t involve much tidying.

My hope was to give some insight into MoMA’s collection. Some questions I had were:

How are works classified? Into what departments?
How many works are in each of those areas?
What artists have the greatest number of works in the collection?
Were there times in MoMA’s history with greater periods of collecting? Were there times when some departments collected more or had no acquisitions?

My hope was that these initial questions would be answered for a user interested in the arts or with some art background.

The dataset I used contains 123,920 rows, which corresponds to the number of works in the collection from 1929 to June 2015. As of today, July 5th, MoMA had updated the collection file on GitHub two days ago, so I’m certain one could find more up-to-date information if desired. My goal was more general in nature.

Process & Rationale Used to Design the Visualizations

I decided to use Tableau Public for my final project since my dataset did not include geospatial or network data. I also found Tableau to be one of the most versatile programs we explored. My goal was to visualize some basic statistical information that would be difficult to cull from a CSV file of 123K+ rows.

I started simply by creating a visualization that would show the number of works in each department at MoMA. I ended up consolidating two of the departments (Architecture and Design with Architecture and Design - Image Archive). The A&D Image Archive had approximately 20 works and did not seem like a necessary stand-alone category. I would have liked to consolidate more of the departments, but I wasn’t sure where or how to make that decision. I considered merging Drawings with Prints & Illustrated Books, but decided against it because of the vast differences in the types of work and in the way each department collected over time. In an ideal world, I would have only five categories. MacDonald recommends getting it right in black and white and then adding color (27). For that to work, I really would have needed to reduce the number of departments, or maybe it would have been effective to reduce the color range, perhaps going back to my original Tableau post and the visualization Remove to Improve by Dark Horse Analytics. As it was, though, this simple first bar chart illustrates in a matter of seconds something that can’t be seen by viewing a large CSV file.
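The same department count, including the consolidation of the Image Archive, could also be reproduced outside Tableau. Below is a minimal Python sketch of that idea, not the method used for the project. The column name "Department", the exact department strings, and the sample rows are all assumptions made up for illustration; the real file may spell them differently.

```python
from collections import Counter

# Assumed department strings; fold the small image archive into its parent.
MERGE = {
    "Architecture and Design - Image Archive": "Architecture and Design",
}


def count_by_department(rows, column="Department"):
    """Count works per department after applying the MERGE mapping."""
    counts = Counter()
    for row in rows:
        dept = row.get(column, "").strip()
        counts[MERGE.get(dept, dept)] += 1
    return counts


# Made-up sample rows standing in for the real CSV (hypothetical data).
dept_sample = [
    {"Department": "Architecture and Design"},
    {"Department": "Architecture and Design - Image Archive"},
    {"Department": "Drawings"},
    {"Department": "Photography"},
    {"Department": "Photography"},
]

dept_counts = count_by_department(dept_sample)
for dept, total in dept_counts.most_common():
    print(dept, total)
```

With the real file, the rows could come from csv.DictReader in the standard library instead of the hand-written list.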

The next viz I worked on looked at the number of works per artist. This was not as successful because many of the artists have only one work and there are over five thousand artists in the collection. I added a quick filter on the number of artworks, but I don’t feel it clearly shows what you can do. The graph doesn’t work as a standalone (you would scroll and scroll); you really need to use the filter bar.
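One way around the long tail of single-work artists is to compute only the top few artists directly. This is a small Python sketch of that alternative, assuming a column named "Artist"; the artist names and counts are invented placeholders, not figures from the collection.

```python
from collections import Counter


def top_artists(rows, n=3, column="Artist"):
    """Return the n artists with the most works as (name, count) pairs."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common(n)


# Placeholder names with made-up counts (hypothetical data).
artist_names = ["Artist A"] * 4 + ["Artist B"] * 3 + ["Artist C"] * 2 + ["Artist D"]
artist_sample = [{"Artist": name} for name in artist_names]

top_three = top_artists(artist_sample)
for name, total in top_three:
    print(name, total)
```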

The last of the three visualizations, showing acquisitions over time, was the hardest for me to figure out how to represent. Because MoMA has been collecting for close to a hundred years, showing how much was collected each year seemed too tedious to me. I did create a few draft vizes showing the years to check their effectiveness.

I had also been wanting to create an intriguing area map, but that didn’t work either. I was inspired by this one, which has so many categories but somehow still works quite simply and easily.

Showing acquisitions by decade seemed to be a good solution and a way to break up time so it’s not so long across the base of the graph. Maybe this is unusual because Tableau has built-in filtering of time by years, quarters, months, days, weekdays, and more. I thought about creating bins, then sets, and finally settled on creating groups for the decades. I wanted it to be clear in a snapshot which decades had high collecting in which areas. My first go at it was with bar graphs.
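The decade grouping can be expressed as a simple rule: take the acquisition year and round it down to the nearest ten. Here is a hedged Python sketch of that rule, separate from the Tableau groups I actually built. It assumes columns named "DateAcquired" (with dates starting with a four-digit year) and "Department"; the sample rows are invented.

```python
from collections import Counter


def decade_label(date_acquired):
    """Return a label such as '1960s', or None if no four-digit year leads the string."""
    year_text = date_acquired[:4]
    if len(year_text) != 4 or not year_text.isdigit():
        return None
    return f"{int(year_text) // 10 * 10}s"


def acquisitions_by_decade(rows):
    """Count works per (decade, department) pair, skipping rows without a usable date."""
    counts = Counter()
    for row in rows:
        decade = decade_label(row.get("DateAcquired", ""))
        if decade is not None:
            counts[(decade, row.get("Department", ""))] += 1
    return counts


# Made-up sample rows (hypothetical data).
decade_sample = [
    {"DateAcquired": "1964-10-06", "Department": "Prints & Illustrated Books"},
    {"DateAcquired": "1964-11-10", "Department": "Prints & Illustrated Books"},
    {"DateAcquired": "1968-01-15", "Department": "Prints & Illustrated Books"},
    {"DateAcquired": "1935-05-02", "Department": "Film"},
    {"DateAcquired": "", "Department": "Film"},
]

decade_counts = acquisitions_by_decade(decade_sample)
for (decade, dept), total in sorted(decade_counts.items()):
    print(decade, dept, total)
```

Rows with a blank or malformed date are dropped here rather than guessed at, which is a design choice; they could instead be collected under an "unknown" label.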

I’m not sure any of my attempts achieve this at a glance. I decided to look at creating a treemap. Plaisant describes treemaps as useful for topological or attribute-based queries, such as task #3 for my participants (4; see the next section). The treemap comes closest and also reasserts the color associations of the departments from the first bar graph.

I decided to create a dashboard so that a user could quickly get a sense of the MoMA collection and departments and then explore deeper if interested. Scaling and deciding on an order for the visualizations was the toughest part. I tried to place them in order from simplest to most complex, top to bottom. I also hoped the treemap would act as a legend of sorts and reinforce color consistency with the first bar graph.

Recruiting Participants- UX Research

“Visual communication is only effective when it is aligned with the way people see and think.”

-Stephen Few, “The Key to Dashboard Effectiveness” (15).

All four of my users were people whom I knew had an interest in the arts. Two of my users are recent graduates from our program at Pratt; both have a strong interest in the arts (one was a fellow at the Met and the other an intern at Feigen gallery). Another user is a family friend, a retired high school English teacher who I know has an interest in the arts and who has gone to a few museums with me. My other participant has a Master’s in Art History and studio art, so I was confident he would have an interest.

I created a series of tasks for each user to complete.

What department has the largest collection?
Name the top 3 artists with the largest number of works in the collection. Can you tell how many works each artist has in the collection?
What decade and what department had the greatest number of acquisitions?
Do you know what time range was the greatest for acquisitions in the MoMA film department? How many films were collected during this period?

The timing did not work out for face-to-face sessions, which is unfortunate, as I did not have a chance to use the think-aloud technique or to observe usage of the mouse, any issues with the filter bar, or other issues that might have arisen with my visualizations or the software interface.

I created a post-test questionnaire for my participants to fill out to give me some general feedback about their experience.

I was able to speak with three of the four users on the phone to get additional feedback. As Patrick Neeman says, “it’s best to do it within the context of their environment.” The environment suited the users because they were at home, on the computers they use daily, in a presumably comfortable and familiar setting. Ideally I would have been able to be there too. Forsell and Johansson, paraphrasing Spence, assert that “the goal of InfoVis is to provide tools and techniques that promote insight and understanding of a data set – that is to support and amplify cognition” (200). My hope was to get feedback that would help me improve the visualizations I created and for my users to gain some understanding of the MoMA collection. Simply, I hoped they would have some insight into and understanding of the data.

Findings

Color – During one of my conversations, I learned that having a legend would have been helpful. I had eliminated the legend, thinking the treemap and initial bar chart were enough to reinforce the colors and categories. This same participant also pointed out that the bar chart in the center with the quick filter was confusing because she thought all of the artists were from the Architecture and Design category, since the color matched. I had overlooked this and, in retrospect, thought a dark grey probably would have been a nice color choice, maybe highlighting the top three to five artists in another shade of grey or even in a red or an orange.

Naming– Having things consistently named is key. One of my participants pointed out that the language in the task did not match the language in the visualization. I asked about “works” in the collection, but the visualization says “records.” She said she was able to deduce they were the same, but that it initially threw her off.

Treemap– This may not have been the best choice. Two of the four participants said that they could not see three of the category headings on the treemap. This reasserts the importance of user testing and of testing on different machines and browsers. In the Tableau Public software, I could see all the headings. When I checked the link online, there was only one department (Painting and Sculpture) that didn’t show the heading, which I decided was not ideal but acceptable.

Recommendations for Revision

I would definitely adjust the category colors for all three vizes. I would ensure the text matches my tasks (I would use artwork or work instead of record) for the middle viz that shows the number of works in the collection by each artist. I would also like to play around with this viz more, creating worksheets where I would filter artists by their department and see who the top three artists are (by number of works in the collection) for each department.

I would play around more with the treemap and adjust how the labels look. I would also simply use one label for each category rather than a label for each decade. Plaisant’s SmartMoney example in fig 3.4 illustrates this well (4). In Common Pitfalls in Dashboard Design, Few mentions that “…you must condense [information] in ways that don’t decrease its meaning” (15). In this same article, he cites examples that use percentages; perhaps that would be something to explore as well. I would also consider a line graph for each department over time. Regardless, I would like to play around more with the treemap and with visualizing the information for clearer meaning. Maybe doing some kind of stacked graph like this would work?

Future Thoughts

I would like to work on tidying the data from MoMA’s more recent upload on GitHub. I would like to add some visualizations that show the nationality of artists. One thing I would be curious to find out: are there places MoMA is collecting from more than others?

I would like to look at the gender of the artists as well. I found this article about gender and museum collections to be of interest and would be curious to see how MoMA’s collection fits. Professor Sula shared this link with me and I also found this viz related to gender and the MoMA collection. Additionally, I found this viz about the Tate’s collection and gender.

I also think it might be interesting to delve into a genre or department within the collection. Maybe it would be worth exploring the history behind the huge rise in Prints & Illustrated Books in 1964? I found this previous Pratt project, which looks into the Met’s collection of Dutch painting from 1600-1800, to be of interest. I particularly liked how this person’s visualizations of the Tate’s collection brought so much attention to Turner works.

I would also be interested in doing some kind of network visualization with the collection that is similar to this one inspired by the Tate.

Overall, I would like to continue pursuing work with data and information visualization. The Frick has formed a Digital Art History Lab (DAHL) where they have held workshops and brought in speakers that touch on the topics we delved into during class. Learning to work with data and continuing to hone my skills with the tools, knowledge, and software learned in this class will aid me in being a better knowledge manager, information scientist, and/or librarian, depending on where I land. I can see how these tools, this software, and this skill set are valuable no matter what the subject of the data.

References

Few, Stephen (2006). “Common Pitfalls in Dashboard Design.” ProClarity.

Forsell, Camilla & Jimmy Johansson (2010). “An Heuristic Set for Evaluation in Information Visualization.” AVI ’10, Rome, Italy. May 26–29, 2010.

MacDonald, Lindsay W. (1999). “Using Color Effectively in Computer Graphics.” Computer Graphics and Applications, IEEE 19(4): 20–35.

Plaisant, Catherine (2005). “Information Visualization and the Challenge of Universal Usability” in Exploring Geovisualization, eds. J. Dykes, A. M. MacEachren, M.-J. Kraak. Elsevier.