Initially written November 12, 2024


As part of their contribution to our project, our team member Steph is exploring the subject terms of the CBA’s artists’ books collection through a data visualization. I sat down with them to discuss one of the challenges of their project: transforming the subject data into something usable.

Hi Stephanie- tell me about your project!

What is the current deliverable, what are you making?

Right now, I am exploring data visualizations; I made a Tetris-like thing. Still, I don’t have a specific layout or tool selected for how I want to do the visualization. I thought I was going to do a network; I was also thinking of doing a hierarchy in Flourish, which I’m playing with. But I don’t have a set look determined yet because I’ve just been doing data work. It took a lot of time.

A tree map featuring tags
An early draft of Stephanie’s hierarchy visualization, a “Tetris-like” treemap

What was the biggest challenge in your project so far?

Data type work, all the activity, and the need for precision. One of the challenges I had: for example, there’s this category “visual and verbal communications”, and there were all kinds of variations of that term, uppercase and lowercase. There were tons of little things like that, which collectively took a long time to work through. I also didn’t end up categorizing the entire subject list; there were terms like “citrus fruit” or “child” that didn’t make a lot of sense for this project. Those were examples of one-off terms that I decided were not useful.
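For readers facing a similar cleanup, here is a minimal Python sketch of the kind of normalization described above: folding case and spacing variants of a term together and setting aside out-of-scope terms. It is not Stephanie’s actual workflow, which was done by hand in a spreadsheet. The sample terms, the `normalize` helper, and the `out_of_scope` set are assumptions made for illustration.

```python
import re
from collections import Counter


def normalize(term: str) -> str:
    """Trim, collapse repeated whitespace, and lowercase a subject term."""
    return re.sub(r"\s+", " ", term.strip()).lower()


# Hypothetical raw subject terms, as they might appear in a catalog export.
raw_terms = [
    "Visual and Verbal Communications",
    "visual and verbal communications",
    " Visual And Verbal  Communications ",
    "Lithography",
    "citrus fruit",
]

# Hypothetical list of one-off terms judged out of scope for the project.
out_of_scope = {"citrus fruit", "child"}

# Count how often each normalized term occurs, then drop out-of-scope terms.
counts = Counter(normalize(t) for t in raw_terms)
kept = {term: n for term, n in counts.items() if term not in out_of_scope}

for term, n in sorted(kept.items()):
    print(f"{term}: {n}")
```

Normalizing before counting means the three spellings of “visual and verbal communications” are tallied as one term. Deciding which terms belong in the out-of-scope set remains a judgment call that code cannot make for you.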

So you were removing terms which weren’t necessarily saying something meaningful about what you want to explore in the collection?

Yes, I mainly focused on printing techniques, materials, or binding techniques, which I found interesting because, as I built the categories, I was comparing them to Open Artists Books (OAB). There were a few terms in the Center for Book Arts’ collection where I wondered, “why is this here, but not in the OAB database?” I thought about adding them, but I didn’t in the end because I thought, “who am I to do that?” It makes me wonder if it would be useful. If I’m not the artist, I don’t know what they did; I wonder, who am I to assume what they meant?

So you were dealing with a bit of imposter syndrome about applying metadata yourself.

Imposter syndrome about data work.

When you say you transformed (a.k.a. “cleaned”) the data, what did that entail? I assume you were looking for duplicates of different spellings of the same words?

Even the same concept but different wording. There were a few terms like “lithography” and “lithographic print” where I had to make a decision: should I just include “lithography”, or should I also include the term related to printing? I also had issues with subjects that could have two terms at the same time, like “Japanese paper” or “origami”; I ultimately went with the term used by Getty.

A section of a spreadsheet showing subject terms, categories, Getty vocabulary terms, Open Artist Book terms and definitions for each term.
A section of Stephanie’s dataset

Which Getty vocabulary were you working off of?

So you’re going into AAT and picking the broadest term for each category.  Did you do a lot of data work prior to this class?

No.

So, you were doing a lot of data work in a way that you hadn’t done for prior research.  Who did you talk to about how to do it, what resources did you read?

Honestly, at first, I really didn’t know what to do. My brain was like, this is a lot, this needs to be organized, how can I organize it? I mainly based the structure of the project on ontologies, thinking, what is the broadest term each field can fit into? I decided to keep the categories as broad as possible. 

So you were working in your spreadsheet, then going to the Getty and finding each term…

There were two windows, which is why it took me three weeks.

What did you learn from this process?

Data is hard, it’s tedious, it’s finicky.  The network part I’m getting into right now- I’m not sure what it will look like at the end. But I’m thinking through how to present it to the public in a way that is understandable. It is a lot of work, especially if you’re new and don’t know how to do it.  It required patience. It never ends.

Did you learn anything interesting about the collections so far?

The printing methods, there were a lot of printing methods. And binding too, that I didn’t know about. I learn through seeing, so I was exploring what they looked like, sometimes through Google Images, because this was my first time learning about them. I also emailed Claudia (our professor) for feedback.

When did you decide this is the cut-off?

When I was through the list of subject terms. Then I went through and added definitions for each term with links, and now I have a condensed spreadsheet, which I may have to edit further.

So what you have now is a full spreadsheet taking every subject term that is related to a book arts topic and putting it into a broader category using terms that are set by the Getty Art and Architecture Thesaurus, and the Open Artists’ Books controlled vocabulary as well. What’s next?

I used a few of the Open Artists’ Books terms; I am thinking through how those terms connect to the other Getty terms and how to visualize the project.
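As a rough illustration of the spreadsheet structure described here, the Python sketch below maps each cleaned subject term to a preferred term and then to a broader category, producing nested counts that a hierarchy or treemap tool could use. Every term, category, and count in it is a placeholder and is not taken from Stephanie’s dataset, the Getty AAT, or the Open Artists’ Books vocabulary. It also assumes that “lithographic print” is merged into “lithography”, which is only one possible way to resolve that choice.

```python
from collections import defaultdict

# Hypothetical mapping from a cleaned subject term to a preferred term.
term_to_preferred = {
    "lithography": "lithography",
    "lithographic print": "lithography",  # assumed merge, for illustration
    "letterpress": "letterpress",
    "accordion fold": "accordion fold",
}

# Hypothetical mapping from a preferred term to a broad category.
preferred_to_category = {
    "lithography": "printing techniques",
    "letterpress": "printing techniques",
    "accordion fold": "binding techniques",
}


def build_hierarchy(term_counts):
    """Group term counts as category -> preferred term -> total count.

    Terms with no mapping are returned separately for manual review.
    """
    hierarchy = defaultdict(lambda: defaultdict(int))
    unmapped = []
    for term, n in term_counts.items():
        preferred = term_to_preferred.get(term)
        if preferred is None:
            unmapped.append(term)
            continue
        category = preferred_to_category[preferred]
        hierarchy[category][preferred] += n
    return {c: dict(terms) for c, terms in hierarchy.items()}, unmapped


# Hypothetical counts of how many records carry each cleaned term.
term_counts = {
    "lithography": 4,
    "lithographic print": 2,
    "accordion fold": 3,
    "citrus fruit": 1,
}

hierarchy, unmapped = build_hierarchy(term_counts)
print(hierarchy)
print("Needs review:", unmapped)
```

Keeping the two mappings as separate tables mirrors the spreadsheet columns: one decision about which wording to prefer, and a second decision about which broad category that preferred term belongs to. Terms that match neither table are listed for review instead of being silently dropped.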

If you could get in a time machine, and go back and change anything about your approach to this project what would you do?

I would just tell myself what I was getting into, because I had no idea! And now look at me. I would tell myself to do a little more research about what I want to do, what the process looks like.

If you were to put yourself into the shoes of another student, maybe someone who doesn’t have as much experience with data work or groupings, what sorts of questions or thought processes did you have to work through during this stage of the process that would be helpful for somebody else to ask?

  • How do you want it to be organized? 
  • How do you want to organize yourself, and how do you want to organize the data? 
  • Are you going to include all of the data you’re working with? How are you going to choose what you will leave behind?
  • How are you defining a bigger vocabulary for groupings? Is it something you’re determining locally yourself, or are you using, for example, the Getty AAT?

In closing, additional helpful questions to ask yourself when working on a similar project:

  • Why am I working with this data- what are the research question(s) at the heart of my project?
  • What technical tools do I have access to which can streamline my data work?
  • What people do I have at my disposal for any questions as I develop my project?
  • What expertise can they offer?
  • How will my decisions about the data work (i.e., what terms to include) impact the accuracy and structure of my analysis?
  • How much time do I have to devote to this project? Do I have the support and resources necessary to work on a data-based project?
