A t-SNE plot of objects in the CBA's collection, based on description text and subject terms. Points are colored based on the objects' manually-assigned Object Type category within the collection. Some Object Types show clear clustering, while others are widely distributed; and some clusters are clearly composed of numerous Object Types.
A cluster analysis (t-SNE plot) showing “similarity” and “difference” between objects in the CBA’s collection, based on the words used in their descriptions and subject labels. The shapes and colors of the points indicate the objects’ manually-assigned Object Types within the collection, thereby visualizing how closely the objects in each category are in fact related to each other (based on their description and subject metadata).

By Darcy Krasne

This project investigates the contours of the CBA’s Fine Art collection as it has been cataloged by successive librarians and is currently represented on the CBA’s website. In particular, it asks whether the object types assigned to individual objects offer a meaningful categorization of the collection. Taking each object’s description and subject keywords in the collection database as a textual proxy for that object, it uses TF-IDF (term frequency–inverse document frequency) to compare the objects based on their “similarity” to each other. The result is a data physicalization consisting of three main parts:

  1. a quantification of the objects in each object type category, showing where description or subject data is missing from the database;
  2. networked linkages between object types, showing how closely each is linked to the others through shared subject keywords; and
  3. visualizations that explore the collected descriptions and subjects of each object type in greater depth and show how well that object type fits the objects assigned to it.
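
To make the TF-IDF comparison described above concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the three sample records, the whitespace tokenization, the smoothed IDF formula, and the use of cosine similarity as the “similarity” measure are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical records: each object's description and subject keywords,
# joined into a single text that stands in for the object.
objects = {
    "obj_a": "artist book letterpress printing accordion binding",
    "obj_b": "artist book letterpress poetry handmade paper",
    "obj_c": "poster screenprint exhibition announcement",
}


def tfidf_vectors(docs):
    """Return {name: {term: weight}} from raw term counts and a smoothed IDF."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of texts in which each term appears.
    doc_freq = Counter(
        term for tokens in tokenized.values() for term in set(tokens)
    )
    # Smoothed IDF (an assumed variant): rarer terms receive higher weights.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    return {
        name: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = tfidf_vectors(objects)
sim_ab = cosine_similarity(vectors["obj_a"], vectors["obj_b"])  # shared vocabulary
sim_ac = cosine_similarity(vectors["obj_a"], vectors["obj_c"])  # no shared vocabulary
```

In this toy data, `obj_a` and `obj_b` share several terms and so score as more similar than `obj_a` and `obj_c`, which share none. Pairwise scores of this kind are one possible input for a layout such as a t-SNE plot.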

Suggested Citation: