Introduction
This lab revisits an earlier project, in which I explored artists’ networks using Gephi. In that lab, I observed that open-source data visualizations might allow researchers to critique the collection and exhibition of art by artists traditionally ignored by museum collections and curators. An overwhelming number of artists in museum collections are white and male, and using crowd-sourced data to improve the availability and visibility of artists’ information may represent an external method of addressing the issue.
This project shifts the focus from visualizing available public information to analyzing what that information says about relationships within and among siloed collections. In this lab, I will attempt to visualize connections in order to challenge preconceived notions about research and scholarship.
Goals
- Discover which challenges librarians might want to solve with network visualizations
- Model connections between different communities whose information is captured in digital and museum collections
- Design network visualizations that respond critically to the state of collecting practices and scholarship
Process
UX Research
I knew that I wanted my work to facilitate the research and instruction done by art librarians, primarily in university settings. Since I already had a prototype available, I wanted to solicit user feedback before I made any further design decisions. In particular, I wanted to identify use cases where art or instructional librarians would find the network visualization helpful, what data they considered absolutely necessary to make the visualization usable, and what obstacles they would immediately anticipate to adopting the visualization at their institution.
I sent a general call for feedback to two major listservs, one affiliated with ARLIS/NA (Art Libraries Society of North America) and the other with VRA (Visual Resources Association), and I organized two user feedback sessions based on the responses.
I divided each user session into three parts. In the first part, I conducted a brief interview to get a sense of the user’s role and responsibilities at their institution. (Both were at universities.) Then, I conducted a ten-minute “think aloud,” where I let the user explore the visualization with minimal input or feedback from me. Finally, I returned to an interview-style approach, asking the user to describe use cases where the visualization might (either now or with new features) address a challenge the librarians encountered during their work.
The users repeatedly requested to have more data embedded in the attribute panes of the visualization. They emphasized their desire to see images of works, but, in other areas, observed that students and faculty, their two major user populations, would want to see different sets of information. The librarians noted that while biographical notes would help students, they would be of little use to faculty. In general, the users contextualized all of their answers with regard to the needs of students and faculty. This made it clear that I should consider students and faculty as additional stakeholders in my design. Perhaps most compellingly, one librarian shared that she could not see widespread adoption of the visualization at her institution because few, if any, art history courses included content about African American women. This observation became an important flashpoint as the project developed.
Design
The design of the final interactive network visualization resulted from a five-part process: data collection, data formatting, data modeling, graphing, and UX research.
The network data used in this project comes from Wikidata, an open repository of crowd-sourced authority information. Authority files provide “authoritative” information on people, places, or objects. That might include the standard spelling of a name, a recognized date of birth and death, and other pieces of information that disambiguate a person or place from others with similar attributes. Typically, large organizations with public missions unilaterally produce and maintain authority files. The Library of Congress and the Getty Research Institute maintain two prominent ones. On Wikidata, anyone can create or add to authorities (including links to the equivalent entry in other authority files), making it a major and expanding hub of structured data.
Wikidata provides a query service that returns data in a variety of formats, including as a downloadable CSV. The query service also acts as an endpoint for API queries when using programming languages like Python. A Wikidata query takes a pattern and outputs any matches. This style of query provides an intuitive way to extract network data.
SELECT ?source ?target
WHERE {
?source is connected to ?target
}
SELECT ?node ?nodeAttribute
WHERE {
?node is a member of the community of interest
?nodeAttribute is some additional information
}
For this project, I used Gephi, an open-source, extensible graphing software package, to visualize my network data. Although Wikidata exports what is fundamentally network data, it does not do so in a way Gephi can understand at first pass. Moreover, since I expected to make several graphs, I wanted to keep the data consistent across both my node and edge tables. To solve this problem, I used Python, a general-purpose programming language, to format and save a list of Wikidata identifiers (formatted as wd:node) for each community of interest. When I queried for new node information or modeled edges, I only used this list of values. As I tried different queries, I added or removed information from my node tables, formatting the header information to adhere to Gephi’s guidelines. I have pasted one example of a node attribute query below.
SELECT ?node ?birth
WHERE
{ ?node wdt:P569 ?birth . }
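Restricting a query like this one to a saved identifier list can be done with a SPARQL VALUES clause. The following is a minimal Python sketch of that idea, not the code used in this project: the function names, the placeholder identifiers, and the User-Agent string are assumptions made for illustration, while the endpoint URL is the public Wikidata Query Service.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_birth_query(qids):
    """Build a birth-date query restricted to a saved list of Wikidata IDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?node ?birth\n"
        "WHERE {\n"
        f"  VALUES ?node {{ {values} }}\n"
        "  ?node wdt:P569 ?birth .\n"
        "}"
    )


def run_query(query):
    """Send the query to the endpoint and return the JSON result rows."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        # Hypothetical client name; the service asks clients to identify themselves.
        headers={"User-Agent": "artist-network-lab-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Placeholder identifiers standing in for one community's saved list.
sample_query = build_birth_query(["Q1", "Q2"])
print(sample_query)
```

Keeping the query builder separate from the network call means the same identifier list can be reused for every attribute or edge query, which is what keeps the node and edge tables consistent.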
I eventually created a node table with the following attributes: ID, name, birth date, Wikipedia article, and community tag (either Renaissance or African American).
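A node table with these attributes could be written out as a CSV for Gephi's spreadsheet import. The sketch below assumes the `Id` and `Label` headers that Gephi conventionally recognizes for node tables. The remaining column names, the placeholder row, and the function name are hypothetical, not the headers actually used in this project.

```python
import csv
import io

# "Id" and "Label" are the headers Gephi's spreadsheet import conventionally uses;
# the other column names are illustrative choices, not the project's actual headers.
NODE_COLUMNS = ["Id", "Label", "birth_date", "wikipedia", "community"]


def write_node_table(rows, handle):
    """Write node dictionaries to an open file handle as a Gephi-style CSV."""
    writer = csv.DictWriter(handle, fieldnames=NODE_COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing attributes (e.g. no recorded birth date) become empty cells.
        writer.writerow({col: row.get(col, "") for col in NODE_COLUMNS})


# Placeholder row; the identifier and values are invented for the example.
nodes = [{"Id": "Q0000", "Label": "Example Artist", "community": "Renaissance"}]
buffer = io.StringIO()
write_node_table(nodes, buffer)
node_csv = buffer.getvalue()
print(node_csv)
```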
Next, I needed to model, generate, and format edges based on the nodes I had already created. In order to relate female African American visual artists (as described in their authority records) to Renaissance artists (as described in their authority records), I had to create three models: one model relating contemporary artists to contemporary artists, one relating Renaissance artists to other Renaissance artists, and one relating the two communities to each other. Within each model, I also encountered some challenges that reflected the nature of the data.
There are two ways to model African American artists: with their collection data or without it. In the original lab, my network included collection data, as well as data about education and area of work.
Though not evident in the visualization from the prior lab, this model leaves 200 nodes unconnected and bifurcates the community into two major subgroups. To address these issues, I created a new graph, which does not use “works in the same collection” as a criterion.
If a model calls for a criterion, such as education, that a node lacks, that node cannot appear in the edge table. As a result of dropping the collection criterion, almost 80 more nodes appear in this graph than in the previous one. The groups also have sensible reasons for their association. The artists in purple (Elizabeth Catlett, Alma Thomas, etc.) mostly attended Howard University.
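One way to picture this kind of edge model is to link every pair of nodes that share a value for some attribute. The Python sketch below illustrates that approach under stated assumptions: the function name, the sample data, and the choice to count one unit of weight per shared value are all hypothetical, not the modeling rules used for the graphs in this lab.

```python
from collections import defaultdict
from itertools import combinations


def edges_from_shared_attribute(node_values):
    """Link every pair of nodes that share at least one attribute value.

    node_values maps a node ID to a set of values (e.g. schools attended).
    Nodes with an empty set never appear in the returned edge list.
    """
    # Group nodes by each value they hold.
    members = defaultdict(set)
    for node, values in node_values.items():
        for value in values:
            members[value].add(node)

    # Every pair within a group gets an edge; repeated pairs gain weight.
    weights = defaultdict(int)
    for group in members.values():
        for source, target in combinations(sorted(group), 2):
            weights[(source, target)] += 1
    return [(s, t, w) for (s, t), w in sorted(weights.items())]


# Hypothetical data: A and B share a school; C has no recorded education.
education = {"A": {"School X"}, "B": {"School X", "School Y"}, "C": set()}
edge_list = edges_from_shared_attribute(education)
print(edge_list)
```

In this sketch, node C drops out of the edge table because it has no value for the criterion, which mirrors how the choice of criteria determines which artists appear connected.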
On the other hand, “works in the same collection” is the primary source of differentiation for Renaissance artists.
I combined these three edge tables into one by relating them on the basis of “works in the same collection” and area of work and brought the final editions of the node and edge table into Gephi, where I graphed my amalgamated network. The nodes were colored according to their community membership, and I applied a force-directed layout. The sigma.js plugin allows users to export their Gephi file as an interactive web page. Using GitHub Pages, I published the exported network visualization online.
Rationale
The final graph depicts the two communities categorized by two separate colors recorded in the legend on the left. I primarily chose to include both communities on the same graph because of the user feedback. On the one hand, I could have continued to enrich the data and make the network as much of an information discovery and analysis tool as possible. This might serve the original goal, in that it might disturb or challenge collecting practices that either ignore artists or separate them from their communities of practice.
Yet, the user feedback sessions made me concerned that enriching the data would not actually accomplish the goal of empowering librarians, in particular, to challenge existing practices or develop new ones. I felt that an exploratory visualization, which would benefit from modularity and non-directive interactivity, would not accomplish my goal as much as a communicatory visualization that used the network format to challenge perceptions around instructional and data collection practices. I still wanted the communicatory narrative to open into an exploratory visualization, albeit one with a narrowed focus.
With this thinking in mind, I chose to embed a few enriched nodes, demo a narrative-driven exploration of the network, and provide open access to both the visualization and its data. I recognize that my decisions actually make the network less intuitive to explore without a specific application and a lot of guidance, so I imagined almost all users coming to the visualization by way of the demoed video. However, the narrative I developed is based on an analysis of the information this data does record: collection practices.
When I constructed the node attributes, I also chose to capture information about time in the “birth date” attributes column. I did not record birth dates for Renaissance figures because Wikidata does not reliably record their birth dates, but birth information is available and accurate for most 19th- and 20th-century figures of note. Gephi can convert dates in a string format into dynamic data and use a special dynamic filter to create timelines. The sigma.js plugin cannot currently export dynamic information, but the timeline allowed me to visualize the application of collection policies on sub-communities over time. The network clearly demonstrates that collections (both museum and data collections) tend to neglect or silo 19th- and late-20th-century artists. More prominent artists tend to be born in the mid-20th century.
I also chose to use time attributes to build a directed graph. In this graph, the direction is determined by age, which is meant to approximate the qualities of lineage. (Even though Renaissance artists do not have their birth date recorded in their attributes, the modeling process still used age to establish directionality in this way.) Including direction in the graph allowed me to use a plugin from Gephi to analyze lineage within the network. I show the visual output of that analysis above. By combining network analysis with visual representations of time, direction, and intensity, represented by node size (in-degree), edge weight, and (in Gephi only) arrowheads, I felt I could maintain the funnel shape when creating my narrative.
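The idea of using age to set direction can be sketched in a few lines of Python. This is an illustration under assumptions, not the procedure used in Gephi: the function names and sample years are hypothetical, and skipping pairs with missing or equal birth years is a choice made for this example.

```python
from collections import Counter


def orient_by_birth(edges, birth_year):
    """Point each edge from the earlier-born artist to the later-born one."""
    directed = []
    for a, b in edges:
        if a not in birth_year or b not in birth_year:
            continue  # assumption: no direction without both dates
        if birth_year[a] == birth_year[b]:
            continue  # assumption: same year implies no lineage direction
        source, target = (a, b) if birth_year[a] < birth_year[b] else (b, a)
        directed.append((source, target))
    return directed


def in_degree(directed_edges):
    """Count incoming edges per node, the measure used here for node size."""
    return Counter(target for _, target in directed_edges)


# Hypothetical artists and birth years.
years = {"A": 1900, "B": 1950, "C": 1925}
directed = orient_by_birth([("B", "A"), ("C", "B"), ("A", "C")], years)
degrees = in_degree(directed)
print(directed, dict(degrees))
```

Under this scheme, the youngest artists accumulate the highest in-degree, so sizing nodes by in-degree emphasizes the endpoints of each lineage.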
The selection of media for the narrative nodes was based largely on user feedback. Comparing works also makes significantly more sense than comparing identities. The nodes I selected and researched are a subset of all the lineage nodes for Mickalene Thomas, an extremely prominent visual artist. I explicitly sought to relate her work to work by a younger artist (one of the two categories that tend to fall out of collection scopes) and to Renaissance artists also working with human figures. I intended for my research on Renaissance artists to emphasize that their work is in the public domain. The narrative mixes implicit data storytelling, which speaks to data collection practices and models of representation, with explicit art history research. As one user said of the visualization, art history has a lot to learn from storytelling through visualization.
A few other design choices are noteworthy. The network includes many unconnected nodes because the nodes represent real artists, not outliers. Ideally, the network would encourage research that eventually connects these artists to others. For the same reason, exposing the data file on GitHub, with brief instructions, empowers users to enrich nodes whether or not they have connections. On the other hand, these extra nodes add extra “data ink” to the visualization. For that reason, I attempted to minimize the visibility of the low-degree nodes while filming my narrative until zooming out at the very end.
Findings
User research findings echo the structure of digital collections
The structure of digital collections effectively separates communities of practice from one another. These information silos are reflected in the structure of authority files: Wikidata tends to record most of the information about contemporary artists in fields that do not exist for Renaissance artists. This makes sense to some degree. If the data model for contemporary artists relies heavily on their educational history, that data model will never be compatible with one that represents artists who predate the university system. Moreover, traditional education does not accurately describe the training of every contemporary artist. These are obvious limitations in the data.
However, accepting these structures as the obvious and correct ones leads to significant siloing, both within and between communities. If we think of the network as a representation of a museum or a library, it is not only that fewer than 1% of artists represented are African American women but also that the two groups of artists so rarely end up in the same room.
The user feedback suggests the situation is not much better in university classrooms when it comes to siloing. If the visualization did not represent the community of interest — to a student in their course or a faculty member in their research — it had no use. Modeling one community makes it trivial to model another, but, as long as only one community appears at a time, the structure of the digital collection will facilitate and reinforce that siloed research.
The process of creating this visualization raises the question of where data comes from. In fact, both sets of categories (“female,” “African American,” and “visual artists”; “Renaissance” and “artistic profession”) are “arbitrary” assignments. Yet, we understand well that arbitrary decisions mask decisions driven by power. The visualization ultimately confronts the reasoning behind the arbitrary assertions about artists’ relationships.
Intervention
The user feedback developed along two parallel tracks. On the one hand, users called for enriched interoperability of nodes, particularly by including images, subject headings, and technical areas of focus. At the same time, users reflected on the limited applicability of single-community visualizations to classroom instruction. From a design perspective, this made me suspicious about the real value of simply adding more data. In fact, changing the content of the visualization, even if it became radically more dynamic, would not address the fundamental goal of the visualization. Instead, I found that a simpler, narratively oriented design does a better job of opening the door to new kinds of thinking.
The network visualized here maps to a fictional museum. Said another way, it emulates a playlist generated from two different genres. In a sense the network has place; the nodes have a real x/y location. Unlike our typical idea of place, the network is not fixed. It draws from crowd-sourced data and it can be hacked directly by users familiar with JSON. In a sense, the final product accomplishes its goals by re-mapping the museum or creating a new map for digital collections and giving users permission to edit that map however they like — through GitHub, Wikidata, or their research.
Recommendations for Revision
Any further revision would require the collaboration of web designers, data engineers, digital storytellers, librarians, students, and faculty.
One direction for the project would be to abandon Gephi in order to work directly with sigma.js. Using Python to query Wikidata and museum APIs, a team of developers and researchers could set up a network visualization that uses this model but updates it daily. Wikidata is constantly expanding, and visualizing the growth and transformation of collections across institutions over time could guide decisions around research or even calls for museum actions. A system for pulling additional data from Wikidata or museum APIs could streamline the process for enriching nodes with images and descriptive metadata, expanding the exploratory potential of the visualization. Finally, new data models or dynamic web applications might allow for the addition of more communities.
Rather than focus on a data-driven approach, future revisions might focus on storytelling. My simple video, which represents the limit of my abilities in Adobe Premiere, could easily become a much richer narrative with text and voice-over. That video may transition to “scrolly-telling” or another format. The component pieces might change with help from art historians or artists in the relevant community of practice, who demonstrate other kinds of research or ways to contribute to digital collections. More documentation, a wider user community, or a library of narratives powered by the visualization might all be effective revisions of this draft.
Either way, further revisions will require more user testing with librarians, students, and faculty as the primary stakeholders. The visualization as it stands is a research playlist, but only user testing will allow us to identify how to make the intervention more intuitive and more attractive to groups with different, though not necessarily competing, priorities. It will also most likely lead to specific use cases, perhaps as narrow as designing for a single art history course taught by a particular faculty member. Even if that user testing leads to a more usable visualization, perhaps in one of the directions mentioned above, usability alone will not lead to wider adoption. Adoption and use will require outreach on the part of librarians who buy into the goals of the visualization.