This final project builds upon my work as a Graduate Assistant (GA) for the Digital Preservation Outreach and Education Network (DPOE-N), hosted by Pratt Institute and NYU-MIAP. DPOE-N received a 2-year Andrew W. Mellon Foundation grant to provide professional development funding for cultural heritage professionals (located in the United States and Territories) to attend workshops and training focused on digital preservation. In response to the ongoing COVID-19 pandemic, DPOE-N is also offering emergency funds for small archives, libraries, and museums to purchase external hard drives to better preserve at-risk digital collections.
Earlier in the semester, I visualized DPOE-N data generated by 6 graduate students focusing on areas of institutional research, outreach, and community engagement. The data is composed of records outlining organization name, institution type, location, and contact information, tracking who we have contacted directly to promote the DPOE-N initiative. In the initial lab, my goal was to evaluate the locations where we have and have not conducted research and outreach. While creating a treemap and stacked bar graph by organization type, I discovered that we needed to reevaluate how we categorized organization types in our workflow. The data revealed inconsistent tagging: some tags were very broad, such as “Library,” while others were very granular, such as “Newspaper Archive.” It was also difficult to normalize data from the multi-valued “Organization Type” field and create a visualization that represented duplicate category types for each record.
The goal of this project is to address the data challenge outlined above, update the 517 records in our Airtable database, and revise the treemap and stacked bar graph to better evaluate DPOE-N’s outreach efforts and focus. I worked with the DPOE-N team to conduct a usability study and review to revise the current 41 organization category types into roughly 10 high-level categories. The new categorizations were added to the current records and to our institution research and outreach workflow. This collaborative effort will produce graphics that better represent our work to date and allow us to analyze gaps in our research. I also hope this project will streamline the next year of work and create a foundation for the final impact report in 2022 at the conclusion of the Andrew W. Mellon Foundation-funded grant.
Process + Methodology
The process began by assessing the DPOE-N data, an Airtable sheet containing a list of organizations and institutions that DPOE-N has reached out to directly. During this assessment, I discovered that we were using 41 different category types to tag organizations and institutions.
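As a rough illustration of this kind of assessment (not the method actually used, which was a manual review in Airtable), the sketch below counts how often each tag appears in a multi-valued field. It assumes the multi-select cell arrives as a comma-separated string; the function name `count_tags` and the sample records are hypothetical and are not DPOE-N data.

```python
from collections import Counter


def count_tags(records, field="Organization Type", sep=","):
    """Count how often each tag appears across multi-valued cells."""
    counts = Counter()
    for record in records:
        cell = record.get(field, "") or ""
        for tag in cell.split(sep):
            tag = tag.strip()
            if tag:  # skip empty cells and stray separators
                counts[tag] += 1
    return counts


# Made-up sample records for illustration only (not DPOE-N data).
sample_records = [
    {"Organization Type": "Library, Newspaper Archive"},
    {"Organization Type": "Library"},
    {"Organization Type": "Museum, Library"},
    {"Organization Type": ""},
]

tag_counts = count_tags(sample_records)
print(len(tag_counts), "distinct tags")
for tag, n in tag_counts.most_common():
    print(tag, n)
```

Counting tags rather than records makes both problems visible at once: the number of distinct tags in use, and how unevenly they are applied.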
In order to narrow down the 41 categories (figure 2) into 10 or fewer high-level categories, I decided to use a Miro board to create a collaborative affinity map, a tool for organizing ideas into groups based on perceived relationships or analysis. I imported the 41 categories into the Miro board as virtual sticky notes and began grouping similar categories by theme and potential types, such as Academic, Archives, and Museums (figure 3).
I referenced surveys conducted by culture-serving organizations, such as the Web Archiving Survey conducted by NYARC and the Society of American Archivists 2015 Employment Survey, to gain a broader understanding of how other organizations categorize institution type. These surveys informed a draft of high-level categories, which I organized into a table for a usability study with the DPOE-N team, composed of 4 other GAs and the DPOE-N Program Coordinator (figure 4). I proposed 11 categories to the DPOE-N team (highlighted in light blue in the left column). Together we brainstormed further category types, analyzed which categories were used the most, and discussed the advantages and disadvantages of tagging an institution with multiple category types of varying granularity. In the end, we decided to revise the initial 11 into 8 broader categories, represented in purple.
This decision was informed by DPOE-N’s grant proposal, which aims to “create institutional impact,” rather than impact on a certain repository type. Therefore, categories in the proposed 11 that subdivided a single institution type, such as Academic Libraries and Academic Archives, were combined into College/University.
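The consolidation described above can be pictured as a lookup from granular tags to high-level categories. The sketch below is a minimal illustration under stated assumptions, not the team's actual procedure, since categories were assigned by hand in Airtable. Only the two Academic entries come from the text; the "Art Museum" entry, the helper name `assign_category`, and the rule of returning `None` for conflicting tags are hypothetical.

```python
# Granular tag -> high-level category.
# The two Academic entries follow the text; "Art Museum" is a
# hypothetical entry added only to show a conflicting case.
HIGH_LEVEL = {
    "Academic Library": "College/University",
    "Academic Archive": "College/University",
    "Art Museum": "Museums",
}


def assign_category(tags, lookup=HIGH_LEVEL, fallback="Other"):
    """Return one high-level category for a record's granular tags.

    Returns the fallback when no tag is in the lookup, and None when
    the tags map to more than one category, so that a person can
    decide instead of the code guessing.
    """
    mapped = {lookup[tag] for tag in tags if tag in lookup}
    if len(mapped) == 1:
        return mapped.pop()
    if not mapped:
        return fallback
    return None  # conflicting categories: flag for manual review


print(assign_category(["Academic Library", "Academic Archive"]))
print(assign_category(["Newspaper Archive"]))
print(assign_category(["Academic Library", "Art Museum"]))
```

Returning `None` for conflicting tags, rather than picking a winner, keeps the subjective decisions with the person doing the categorizing.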
Applying Refined Categories in Airtable
After conducting the usability group discussion, I began applying the 7 categories in a new single select column in our collaborative Airtable sheet. The ‘Other’ category was a placeholder to be addressed if needed. During this process, I realized that we needed to create another category that encompassed community-oriented cultural centers that did not fit in with historical societies but weren’t formal colleges or universities. This resulted in adding an 8th category: Cultural Center (figure 5). This demonstrated the iterative nature of trying to refine our data, apply new category types, and reassess our categorizations based on application.
It’s important to note that as a result of our team meeting and user study, we decided to keep both columns and add the single-select column input to the research team workflow. This compromise was reached so the research team could continue using the multi-select column to help streamline and track their research efforts.
The newly revised data was then exported from Airtable as a .CSV file and imported into Tableau.
Once the new data set was imported into Tableau, I created a treemap based on the category and its measure (the number of records in that category). The horizontal stacked bar graph is based on location (state) and color-coded with the newly created organization and institution categories (figure 7), visualizing both the total number of places contacted in a state and the breakdown by type.
When deciding what color palette to use, I began by referencing the four colors used in the DPOE-N logo (figure 8). I selected orange as the base color for the treemap. The color saturation was based on the measure (sum) of organizations and institutions that fell into a category. The most saturated and largest category is Museums, and the least saturated and smallest is Religious Organizations.
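Tableau computes these aggregations internally, but the numbers behind the stacked bar graph can be sketched outside of it. The example below is a minimal illustration that tallies records per state and category from CSV text. The column names `State` and `Category`, the function name `counts_by_state`, and the sample rows are all assumptions made for illustration, not the actual export or DPOE-N data.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample rows; the column names are assumptions, not the real export.
SAMPLE_CSV = """Organization,State,Category
Example Museum A,NY,Museums
Example College B,NY,College/University
Example College C,NY,College/University
Example Museum D,CA,Museums
"""


def counts_by_state(csv_text, state_col="State", category_col="Category"):
    """Tally records per state, broken down by category."""
    totals = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[state_col]][row[category_col]] += 1
    return totals


totals = counts_by_state(SAMPLE_CSV)
for state in sorted(totals):
    breakdown = totals[state]
    # Bar length is the state total; the segments are the per-category counts.
    print(state, sum(breakdown.values()), dict(breakdown))
```

Each state's total corresponds to the full length of its bar, and the per-category counts correspond to the colored segments within it.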
For the stacked bar graph I selected 8 colors to represent each institution category type. I specifically chose to omit orange to avoid creating unintended color relationships between the two graphs. Since each category is a distinct measure, I used a qualitative palette and chose colors that would be distinguishable from each other. Lastly, I checked the colors in Viz-Palette by Elijah Meeks and Susie Lu to make sure the colors would account for color-vision deficiency (figure 9).
Once the two graphs were created, I added them to a Tableau dashboard and began exploring how best to lay out the two graphs side by side. I considered visual hierarchy and natural eye patterns to inform the final layout. The title and treemap were placed on the left and the stacked bar graph to the right. Lastly, the two graphs were linked via an organization type filter so users can select a category/color from either graph or the legend to highlight and compare that category between the two graphs.
This project, beginning with the first iteration (figure 10), demonstrated how some visualization projects are more about data integrity and refinement than about creating the end visualization itself. The majority of this project focused on creating an affinity map and working with the DPOE-N team to distill our 41 categories down to 8, then 9. The UX stage of this project gave me insight into a real-world scenario and the challenges of arriving at group consensus. It also demonstrated the iterative nature of distilling categories and highlighted how this process will continue to evolve over the course of the grant as new institution types are added and grant reporting needs arise.
Applying the 9 high-level categories revealed how imperfect categorization is. Many cultural institutions serve more than one purpose and can be both a library and a community archive. It also emphasized a major limitation: subjectivity. I recognize that assigning categories is inherently subjective and that the graphs might look different if another GA had assigned them. I tried to maintain a hierarchical approach where the highest level of the affiliated institution type determined the categorization, but cultural data does not always fit into defined boxes and labels.
This process has given me valuable insight into the process behind trying to quantify cultural heritage data. I have also gained a critical eye and the awareness to question ways data is manipulated and applied to create visual representations.
Though the base data may be imperfect, in comparison to the original version (figure 10), I believe this revision resulted in a more cohesive visualization that better represents the organization and institution types we have or have not reached out to based on location. The revised layout was created with more attention to interactive design best practices such as visual hierarchy. In this version, I can now observe that we have only contacted Colleges or Universities in Mississippi, while the single institutions contacted in Utah and in Delaware are both museums.
After I shared the updated visualization with the Information Visualization course, Professor Sula made a great suggestion: color-code the treemap (colored in orange) with the same category colors used in the stacked bar! This would also turn the treemap into a legend and allow me to remove the stacked bar legend (top center). I really appreciated how this revision removed the visual clutter of the center legend; however, after trying this in Tableau (figure 11), I realized that the labels for two categories, Government Archives (gray) and Religious Org. (pink), don’t display in the treemap because the corresponding rectangles are too small. Not being able to force these two labels to display in Tableau creates consistency challenges in using the treemap as the color legend. If these two DPOE-N graphs are used jointly in a non-Tableau environment in the future, I would use the color-coded treemap version and edit the visual with a program like Photoshop to add in the two missing labels.