Visualizing DPOE-N outreach efforts



Introduction

I’m currently working as a Graduate Assistant (GA) for the Digital Preservation Outreach and Education Network (DPOE-N), hosted by Pratt Institute and NYU. DPOE-N received a 2-year Andrew W. Mellon Foundation grant to provide professional development funding for cultural heritage professionals to attend workshops and training focused on digital preservation. In response to the ongoing COVID-19 pandemic, DPOE-N is also offering emergency funds for small archives, libraries, and museums to purchase external hard drives to better preserve at-risk digital collections. 

The grant is supported by six graduate students focusing on areas of research, outreach, and community engagement. I wanted to create these graphs to help us evaluate our collective work to promote DPOE-N funding opportunities across the United States and Territories. I also hope it will inform which regions we have yet to focus our research and outreach efforts on. The graphs presented here are from internal Airtable sheets that track all the institutions, organizations, types, and locations we have reached out to directly via email, as well as application statistics. 

Inspiration

  1. New York Times

As a result of both COVID-19 and the U.S. presidential election, many news outlets have created visualizations mapping the pandemic and election results across the United States. These were helpful examples that informed how I wanted to visualize DPOE-N outreach efforts across the United States and Territories. Through this exercise, I realized that most COVID-19 visualizations I’ve seen leave out a majority of U.S. Territories!

Visualization of the United States where each state is a square with numbers marking COVID-19 cases.
From the article “Watch How the Coronavirus Spread Across the United States,” March 21, 2020, The New York Times

  2. Fundamentals of Data Visualization by Claus O. Wilke

Wilke’s bar graph visualization, “Population growth in the U.S. from 2000 to 2010,” was a helpful example of how to format and use color in a bar chart with many values and long label text.

Screenshot of a horizontal bar graph in blue, green, yellow, and orange.
Screenshot of Figure 4.2: Population growth in the U.S. from 2000 to 2010. States in the West and South have seen the largest increases, whereas states in the Midwest and Northeast have seen much smaller increases or even, in the case of Michigan, a decrease. Data source: U.S. Census Bureau

https://clauswilke.com/dataviz/color-basics.html

Method

  1. Data Source

I exported two comma-separated values (CSV) text files from Airtable, a collaborative spreadsheet and database application: 1) the outreach sheet, which tracks all the institutions we’ve researched and reached out to directly via email, and 2) the applications sheet, which tracks the applications DPOE-N has received for its funding opportunities.

* Please note that I am not able to share the original CSV sheets publicly. Please contact me if you are interested in accessing them.

  2. OpenRefine

Because the Airtable sheets are used by all six GAs, I wanted to assess the data in OpenRefine, an open-source tool for cleaning data sets, to look for entry inconsistencies that could affect my final visualizations. After identifying which columns I would use in my visualizations, I used OpenRefine’s text facet filter to check for duplicates, inconsistencies, and empty cells.

Screenshot of OpenRefine text facet menu

For example, the text facet filter on the Organization Type column revealed 9 empty cells, and I added an organization type for each. Once I finished remediating the data in OpenRefine, I downloaded both sheets as new CSVs.
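The text facet check can also be approximated in a short script. The following is a minimal sketch rather than the workflow I actually used: the sample rows and column names are hypothetical stand-ins for the private outreach CSV, and `text_facet` is an illustrative helper, not an OpenRefine function.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the outreach CSV; column names are assumptions.
SAMPLE_CSV = """Institution,Location,Organization Type
Archive A,New York,Archive
Museum B,Guam,Museum
Library C,Ohio,
Museum D,Ohio,museum
"""


def text_facet(rows, column):
    """Count each distinct value in a column, grouping empty cells under '(blank)'."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        counts[value if value else "(blank)"] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
facet = text_facet(rows, "Organization Type")

# List values from most to least common, similar to a facet sorted by count.
for value, count in facet.most_common():
    print(f"{value}: {count}")
```

Like a text facet, this surfaces both empty cells and inconsistent spellings (here, "Museum" and "museum" appear as separate values) so they can be corrected before visualization.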

  3. Tableau Public

Before beginning in Tableau Public, free software for creating interactive data visualizations, I sketched out a rough idea of the types of graphs I wanted to create to explore my research goal. I was interested in counting the number of institutions reached out to by state and territory, as well as which states DPOE-N has received applications from thus far. I also wanted to visualize what types of organizations we have contacted, whether museum, academic institution, independent, etc.

Process and Visualization

Dashboard 1

Screenshot of a stacked bar in orange and light blue to the left and a horizontal bar graph on the right in teal and coral.
Dashboard 1 of DPOE-N Outreach and Application by Location

I decided to use two charts: a horizontal bar chart of the institution outreach count in each state and territory (Institution Outreach Count by Location), and a stacked bar chart showing which states DPOE-N has received applications from, how many, and of which application type (Applications Received by Location and Type). For the first chart, I was interested in visualizing which states DPOE-N has not reached out to. I discovered, however, that because the CSVs only track institutions we have reached out to, locations we have not contacted were missing from the data set entirely. I went back and added a row for each missing state and territory, then set the chart to measure each Location by a distinct count of Institution names. Because the added Location rows have null Institution names, their distinct count is displayed as zero.
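The same idea, counting distinct institutions per location while keeping uncontacted locations at zero, can be sketched outside Tableau. This is an illustration under stated assumptions: the rows and the short location list below are made up, and `distinct_institution_counts` is a hypothetical helper rather than anything Tableau provides.

```python
from collections import defaultdict

# Hypothetical outreach rows as (location, institution); the real data is private.
outreach_rows = [
    ("New York", "Archive A"),
    ("New York", "Museum B"),
    ("New York", "Archive A"),  # repeated entry, counted only once
    ("Ohio", "Library C"),
]

# Illustrative subset of the full list of states and territories.
all_locations = ["New York", "Ohio", "Guam", "Puerto Rico"]


def distinct_institution_counts(rows, locations):
    """Return {location: distinct institution count}, with 0 for uncontacted locations."""
    seen = defaultdict(set)
    for location, institution in rows:
        if institution:  # ignore placeholder rows that have no institution name
            seen[location].add(institution)
    return {location: len(seen[location]) for location in locations}


counts = distinct_institution_counts(outreach_rows, all_locations)

# Sort by count, highest first, matching the ordering used in the bar chart.
for location, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(location, count)
```

Driving the result from the full location list, rather than from the rows alone, is what makes the gaps visible: a location with no outreach still appears, with a count of zero.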

To relate the two graphs visually and conceptually, I used a different color (red) in the Institution Outreach bar chart to highlight the states we have received applications from, and added count labels (the number of applications) to the ends of those bars. I achieved this by adding the Locations count dimension from the Application CSV as a Label on the Institution Outreach Count by Location chart, then selecting each Location bar and setting the Mark label to “Always view.” The colors are based on DPOE-N’s logo for brand consistency. I was also careful to use different colors in each chart to avoid confusing users. Lastly, because I was interested in the discrete count of institutions, I did not display the locations alphabetically. Instead, both graphs are sorted by count in descending order, with the highest value at the top and the lowest at the bottom. The resulting dashboard visualizes the number of institutions reached out to by location alongside the states DPOE-N has received applications from.

Dashboard 2

Dashboard 2 of DPOE-N Institution Type and Location: stacked bar and treemap

In the original Airtable sheet, we often enter multiple values in the Institution Type column. To visualize the types of institutions DPOE-N has reached out to, I went back into the CSV in OpenRefine, split the multi-value column, and then transposed the resulting columns into rows. This allowed me to create a stacked bar graph based on Location and institution type, as well as a treemap based on total institution counts. For the stacked bar, I set the color based on institution type count. The resulting dashboard aims to represent the spread of institution types we have reached out to, as well as the type breakdown within each state or territory. I also added a filter by institution type in the center legend that highlights the selected institution type in both graphs.
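The reshaping step, turning one multi-value cell into one row per value, can be sketched as follows. This is only an approximation of what OpenRefine does: the sample rows, the comma separator, and the column names are assumptions, and `split_to_rows` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical rows; the separator and column names are assumptions.
rows = [
    {"Institution": "Archive A", "Location": "New York", "Institution Type": "Archive, Museum"},
    {"Institution": "Library C", "Location": "Ohio", "Institution Type": "Library"},
]


def split_to_rows(rows, column, separator=","):
    """Emit one copy of each row per value found in a multi-value column."""
    long_rows = []
    for row in rows:
        for value in row[column].split(separator):
            value = value.strip()
            if value:  # skip empty pieces such as those left by a trailing separator
                long_rows.append({**row, column: value})
    return long_rows


long_rows = split_to_rows(rows, "Institution Type")

# Count (location, type) pairs, the quantity a stacked bar by location would plot.
type_counts = Counter((row["Location"], row["Institution Type"]) for row in long_rows)
for (location, inst_type), count in sorted(type_counts.items()):
    print(location, inst_type, count)
```

One consequence of this layout is that an institution with two types contributes two rows, so totals by type can exceed the number of distinct institutions.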

Reflection

Dashboard 1
I had initially tried to color the states and territories by region, similar to Wilke’s visualization cited above, but it created unnecessary visual noise. I also realized my interest is in the count of institutions by specific state rather than by region. These two graphs suggest that there is no direct correlation between how many institutions we’ve reached out to and how many applications we’ve received from that location. It is also important to note that we have promoted DPOE-N on national organization listservs, which I did not include in these graphs because they are not tied to a specific location. If I had more time, I would explore how to incorporate this additional data into these graphs.
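My observation about correlation comes from reading the two charts side by side. One way to check such an impression numerically would be a Pearson correlation coefficient between the two counts per location. The sketch below uses made-up numbers purely to show the calculation; it does not use or reflect the DPOE-N data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(spread_x * spread_y)


# Made-up counts per location, for illustration only (not DPOE-N figures).
outreach_counts = [40, 25, 10, 5]
application_counts = [2, 6, 0, 3]

r = pearson(outreach_counts, application_counts)
print(f"Pearson r = {r:.2f}")
```

A value near 1 or -1 would indicate a strong linear relationship, while a value near 0 would be consistent with the weak relationship the charts suggest. With only a few dozen locations, any such number should be read cautiously.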

Dashboard 2
A major design challenge in the stacked bar graph was using color to visually represent the different institution type categories. Interestingly, these graphs highlight how inconsistently we apply category types. Some categories are very specific, like Heritage Trail, while others are more general, like Library. Given the variety and inconsistent application of these categories, the visualizations may be less effective at highlighting the types of institutions DPOE-N has reached out to. This exercise has revealed a need to go back and review our Airtable entry practices. If I had more time, I would also like to explore visualizing the types of institutions that applicants have applied from.

References

Gamio, L., et al. (2020, March 21). Watch How the Coronavirus Spread Across the United States. The New York Times. https://www.nytimes.com/interactive/2020/03/21/us/coronavirus-us-cases-spread.html

Wilke, C. O. (n.d.). Fundamentals of Data Visualization. https://clauswilke.com/dataviz/color-basics.html

Tableau Public tableau.com

OpenRefine https://openrefine.org/

Digital Preservation Outreach and Education Network dpoe.network