Visualizing the IMLS Museum Universe Data File



For our Tableau Lab in LIS-658 this summer, I knew I wanted to work with something art-related, but I wasn’t quite sure what. In my search for datasets, I came across one on the IMLS website. The IMLS (Institute of Museum and Library Services) provides federal funding for the nation’s libraries and museums. Their mission is to “advance innovation, lifelong learning, and cultural and civic engagement.” I have been familiar with IMLS since starting graduate school at Pratt because many of the museums and libraries in NYC have received IMLS grants. The purpose of the Museum Universe Data File is to provide a list of museums and related organizations in the United States. The dataset includes 33,072 rows of museum data with 44 column headings, including name, type, location (address, city, state, zip, latitude, longitude), income, and revenue. IMLS notes on their site that they create these datasets twice a year, beginning with the third quarter of fiscal year 2014; three datasets are available on their site. After reviewing the data and later finding this document, I was interested in finding out:

  1. How many institutions are there in each type category? Which category had the most institutions and which had the least?
  2. Where are the institutions located? Which institutions had the highest revenue?


I figured bar charts would be a good place to start in analyzing the data. There are nine categories of Museum Type, which seemed a reasonable number to view and sort in ascending or descending order. After watching the animations in class from Darkhorse Analytics, I wanted to do something Tufte-esque, along the lines of Data Looks Better Naked: stripped down and clean, but also visually telling.

image of before and after from Darkhorse Analytics showing a bar chart comparing calories of french fries to chili dog, highlighting bacon

I was also intrigued by the look of the two bar charts, here and here, related to population pyramids by Merlijn Buit. I will admit, I don’t speak or read Dutch, but I found them visually engaging.

bar charts by Merlijn Buit representing population pyramids

Buit’s bar charts above seem to work with a different level of data, or at least a more sophisticated approach, than I was taking at this point. I looked instead to this simple chart, which related back to what first drew me to Darkhorse Analytics. This is Buit’s response to a Tableau question from the community, found here.

Buit's solution to Tableau community member issue, simplified bar chart with averages

None of my inspirations thus far involved art-related data, but they did illustrate a similar kind of data: categories with some numeric/quantitative values. In my searches, I ran across an article where the visualizations tell a story about artists, artwork type, exhibitions, and book sale prices. The bar chart and the map related more closely to my data, comparing types and quantities. Unfortunately, I could not find them on Tableau (which means they are not interactive), but there is a link to their dashboard in the article. The bar chart illustrates artists working in different media during different centuries and three types of exhibitions during those centuries (all exhibitions, amateur gallery exhibits, and solo exhibitions). The visualization points out that painters had the greatest number of overall and amateur exhibitions during the 19th century, and multimedia artists have the greatest number of solo shows across all four centuries illustrated. One thing I don’t know is the actual source of the data. The article mentions that it is from print or “archival” sources (I would need to reach out to the authors to find out more). The map viz was also of interest to me, but it doesn’t tell me much without reading further, which is not a good sign. As we discussed in class, most people don’t even read legends. I understood that the darker the shade of green, the higher the number of artists appearing in books, particularly monographs. I would also be curious to find out how they defined an amateur gallery exhibit (by reputation, by revenue, or something else?).

Punt and Teekens Tableau dashboard with map and bar charts

Downloading and Cleaning Data

Luckily, the data I obtained from the IMLS, along with the PDF I was later able to locate, left me with very little cleaning to do. I renamed the “DISCIPL” column to “Museum Type.” Otherwise I left the data as it was.

Working in Tableau

I started by dragging the quantitative values to the Columns area (in the first case, the number of records) and then the museum type to Rows. By default the museum types were in alphabetical order, so I rearranged them from greatest to smallest. I could immediately see that Historical Societies had the largest number of institutions in the US. Next I wanted to find out which of these types had the highest revenue. Initially I created that bar chart in a separate worksheet and later realized I could easily add it to the same worksheet to pair them side by side. I immediately saw that the Art Museums type had the highest revenue.

my Tableau bar charts with number of institutions with type next to bar chart of institution type with revenue
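
For readers without Tableau, the two bar charts above boil down to a count per museum type and a revenue total per museum type, each sorted from largest to smallest. Here is a minimal Python sketch of that idea. The rows are made-up sample values rather than real IMLS figures, the type labels are only illustrative, and summing revenue is an assumption on my part (SUM is Tableau's default aggregation for a measure).

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (museum type, revenue in dollars).
# These are NOT real IMLS figures; they only illustrate the shape of the data.
rows = [
    ("Historical Society", 50_000),
    ("Historical Society", 0),
    ("Art Museum", 2_000_000),
    ("Historical Society", 120_000),
    ("Art Museum", 750_000),
    ("Children's Museum", 300_000),
]


def summarize(rows):
    """Return (counts, revenue totals) per type, each sorted largest first."""
    counts = Counter(museum_type for museum_type, _ in rows)
    totals = defaultdict(int)
    for museum_type, revenue in rows:
        totals[museum_type] += revenue  # assumption: revenue is summed per type
    by_count = counts.most_common()
    by_revenue = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return by_count, by_revenue


by_count, by_revenue = summarize(rows)

print("Institutions per type")
for museum_type, n in by_count:
    print(f"  {museum_type:<20} {n}")

print("Total revenue per type")
for museum_type, total in by_revenue:
    print(f"  {museum_type:<20} {total:,}")
```

With this sample, the type with the most institutions is not the type with the most revenue, which is the same kind of contrast the side-by-side charts make visible.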

I had assumed that the highest-revenue institutions were likely to be found in New York or San Francisco, but I didn’t know if this was actually the case. Perhaps they were in the Midwest or the Pacific Northwest. I created a map that shows not only where institutions are located but also sizes each location dot by revenue. It was immediately clear that the eastern half of the United States is more densely populated with museums than the western half (note that Hawaii and Alaska are excluded from the image below for ease of view). There are definitely pockets of density elsewhere, but overall the eastern half is more populated. One thing I struggled with here was the revenue range. On the first map I created, I put the revenue on a quick filter so that one could move the slider back and forth. The revenue range is enormous: it spans from approximately -2K to close to 200B. I am not used to working with numbers at this scale, which was interesting, and the scale is somewhat obvious from the difference in size of the dots.

map of US, not including Hawaii or Alaska, showing the eastern half of the US has a higher density of museums

Future Thoughts

I realize I haven’t really worked with data much; this is new territory for me. There is a lot more to explore here with the data as well as with Tableau. Here are some thoughts for further exploration with this dataset and Tableau.

  • I thought about revising my bar charts to leave only the highest number and highest revenue museum type in red and the rest in grey. It would mimic the Darkhorse suggestion and possibly be much stronger visually.
  • I would like to revisit creating bins for the revenue. I would probably use increments of 50 billion, but that also leaves out institutions with lesser revenue or zero revenue. Perhaps I could create another map viz that looks at Children’s Museums, Natural History and Natural Science, History Museums, and Zoos, Aquariums and Wildlife Museums on their own, since their revenue range is much smaller, and use smaller increments there.
  • I would also like to do something with area maps and maybe focus in on certain cities. Since I live in NYC, moved here from Los Angeles, and grew up in the Midwest, maybe it would be interesting to tell some sort of personal story or experience related to the museums in the dataset.
  • The dataset includes information about income, which might also be interesting to pair with revenue.
  • Seeing the huge disparity in the revenue also leads me to want to do something to illustrate that further. Maybe taking the top five in revenue and lowest five in revenue and highlighting them on the map to show location, institution name, and institution type.
  • The dataset includes the National Center for Education Statistics (NCES) Urban-Centric Locale Codes classification, which is a number that correlates to a description of the kind of area, such as City, Suburb, Town, or Rural. This might also be interesting to work with as a filter. I could color code the institution type and then filter by locale to see the number of records in each of those classifications.
  • Download and consolidate the other IMLS datasets available and see if there is something to evaluate over time, or compare Q3 revenues across two different fiscal years.
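
The revenue-binning idea in the list above can be tried out quickly outside Tableau. This is a minimal Python sketch of fixed-width bins, not Tableau's own bin feature. The revenues are made-up values chosen only to span a range like the one described earlier, and the function name and both bin widths are my own assumptions.

```python
from collections import Counter


def revenue_bin(revenue, width):
    """Return the lower edge of the fixed-width bin that contains revenue."""
    # Floor division also places negative revenues in the bin just below zero.
    return (revenue // width) * width


# Hypothetical revenues in dollars, NOT real IMLS figures.
revenues = [-2_000, 0, 45_000, 900_000, 3_500_000, 120_000_000_000]

# Coarse bins of 50 billion versus much finer bins of 1 million.
coarse = Counter(revenue_bin(r, 50_000_000_000) for r in revenues)
fine = Counter(revenue_bin(r, 1_000_000) for r in revenues)

for label, bins in (("50B bins", coarse), ("1M bins", fine)):
    print(label)
    for lower_edge in sorted(bins):
        print(f"  from {lower_edge:>18,}: {bins[lower_edge]} institution(s)")
```

With bins of 50 billion, four of the six sample institutions fall into the single bin that starts at zero, which is why a separate view with smaller increments for the lower-revenue museum types seems worth trying.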

What’s interesting to me about information visualizations is the ability to tell a story with them. The power of this intimidates me a little because it seems like you can easily manipulate the data to be deceiving. This intrigues me too, and it makes me feel I need to explore datasets and their power further. My inclination is to create visualizations that help potential users find things quickly.