By Amanda Chen
As a student of Museums and Digital Culture at Pratt, I focus on analyzing data from museums and related institutions in the U.S. and around the world. Museums take on the social responsibilities of educating the public, serving the community, and contributing to the economy. According to the American Alliance of Museums (AAM), American museums support more than 726,000 American jobs, contribute $50 billion to the U.S. economy, and receive approximately 850 million visits each year and approximately 55 million visits from students in school groups. I therefore chose to visualize the Museum Universe Data File to show the distribution of various types of museums and related organizations in the U.S. and their income performance. To see the final visualizations, check out my visualization dashboard on Tableau Public.
Before looking for a suitable worksheet in online open databases, I researched data visualization styles created with Tableau, as well as online materials from the American Alliance of Museums. I then searched several open data sources, including the Census & American Community Survey, federal data, Pew Research Center, New York City data, New York State data, the United Nations, and the Enigma Public database.
To find a worksheet that presents critical information and can be visualized in a suitable style, I had the following requirements in mind before I started searching:
- Only data related to the museum industry;
- More than 1,000 records and more than 6 metrics;
- Available in CSV format so that it can be downloaded and imported into Tableau;
- Enough potential to let me create at least 3 visualization styles in Tableau.
After searching, I found the target worksheet in the Enigma Public database; it has 33,072 rows and 49 fields. I then used OpenRefine to clean the data and selected only 8 critical metrics to visualize: museum name, discipline code, longitude, latitude, zip code, city, state, and income. Finally, I created a dashboard in Tableau. To see the data, check out the original worksheet in Enigma Public.
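I did the column selection in OpenRefine, but the same step can be sketched in a few lines of Python for readers who prefer a script. This is only an illustration: the column names and the two sample rows are made up, and the real headers in the Museum Universe Data File may differ.

```python
import csv
import io

# Hypothetical column names; the real file's headers may be different.
KEEP = ["name", "discipline", "longitude", "latitude",
        "zip", "city", "state", "income"]

# Made-up sample standing in for the downloaded CSV (not real records).
SAMPLE_CSV = """name,discipline,longitude,latitude,zip,city,state,income,phone
Example Art Museum,ART,-73.96,40.69,11205,Brooklyn,NY,120000,555-0100
Example History House,HSC,-71.06,42.36,02108,Boston,MA,,555-0101
"""

def select_columns(csv_text, keep=KEEP):
    """Return each row as a dict that holds only the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row[col].strip() for col in keep} for row in reader]

rows = select_columns(SAMPLE_CSV)
print(rows[0])
```

Values are kept as strings on purpose, so that zip codes with a leading zero (such as 02108) are not damaged and empty income cells stay visibly empty.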
- Tableau Public: a free data visualization tool that supports multiple kinds of visualizations, dashboard creation, publishing to the web, and embedding into web pages.
- OpenRefine: a free and open source tool that helps users to explore, clean and transform large sets of data.
- Enigma Public: the public database of Enigma, an operational data management and intelligence company that specializes in data analytics and connected data.
- Institute of Museum and Library Services (IMLS) Museum Universe Data File: the source of my target worksheet. It includes basic information on aquariums, arboretums, botanical gardens, art museums, children’s museums, general museums, historic houses and sites, history museums, nature centers, natural history and anthropology museums, planetariums, science and technology centers, specialized museums, and zoos. The initial data collection occurred in 2013, drawing on Institute of Museum and Library Services administrative records for discretionary grant recipients, IRS (Internal Revenue Service) records for tax-exempt organizations, and records of recipients of grants from private foundations.
- American Alliance of Museums (AAM): AAM is a non-profit association that has brought museums together since its founding in 1906, helping develop standards and best practices, gathering and sharing knowledge, and advocating on issues of concern to the museum community.
Results and Interpretations
Figure 1 is a state-level heatmap of museums and related organizations in the United States. We can clearly see that California, New York, and Texas have the largest numbers of museums, with 2,670, 2,239, and 1,886 records respectively.
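Tableau computes the per-state counts behind the heatmap automatically. As a rough sketch of the same tally outside Tableau, assuming each record carries a hypothetical `state` field, the counting could look like this (the six records are invented for illustration):

```python
from collections import Counter

# Made-up records; "state" is a hypothetical field name.
records = [
    {"name": "Museum A", "state": "CA"},
    {"name": "Museum B", "state": "NY"},
    {"name": "Museum C", "state": "CA"},
    {"name": "Museum D", "state": "TX"},
    {"name": "Museum E", "state": "CA"},
    {"name": "Museum F", "state": "NY"},
]

def count_by_state(rows):
    """Count how many records fall in each state."""
    return Counter(row["state"] for row in rows)

state_counts = count_by_state(records)
top_states = state_counts.most_common(3)  # three states with the most records
print(top_states)
```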
The museum income data originally comes from the most recent Internal Revenue Service (IRS) Form 990. Income is a computer-generated amount produced by the IRS, taken from the IRS Business Master File of May 2015. Figure 2 shows that Massachusetts generated the largest total income, California’s museums rank second, and New York’s rank fourth. Combining Figures 1 and 2, we can see that Massachusetts has fewer museums than California or New York but a larger total income, which suggests that its museums are more profitable than those of the other states.
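One way to make the comparison between museum counts and income explicit is to divide each state's total income by the number of museums that report one. The sketch below is not how the dashboard was built; the field names `state` and `income` are assumptions, and the numbers are invented.

```python
from collections import defaultdict

# Made-up records; an empty string means no income was reported.
records = [
    {"state": "MA", "income": "900"},
    {"state": "MA", "income": "300"},
    {"state": "CA", "income": "200"},
    {"state": "CA", "income": "100"},
    {"state": "CA", "income": ""},
    {"state": "NY", "income": "150"},
]

def income_summary(rows):
    """Return {state: (record_count, total_income, mean_income)}.

    The mean is taken over records that report an income; it is None
    when no record in the state reports one.
    """
    count = defaultdict(int)
    reporting = defaultdict(int)
    total = defaultdict(float)
    for row in rows:
        state = row["state"]
        count[state] += 1
        if row["income"] != "":
            reporting[state] += 1
            total[state] += float(row["income"])
    return {
        s: (count[s], total[s],
            total[s] / reporting[s] if reporting[s] else None)
        for s in count
    }

summary = income_summary(records)
for state, (n, total, mean) in summary.items():
    print(state, n, total, mean)
```

In this toy data, MA has fewer records than CA but a higher total and a higher mean, which is the pattern the two figures suggest for the real data.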
In Figure 3, I used the museum discipline code as the metric. These codes represent the following institution types: ART – Art Museums; BOT – Arboretums, Botanical Gardens & Nature Centers; CMU – Children’s Museums; GMU – Uncategorized or General Museums; HSC – Historical Societies, Historic Preservation; HST – History Museums; NAT – Natural History & Natural Science Museums; SCI – Science & Technology Museums & Planetariums; ZAW – Zoos, Aquariums, & Wildlife Conservation. In Figure 3, we can see that there are over 14,000 historical societies in the U.S. and that they are distributed all over the country.
I also created a chart that more clearly compares the number of museums of each type with their total income. In Figure 4, I found that Historical Societies are relatively numerous but generate less income, whereas Art Museums have relatively large income but fewer records.
Combining Figures 1 and 5, we can conclude that General Museums and Art Museums are more profitable in the U.S., whereas Historical Societies and History Museums are less profitable.
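For anyone working with the raw file, the discipline codes are easier to read once they are translated into labels. Here is a minimal sketch of that lookup, using the code list above; the sample list of codes is invented, and unknown codes are simply passed through unchanged.

```python
from collections import Counter

# Labels taken from the code list described in the text.
DISCIPLINE_LABELS = {
    "ART": "Art Museums",
    "BOT": "Arboretums, Botanical Gardens & Nature Centers",
    "CMU": "Children's Museums",
    "GMU": "Uncategorized or General Museums",
    "HSC": "Historical Societies, Historic Preservation",
    "HST": "History Museums",
    "NAT": "Natural History & Natural Science Museums",
    "SCI": "Science & Technology Museums & Planetariums",
    "ZAW": "Zoos, Aquariums, & Wildlife Conservation",
}

def count_by_label(codes):
    """Count records per discipline label; unknown codes keep their raw value."""
    return Counter(DISCIPLINE_LABELS.get(code, code) for code in codes)

# Made-up codes, one per record.
sample_codes = ["HSC", "ART", "HSC", "GMU", "HSC", "XYZ"]
label_counts = count_by_label(sample_codes)
print(label_counts.most_common())
```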
Bmf15 F is the Internal Revenue Service’s Business Master File (BMF) flag for May 2015. It indicates that the record was found in the most recent IRS 990 Business Master File data. The IRS BMF contains descriptive information for all active organizations (public charities, private foundations, etc.) that have registered for tax-exempt status with the IRS.
Of the 33,072 total records, over 70% of museums have tax-exempt status.
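The percentage can be checked by counting flagged records and dividing by the total. In this sketch the field name `bmf15_f` and the value `"1"` for a flagged record are both assumptions about how the flag is encoded, and the four records are invented.

```python
def share_flagged(rows, flag_field="bmf15_f", flagged_value="1"):
    """Return the percentage of rows whose flag field equals `flagged_value`."""
    if not rows:
        return 0.0
    flagged = sum(1 for row in rows if row.get(flag_field) == flagged_value)
    return 100.0 * flagged / len(rows)

# Made-up records: three flagged, one not.
records = [
    {"name": "Museum A", "bmf15_f": "1"},
    {"name": "Museum B", "bmf15_f": "1"},
    {"name": "Museum C", "bmf15_f": "0"},
    {"name": "Museum D", "bmf15_f": "1"},
]

pct = share_flagged(records)
print(f"{pct:.1f}% of records carry the flag")
```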
From my perspective, the most difficult part of the whole project was finding the right data, because I had many requirements and most worksheets have flaws and limitations, such as being hard to visualize or having too few metrics. The biggest problem in my worksheet is that the numbers in the income column are not correct. Therefore, the specific income values on the x-axes of Figures 2, 4, and 5 are not correct (the numbers are too large to be realistic). One possible reason is that the original file did not record the decimal point correctly.
Besides the incorrect income numbers, this worksheet may also have the following problems:
- Non-museum organizations may be included.
- Museums may be missing.
- Museums may be listed multiple times.
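The last problem, museums listed more than once, is the easiest one to probe with a script. A minimal sketch, assuming that two records with the same normalized name and the same zip code are likely duplicates (a heuristic of mine, not a rule from the data file's documentation), with hypothetical `name` and `zip` fields and invented records:

```python
from collections import defaultdict

def find_duplicates(rows):
    """Group rows by (normalized name, zip) and keep groups with 2+ rows."""
    groups = defaultdict(list)
    for row in rows:
        # Lowercase the name and collapse repeated whitespace before comparing.
        name_key = " ".join(row["name"].lower().split())
        groups[(name_key, row["zip"].strip())].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}

# Made-up records; the first two differ only in case and spacing.
records = [
    {"name": "Example Art Museum", "zip": "11205"},
    {"name": "example  art museum ", "zip": "11205"},
    {"name": "Other Museum", "zip": "10001"},
]

dupes = find_duplicates(records)
for key, group in dupes.items():
    print(key, len(group))
```

A check like this only surfaces candidates; each group would still need a manual look before any record is removed.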