The Chinese Collection at the Metropolitan Museum of Art


Visualization

Introduction

This lab was inspired by another course that I am taking this semester in the History of Art and Design Department. In the Color Studies class, I was assigned the color red and need to research it in depth. Some of this research relies heavily on online metadata that connects museum websites with their public domain datasets. I therefore chose the Met, one of the largest and richest collecting institutions, to analyze how its Chinese collection is preserved and presented in the museum’s open access dataset. Although I wanted to work more with R and RStudio, I ran into limitations that turned me back to Tableau Public for the visualization. The main questions this lab aims to answer are:

1) Which category of Chinese artifacts is the largest in the collection?

2) Which dynasty/date is most represented in the collection?

3) Were the objects collected at random, or do they follow a specific collection pattern?

4) How are they displayed in the museum?

5) What online tagging system is used for object research?

Inspiration of Visualization

Figure 1. Asian Art official website from the Metropolitan Museum of Art, hyperlinked to the website

The main inspiration for this project is that the Met, although an internationally well-known institution, does not offer an individual introduction to each collection, or a data-specific introduction, when the public accesses it online. Figure 1 shows a preview of what you get when you type “Chinese art” into Google search or the Met’s official website. This is the only page listing all art from Asia, and it offers only brief introductions to the history, the general collection scope, highlights, and related acquisitions for Chinese, Korean, Japanese, Indian, and Tibetan artworks, all in one place. Motivated by my cultural heritage, I am interested in finding the whole Chinese collection in the museum and visualizing it in more detail as a guide for my future research.

Figure 2. The Metropolitan Museum of Art Open Access CSV

Gathering information was not hard for me as an MDC student. From a previous information management class at Pratt, I knew that many nearby museums use GitHub as their archiving and information-sharing platform. Figure 2 shows how I accessed the museum’s open access file; I used the latest committed data, updated on Sep 19, 2022. Even though the file is listed as a .csv file, the file I actually obtained was in .txt format, so I had a hard time loading it directly into either Tableau or RStudio. The other main problem was the size of the dataset. The original .txt file is 303,123 KB, and the worksheet that I converted to .csv is about 20 MB, which was still too large for these visualization tools to process directly. My computer crashed three times before I realized that I could not use the whole dataset as it was. The original file contains 477805 rows and 54 columns, and empty cells are left blank rather than marked N/A or filled with a placeholder. As a result, it took much longer to process and manage my data: I had to wait for the computer to respond and eventually build another worksheet containing only the data I needed.
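
One way to avoid loading the whole file at once is to stream it row by row. The following is a minimal sketch in Python, not the method used in this lab; the three-row sample and its column names are hypothetical stand-ins for the real file. It counts rows, columns, and blank cells without holding the dataset in memory.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real open access file.
SAMPLE = """Object ID,Department,Culture,Gallery Number
1,Asian Art,China,207
2,Asian Art,China,
3,European Paintings,,
"""

def summarize(handle):
    """Stream rows one at a time; count rows, columns, and blank cells."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames)
    blanks = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row.get(name)
            # A blank cell is how this dataset marks a missing value.
            if value is None or value.strip() == "":
                blanks[name] += 1
    return n_rows, len(columns), blanks

n_rows, n_cols, blanks = summarize(io.StringIO(SAMPLE))
print(n_rows, n_cols, blanks)
```

For a file on disk, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` and the encoding are assumptions about the downloaded file.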

Dataset and Tools

In the worksheet that I worked on afterwards, the data comes from the original .txt file, but I modified it to contain only the content relevant to my research. Specifically, I deleted the three columns named “is highlight”, “is timeline work”, and “is public domain”. In the Department dropdown, I chose to show only Asian Art results; in the Culture dropdown, I typed in China as the filtering criterion and selected only results that contain the word China. My final worksheet therefore has 14243 rows (14242 objects) and 51 columns. Figure 3 shows the cleaned data that I used for this lab.
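
The same filtering steps can be sketched in code. This is a hedged illustration rather than the spreadsheet workflow I actually used; the sample rows are invented, and the exact header spellings (`Is Highlight`, `Department`, `Culture`, and so on) are assumptions about the file.

```python
import csv
import io

# Invented sample rows; header spellings are assumptions about the real file.
SAMPLE = """Object ID,Is Highlight,Department,Culture
1,False,Asian Art,China
2,False,Asian Art,Japan
3,True,Asian Art,"China, Tibet"
4,False,Egyptian Art,
"""

# The three columns removed from the worksheet.
DROP = {"Is Highlight", "Is Timeline Work", "Is Public Domain"}

def filter_chinese(handle):
    """Keep Asian Art rows whose Culture contains 'China'; drop unused columns."""
    kept = []
    for row in csv.DictReader(handle):
        if row["Department"] != "Asian Art":
            continue
        if "China" not in (row["Culture"] or ""):
            continue
        kept.append({k: v for k, v in row.items() if k not in DROP})
    return kept

rows = filter_chinese(io.StringIO(SAMPLE))
print([r["Object ID"] for r in rows])
```

Matching on "contains China" rather than "equals China" mirrors the dropdown filter described above, so mixed entries such as the hypothetical "China, Tibet" are kept.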

Figure 3. The dataset used for this project

The visualization tool I ended up using for this lab is still Tableau. I tried my best to use R because I was impressed by how much detail and freedom typed commands allow and by how customizable the results in RStudio are. The biggest challenges for me were my unfamiliarity with the language and the limited time I had to explore it. Since my dataset was too large for either Excel or OpenRefine to process, I could only manually copy, paste, and cut cells to get the data I needed. Here are some of the issues that I faced when loading my sheet into RStudio.

Figure 4. Issues that I encountered when loading the worksheet in R

Figure 4 shows the last issue I hit before I gave up trying to define my data frame (df <-). I tried different files, formats, and data previews, but none of them worked. Before that, I had trouble finding and deciding on the right package to install, which took me a long time to figure out. In the future, if I have enough time, I may come back to this, because I still want to learn this platform and use it to display my data better. This lab report also records my thoughts on what I expected to do or improve in R, so that I can refer back to them. In Tableau, it was relatively easy this time to drag in and display the information I wanted. Thanks to the richness of my worksheet, I was able to find everything I needed to compose this lab report.

Visualization and Interpretation

My visualizations are based on the five questions I am trying to answer from the dataset. For the first question, I looked for the category of Chinese artifacts with the largest number of objects in the Asian art collection. To my surprise, among the more than fourteen thousand objects, the largest classification is ceramics rather than more easily preserved objects such as paintings or stonework. Figures 5 and 6 are two different graphs that show two sets of information in response to this question. Figure 5 ranks the categories from high to low and makes it easier to read the exact number for each category and how the categories compare with each other. For instance, in the labels I chose to display, the top-ranked category, ceramics, is about ten times larger than the tenth-ranked category, Tomb Pottery, and the guide lines make it easier to see the range each category falls in. Figure 6, on the other hand, focuses on size. The advantage of a bubble chart is that it conveys the ranking directly through the space each bubble occupies. It is, however, hard to judge the diameter and area of the bubbles, so I intentionally hid all the numbers so that readers can focus on the size of the circles.

Figure 5. Amount of Objects in relation to Classifications – Bar Chart
Figure 6. Amount of Objects in relation to Classifications – Circles

My second question asks which dynasty/date is most represented in the collection. This is probably the most problematic data that I gathered and displayed, for the following reasons.

First, the original dataset uses various dating conventions, so the same time span is entered with different values. This made it very hard for me to determine which dynasty/period a work belongs to. It may also be hard for the museum specialists to assign works to a specific category; but even among works that are all assigned to the Qing dynasty, more than 20 different Qing dynasty labels appear, and OpenRefine cannot tell that they belong together (see Figure 7). I tried several splitting tools in OpenRefine but could not figure it out.

Figure 7. Trouble defining and splitting the “Period” column by keyword

Second, keyword filters are not helpful in Tableau. The biggest problem here is making sure the time period is correct, and the original descriptions are messy. In Figure 8, I tried my best to display the number of objects for each period. If I want to select the Qin dynasty instead of the Qing dynasty, the “include” function does not help, because “Qin” is contained in “Qing”. Moreover, some of the descriptions contain two separate periods, and at this point I do not know how to teach the machine to read them as two things. Therefore, I cannot conclude that the Qing dynasty is the largest group in the Met’s collection, because there are many minor categories that should also belong to Qing or to other periods. I can conclude, however, that according to the museum record, artifacts labeled Qing dynasty (1644-1911) form the largest group among all Chinese objects in the museum.
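
The Qin/Qing collision comes from substring matching. One possible workaround, sketched here under assumptions and not tested against the real file, is to match whole words with a regular-expression word boundary. The period strings below are invented examples in the style of the dataset, and the dynasty list is deliberately incomplete.

```python
import re
from collections import Counter

# Invented examples in the style of the "Period" column.
periods = [
    "Qing dynasty (1644-1911)",
    "Qing dynasty (1644-1911), Kangxi period (1662-1722)",
    "Qin dynasty (221-206 B.C.)",
    "late Ming (1368-1644)-early Qing (1644-1911) dynasty",
    "",
]

# Illustrative, incomplete list of dynasty names to look for.
DYNASTIES = ["Qin", "Qing", "Ming"]

def dynasties_in(text):
    """Return every dynasty named in text, matching whole words only."""
    return [d for d in DYNASTIES if re.search(rf"\b{re.escape(d)}\b", text)]

# A description naming two dynasties is counted once under each of them.
counts = Counter(d for p in periods for d in dynasties_in(p))
print(counts)
```

Because `\b` requires a word boundary after "Qin", the pattern does not match inside "Qing", and the many Qing label variants all fall under the same key.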

Figure 8. Number of artifacts in relation to period

To analyze the third question, I focused on the accession year of each object, counted the objects, and summed the counts by accession year. Figure 9 shows a dual-axis display of the number of artifacts collected in each accession year. From the distribution, the peaks appear to be random and do not show a steady increase over any specific period. To show that the pattern is relatively random, I used a bar chart to indicate the amounts and added a trend line to show the dramatic swings across different ranges. I removed the outlines of the bars so that readers get a sense of the area and can easily tell that 1900-1930 is a crowded range in which the museum’s Chinese collection boomed. I selected 6 peaks and annotated them with the exact numbers for those years.
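
The counting step behind this chart can be sketched as follows. The year values are invented, and the decade grouping is my own addition for illustration; it is not part of the Tableau view.

```python
from collections import Counter

# Invented accession-year values; a blank cell means the year is not recorded.
accession_years = ["1913", "1913", "1924", "", "2015", "1924", "1913"]

# Count objects per year, skipping blank or non-numeric cells.
per_year = Counter(int(y) for y in accession_years if y.strip().isdigit())

# Group the same objects by decade to make crowded ranges easier to spot.
per_decade = Counter((year // 10) * 10 for year in per_year.elements())

# The highest-count years are the candidates for peak annotations.
top_years = per_year.most_common(2)
print(per_year, per_decade, top_years)
```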

Figure 9. Number of artifacts collected in relation to accession year

My fourth research question is how the museum displays its Chinese collection. Unfortunately, the original worksheet has no specific field describing where and how each object is placed. However, I can see how the objects are distributed across the museum’s galleries by using the gallery number recorded in the cells that have one, and thus find the main locations where visitors can see these Chinese exhibits. A main finding is that the null cells, which are usually neglected, are extremely important here. Compared with the items currently on display, the null/unavailable objects number over 13 thousand, which means that less than 1/10 of this collection is on display. The two gallery figures below (Figures 9 and 10) show the difference between including and excluding the null cells, and it is important to include them in the final report. I wish I could display the Null bar separately so that it does not distort the rest of the distribution. From these two graphs, it is easy to see that the galleries whose numbers start with 2 are the main areas displaying Asian art, which is confirmed by the map on the Met’s website (see Figure 11).

Figure 9. Amount of Artifact in distribution of Gallery Number (with undisplayed artifacts)
Figure 10. Amount of Artifact in distribution of Gallery Number (without undisplayed artifacts)
Figure 11. Floor map of the Metropolitan Museum of Art, 2nd floor, with the Asian wing circled in red
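
As a check on the display share: with more than 13 thousand blank gallery cells among 14242 objects, at most 1242 objects have a gallery number, which is about 8.7% and therefore under one tenth. The sketch below shows one way to compute that share and to keep the blank group out of the gallery distribution; the gallery values are invented for illustration.

```python
from collections import Counter

# Invented "Gallery Number" values; a blank means no gallery is recorded.
gallery_numbers = ["207", "", "", "204", "", "", "", "", "", "219", "", ""]

on_view = [g.strip() for g in gallery_numbers if g.strip()]
not_on_view = len(gallery_numbers) - len(on_view)
share_on_view = len(on_view) / len(gallery_numbers)

# Group displayed objects by the first digit of the gallery number.
by_leading_digit = Counter(g[0] for g in on_view)

# Report the blank group as its own number rather than as one bar on a shared axis.
print(f"not on view: {not_on_view}, on view: {len(on_view)} ({share_on_view:.1%})")
print(dict(by_leading_digit))
```

Reporting the blank group separately is one way to get the effect of a separate Null bar: the gallery distribution is not dwarfed, and the undisplayed count is still stated.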

In my previous Digital Strategy class, my professor, who previously worked on the Met’s user experience team, introduced tagging systems for website search engine optimization (SEO). In this research, I would therefore also like to see which keywords and tags occur most often in the museum collection. My visualization for this question, however, has the same issue as the second question. For instance, the top-ranked tag “Flowers” should also be counted in combined entries such as “Birds | Flowers” or “Butterflies | Flowers”. I tried both editing in Excel and OpenRefine, but neither could count a “Birds | Flowers” entry under both the Birds tag and the Flowers tag. As a result, this graph only shows objects that use exactly the same tag string, and Flowers may be the most common subject among the artifacts in the collection (see Figure 12).
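
Counting each tag separately is a matter of splitting every cell on the “|” separator before counting. The sketch below assumes that separator and uses invented tag cells; it contrasts exact-string counting (what the graph shows) with per-tag counting.

```python
from collections import Counter

# Invented tag cells; "|" is assumed to separate multiple tags in one cell.
tags_column = ["Flowers", "Birds | Flowers", "Butterflies | Flowers", "", "Birds", "Landscapes"]

def split_tags(cell):
    """Split one cell into individual tags, trimming spaces around each tag."""
    return [t.strip() for t in cell.split("|") if t.strip()]

# Exact-string counting treats "Birds | Flowers" as its own category.
exact = Counter(c for c in tags_column if c)

# Per-tag counting credits "Birds | Flowers" to both Birds and Flowers.
per_tag = Counter(t for c in tags_column for t in split_tags(c))
print(exact["Flowers"], per_tag["Flowers"], per_tag.most_common(2))
```

In this toy sample, Flowers rises from one exact match to three per-tag mentions, which is the kind of undercount described above.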

Questions and Self-assessment

Overall, I am satisfied with my findings in relation to my datasets and visualizations. I am surprised by how large the Met’s collection of Chinese art is, and it is great to know that the largest category in the collection is ceramics. One thing to improve is learning how to define my df in R so that I can use code to customize some of the keyword findings for my studies. Moreover, in a graph like Figure 6, the colors within a single graph could vary to distinguish categories. I also need to get more familiar with Tableau so that I can combine my sheets into a single dashboard and adjust its size to fit.

Original Tableau Public links can be found at:

https://public.tableau.com/app/profile/sean3473/viz/AmountofObjectsinrelationtoClassifications/1

https://public.tableau.com/app/profile/sean3473/viz/AmountofObjectsinrelationtoClassifications-CircleDisplay/6?publish=yes

https://public.tableau.com/app/profile/sean3473/viz/AmountofnumberinrelationtoArtifactsPeriod/2?publish=yes

https://public.tableau.com/app/profile/sean3473/viz/AmountofnumberartifactsbeencollectedinrelationtoAccessionYear/3?publish=yes

https://public.tableau.com/app/profile/sean3473/viz/AmountofArtifactindistributionofGalleryNumberwithundisplayedartifacts/4?publish=yes

https://public.tableau.com/app/profile/sean3473/viz/AmountofArtifactindistributionofGalleryNumberwithoutundisplayedartifacts/7?publish=yes

https://public.tableau.com/app/profile/sean3473/viz/AmountofArtifactinrelationtoTags/5?publish=yes

Reference

https://github.com/metmuseum/openaccess/commit/852a085f9bd3833028513444cabed217f87d6e64#diff-9f9583202c5d326e17789ac08f06b9ec913a7c546a4ab5f68dc32fa9f3732d66

https://www.metmuseum.org/about-the-met/collection-areas/asian-art

https://maps.metmuseum.org/?screenmode=base&floor=2#hash=17.3/40.779556/-73.963243/-61