Background
Prior to the pandemic, it had been years since I last read books for fun. In 2021, my friend told me about the Libby app, which allows people to borrow ebooks from their local library and read them on their Kindles, phones, or tablets. Since then, I’ve read over 200 books and all of them were borrowed from various libraries across the United States. Libraries have become a very special place for me, and through this project, I would like to better understand public library usage over the last few years. With apps like Libby and communities like BookTok boosting the popularity of reading as well as libraries going more digital, I’m curious to learn how public library usage has been affected and identify trends for popular books, topics, and material types.
The questions I’d like to answer and visualize through this project include:
- What are people checking out most frequently in 2022?
- What types of materials (ebooks, audiobooks, physical books) are people checking out?
- How does the number of digital checkouts compare to the number of physical checkouts?
- Which topics are most popular?
- How does the number of checkouts change over time?
Materials
Dataset:
I was initially interested in data from the New York Public Library, as this is one of the libraries that I borrow from most frequently. I found a list of the Top 10 Checkouts of All Time, which was enlightening to read, and hoped to find more data from NYPL. While I wasn’t able to find a dataset from NYPL, I was able to find one from Seattle Open Data that includes checkouts from the Seattle Public Library since April 2005. Though I haven’t checked out books from the Seattle Public Library, I thought that this would still be a great dataset to use to explore public library data and to see what people in Seattle have been reading and checking out in 2022.
This dataset includes multiple columns that I’m looking forward to diving into, such as UsageClass, which denotes whether an item is “physical” or “digital,” MaterialType, which describes the type of item checked out, and Checkouts, which refers to the number of times a title was checked out. I also really like that this dataset is very comprehensive, polished, and encompasses data over seventeen years.
Seattle Open Data. (2022). Checkouts by Title. Available from https://data.seattle.gov/Community/Checkouts-by-Title/tmmm-ytt6
Software:
I used Microsoft Excel to take a closer look at my data and Tableau to create my visualizations for this project.
Methods
My initial plan was to download the dataset in its entirety and to create visualizations based on the 17 years of data available, but I very quickly realized that this was not feasible. The full dataset took up over 9 GB of space and Excel couldn’t open the file since the dataset contained over 41 million rows of information. Although I was able to figure out how to import the data to Tableau using its OData endpoint, it took over an hour for Tableau to prepare the data every time I switched from the data source to the sheets.
I took a closer look at the table that listed 100 rows of Seattle Public Library checkouts at a time, and realized that I could use filters to save only a portion of the data instead. I ended up narrowing my scope to only include data for 2022 and downloaded the data accordingly. I downloaded two separate CSV files (January through July and July through October) to ensure that I could open them in Excel without either file exceeding the 1,048,576 row limit.
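The filtering and splitting for this project were done with the Seattle Open Data filters, not with code. For anyone who would rather script the same idea, here is a minimal Python sketch that streams a CSV export, keeps one year, and groups the rows into pieces that fit on an Excel sheet. The CheckoutYear and Title column names and the sample values are assumptions for illustration.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # Excel's per-sheet row limit, header row included


def filter_year(rows, year, year_column="CheckoutYear"):
    """Yield only the rows whose year column matches `year`."""
    for row in rows:
        if row[year_column] == str(year):
            yield row


def split_into_chunks(rows, max_data_rows=EXCEL_MAX_ROWS - 1):
    """Group rows into lists small enough for one sheet (one row is left for the header)."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_data_rows:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Tiny in-memory stand-in for the full export (made-up values).
sample = io.StringIO(
    "CheckoutYear,CheckoutMonth,Title,Checkouts\n"
    "2021,12,Example A,3\n"
    "2022,1,Example B,5\n"
    "2022,2,Example C,2\n"
)
rows_2022 = list(filter_year(csv.DictReader(sample), 2022))
chunks = list(split_into_chunks(rows_2022, max_data_rows=1))
```

Because the rows are read one at a time, this approach never needs to hold the full 9 GB file in memory.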
I reviewed both CSV files in Excel, but did not see much that needed to be refined. The data was already in a clear and usable state, so I uploaded both files to Tableau, and used a union to stitch the two tables together.
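A union simply stacks the rows of tables that share the same columns. The sketch below shows the same operation in Python as an illustration only; it is not how Tableau implements it, and the union_csv helper and sample values are hypothetical.

```python
import csv
import io
from itertools import chain


def union_csv(*files):
    """Concatenate the rows of several CSV files that share the same header."""
    readers = [csv.DictReader(f) for f in files]
    header = readers[0].fieldnames
    for reader in readers[1:]:
        if reader.fieldnames != header:
            raise ValueError("column headers differ between files")
    return list(chain.from_iterable(readers))


# Two small in-memory files standing in for the two downloads (made-up values).
first = io.StringIO("CheckoutMonth,Title,Checkouts\n1,Example A,4\n2,Example B,6\n")
second = io.StringIO("CheckoutMonth,Title,Checkouts\n8,Example C,9\n")
combined = union_csv(first, second)
```

Checking the headers first guards against stacking files whose columns do not line up.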
Design Rationale
For my first visualization, I wanted to learn which titles are most popular in 2022. To do so, I chose a bar chart to easily compare all the titles that have been checked out this year. I noticed that there are duplicates in the titles checked out, such as two instances of the book “Dune” listed near the top of the graph, and realized that these duplicates are due to the book being checked out in another material type, such as an audiobook or an ebook. To differentiate between each checkout, I decided to color-code each title by material type. Furthermore, since there are over two million rows of data, I wanted to narrow down the most popular checkouts by adding a filter to show only the top 25 titles with the highest number of checkouts.
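The logic behind this chart is to sum checkouts for each combination of title and material type, then keep the highest totals. Here is a rough Python sketch of that logic, assuming the title is stored in a column named Title; the sample rows are made up.

```python
from collections import Counter


def top_titles(rows, n=25):
    """Sum checkouts per (title, material type) pair and return the n largest."""
    totals = Counter()
    for row in rows:
        totals[(row["Title"], row["MaterialType"])] += int(row["Checkouts"])
    return totals.most_common(n)


# Made-up rows showing why the same title can appear more than once.
rows = [
    {"Title": "Dune", "MaterialType": "EBOOK", "Checkouts": "10"},
    {"Title": "Dune", "MaterialType": "AUDIOBOOK", "Checkouts": "7"},
    {"Title": "Dune", "MaterialType": "EBOOK", "Checkouts": "5"},
    {"Title": "Example Title", "MaterialType": "BOOK", "Checkouts": "4"},
]
top_two = top_titles(rows, n=2)
```

Keying on both title and material type keeps the ebook and audiobook editions as separate bars, which matches the color-coding described above.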
With the color-coding by material type in place, I thought it could be enlightening to take it one step further by looking at the most popular materials and which books were checked out most frequently in those material types. I started by creating a pie chart of material types to see the distribution of checkout materials and single out the most popular ones. I added the number of checkouts to each material type, and made sure to keep the color scheme consistent with my first visualization of popular checkouts to allow users to quickly understand which materials they are looking at. Furthermore, I decided to narrow down my chart to just the top 5 most frequently checked out material types so as to avoid having a very long legend.
After learning that physical books, ebooks, and audiobooks were the three most popular material types, I created three bar graphs that visualized top book checkouts, top ebook checkouts, and top audiobook checkouts. I decided to narrow down these three graphs too by adding a filter to show the top 5 titles for each material type, and also chose to display my data using horizontal bars to ensure that book titles can be read at a glance. To make my three charts easier to view and compare side-by-side, I decided to stack all three charts vertically in a dashboard.
Next, I wanted to look at the number of digital checkouts compared to the number of physical checkouts to see whether the Seattle Public Library had more digital or physical checkouts this year. I used the UsageClass variable to create a pie chart and used different colors to distinguish between physical and digital checkouts.
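The proportions in a pie chart like this are each category's checkouts divided by the total. A minimal Python sketch of that calculation, using made-up values and a hypothetical checkout_share helper:

```python
from collections import defaultdict


def checkout_share(rows, column="UsageClass"):
    """Return each category's share of total checkouts as a fraction of 1."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[column]] += int(row["Checkouts"])
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


# Made-up rows for illustration only.
rows = [
    {"UsageClass": "Digital", "Checkouts": "4"},
    {"UsageClass": "Physical", "Checkouts": "4"},
    {"UsageClass": "Digital", "Checkouts": "2"},
]
shares = checkout_share(rows)
```

Passing column="MaterialType" would give the breakdown behind the material type pie chart instead.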
I was pretty intrigued by the Subjects variable in this dataset, and wanted to know more about the most popular subjects and topics that were checked out this year. I chose a tree map to portray this data and to compare amounts and categories. I selected a blue color scheme, with the darkest blue representing the most common subject, and lighter blues representing less popular subjects. In addition, I used a filter to only display the top 100 subjects based on the number of checkouts so that the data won’t be too overwhelming.
For my last visualization, I was really curious about the CheckoutMonth column, and wanted to graph it to see the most popular months by checkout in 2022. I decided to use a line graph to visualize checkouts over time for this year.
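The line graph plots one total per month. A short Python sketch of that aggregation, assuming CheckoutMonth holds the month as a number; the sample rows are made up.

```python
def checkouts_by_month(rows):
    """Sum checkouts per month and return (month, total) pairs in calendar order."""
    totals = {}
    for row in rows:
        month = int(row["CheckoutMonth"])
        totals[month] = totals.get(month, 0) + int(row["Checkouts"])
    return sorted(totals.items())


# Made-up rows for illustration only.
rows = [
    {"CheckoutMonth": "2", "Checkouts": "3"},
    {"CheckoutMonth": "1", "Checkouts": "5"},
    {"CheckoutMonth": "2", "Checkouts": "4"},
]
monthly = checkouts_by_month(rows)
```

Sorting by month number keeps the points in calendar order, which a line graph needs to read correctly from left to right.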
I created a story in Tableau to display my data and to group my charts together, and added my research questions as the corresponding story caption for each sheet.
Results
I was pretty surprised by what I learned from my visualizations afterwards. For one, I didn’t expect that the item with the greatest number of checkouts from the Seattle Public Library would actually be headphones with about 15,000 checkouts. I assumed that a book would be at the top of the checkouts list, and that it would be followed by the SPL HotSpot. This leads me to think that the Seattle Public Library seems like a great place to go and borrow headphones if one is in need of them, or to go and use the WiFi there.
I also noticed that apart from the headphones and SPL HotSpot, there were five physical books in my visualization of popular checkouts, for a total of seven physical items out of twenty-five. The most popular checkouts seem to skew towards digital, specifically ebooks, and this is confirmed by my pie chart comparing digital checkouts to physical checkouts. Even so, physical books still dominate as the material type with the greatest number of checkouts, as indicated in my materials pie chart. Ebooks are definitely not far behind though, with a total of 2.2 million ebook checkouts compared to 2.4 million physical book checkouts this past year.
I was also surprised that video discs are the fourth most popular material type to be checked out after physical books, ebooks, and audiobooks. I personally haven’t had the need to use a video disc in years, especially with the rise of streaming applications for music and for shows and movies, so I definitely did not anticipate that video discs would still be so popular. Even so, it’s great that libraries are still stocking a lot of video discs and that people are taking advantage of this resource and borrowing them.
It was fun to compare the top five physical book checkouts against the top five ebook and top five audiobook checkouts. I didn’t notice too much overlap across the three charts, apart from Cloud Cuckoo Land by Anthony Doerr appearing in both the top ebook checkouts and the top audiobook checkouts, but I’m sure I would find more duplicates if I had included the top ten or top fifteen checkouts from each category.
For my visualization of the most popular subjects checked out in 2022, I did expect that fiction and literature would be at the top, but I was surprised to see that mysteries, thrillers, and historical fiction would be the next three most popular subjects. I thought that genres like romance or fantasy might be closer to the top, but romance is a few spots below mysteries, thrillers, and historical fiction while fantasy shows up towards the middle of the tree map. In addition, I didn’t expect that books related to cooking and food would be the most popular nonfiction subject.
For my visualization of checkouts over time, I thought it was interesting that checkouts seem to go up and down throughout the first few months of the year but level out and increase over the summer. I think it could be that more people are traveling during the summer and bringing beach reads with them, or that they have more time to check out books and other materials during the summer.
Reflection
This was a fun and informative project for me, and I definitely learned a lot about what people were reading this year and general checkout trends from the Seattle Public Library. I definitely feel very comfortable using Tableau for visualizations now, especially applying filters and using tools such as dashboards and stories. I also felt that this was a very engaging topic for me to explore as someone who borrows from libraries very often.
If I could continue working on this project, I would love to compare my data from 2022 to previous years to see how checkouts have changed over the last few years for the Seattle Public Library. It would be fascinating to pinpoint when people started to check out more digital materials as opposed to physical materials from the Seattle Public Library, and also note whether the number of checkouts has increased or decreased over a longer period of time. I think that taking a look at data since the start of the pandemic would be cool too, to ascertain whether the pandemic had an impact on public library usage.
I also believe that it would be very enlightening to look for correlations between subjects and checkout months, such as whether there is an increase in checkouts of titles by Black authors for Black History Month in February or an increase in checkouts of LGBTQ+ books in June. I also think that it could be fun to look at BookTok trends and compare trending books such as Colleen Hoover’s novels or Sarah J. Maas’s book series to the number of checkouts to visualize the impact of online communities on public libraries.
I think it’s important to note that this data is not representative of public library usage across the U.S., and I would also want to compare data from the Seattle Public Library to that of libraries in other states to see what differences there may be in digital and physical checkouts, subjects, and materials that are checked out. Comparing public library usage in a rural library to a city library would be very intriguing, and I’m sure that I would learn even more from visualizing data from multiple libraries.