Banned Books Data Visualization

Introduction

I grew up in my local library. Summers were spent reading anything and everything I could get my hands on. My immediate world was small, but books opened my mind to possibilities and futures that I had never even dreamed of. As a child, I had the privilege of engaging with subject matter and ideologies that weren’t necessarily intended for me, yet they meaningfully shaped my worldview. I was lucky to be a member of a library that did not ban books. My librarians supported my curiosity, as did my teachers and family. Unfortunately, many students across the United States now face book bans. These bans often target powerful books that have the potential to change the ideology and psyche of a generation. As more and more book bans were reported in the news, I sought to understand how these cases were tracked and whether there was a central repository for all data related to the banning of books. I’m curious about what sorts of books are banned, what it takes to get a book banned, and whether banned books share any similarities. Through this project, I hope to examine banned books from a variety of angles to develop a clearer understanding of the landscape.

Inspirations

The American Library Association (ALA) publishes some really compelling infographics on censorship, and often links them closely to its efforts to track book challenges and bans. A quote from one infographic really stood out to me:

“Had I had a book like that on the shelf, I might have realized a lot sooner that I could love myself. I might have realized a lot sooner that it’s O.K. to feel different.”
Community Member on keeping “Prince and Knight” on a public library’s shelves

Not only do I know a lot of young people whose lives could be meaningfully changed by having access to radical and progressive literature, I was that young person who needed these sorts of books to find myself. This project seeks to make interactive the static materials created by the ALA’s Office for Intellectual Freedom. I need to do my part in understanding the landscape of book challenges so I can better resist them going forward.

Methodology

Tooling

OpenRefine – the initial tool I attempted to use for data blending / manipulation and API queries
Microsoft Excel – the tool I used to manually update and tidy my dataset prior to upload
Tableau Public – the visualization tool I used to create my multi-page dashboard

Dataset Preparation

I spent the majority of my time preparing my dataset, as it was not something that I could easily download or automate via a script. Initially, I had identified a dataset from PEN America that focused on 2021-2022 school book bans. While this dataset was well formatted and had a variety of interesting fields (e.g., State, School District, Type of Ban, Ban Date), it lacked a unique identifier for the books and gave no reason why the books were banned. I spent multiple hours in OpenRefine attempting to look up an ISBN or OCLC number from each title / author pair, and none of my efforts were successful.

PEN America dataset that I hoped to reverse query for ISBN / OCLC, ultimately without success.

After discussing my options with Professor Sula and Librarian Nick Dease, I decided to shift to scraping my data from the ALA’s yearly Top 10 most challenged books lists. Bringing this into Excel was not horribly difficult: I quickly identified a few commands that sped up formatting the copied data and pulling it into a tidy format. However, pulling in the additional data points that I wanted was time intensive. While the ALA lists had the rich reason field that distinguished their dataset, they also lacked a unique identifier and only provided title / author / challenge reason / rank / rank year.

Using a mixture of WorldCat and Amazon Books, I was able to slowly but surely complete my dataset by adding an ISBN as a unique identifier as well as my other fields of interest (Author Gender, Page Count, Book Type, Genre, Publish Year, Fiction, and Purchase Link). Many of the books appeared on the Top 10 lists more than once, so I was lucky to only have to perform the manual exercise for ~60 unique entries. (I used the VLOOKUP function in Excel to pull the additional book metadata across one hundred rows.) Once my dataset was finalized, I was able to upload a single spreadsheet to Tableau Public and begin designing the dashboard.
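
For readers who would rather script the lookup step than do it in Excel, here is a minimal Python sketch of the same idea: metadata is recorded once per unique title and then copied onto every ranking row. This is not the workflow I actually used. The titles, identifiers, page counts, field names, and the enrich helper are all hypothetical placeholders, and I am assuming the lookup is keyed on title.

```python
# Hypothetical placeholder data: these are NOT rows from the real dataset.
ranking_rows = [
    {"title": "Title A", "rank_year": 2019, "rank": 1},
    {"title": "Title B", "rank_year": 2019, "rank": 2},
    {"title": "Title A", "rank_year": 2020, "rank": 3},
]

# One metadata record per unique title, entered by hand once.
book_metadata = {
    "Title A": {"isbn": "ISBN-A", "page_count": 230},
    "Title B": {"isbn": "ISBN-B", "page_count": 195},
}


def enrich(rows, metadata):
    """Copy per-title metadata onto each ranking row (an exact-match lookup).

    Rows whose title has no metadata entry are returned unchanged.
    """
    enriched = []
    for row in rows:
        extra = metadata.get(row["title"], {})
        enriched.append({**row, **extra})
    return enriched


enriched_rows = enrich(ranking_rows, book_metadata)
for row in enriched_rows:
    print(row)
```

Because the metadata lives in one dictionary, a repeated title only needs to be researched once, which mirrors the reason the manual work stopped at ~60 unique entries.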

Screenshot of Excel Dataset that I assembled largely manually

Dashboard Design

Chapter 1: Exploratory Breakouts

My first visualization aimed to get my audience comfortable with the dataset and help them digest some interesting descriptive statistics about the challenged books dataset. The three pie charts up top were effective visualizations as each breakout involved fewer than four categories with meaningfully different sizes. It is easy to identify each slice and understand the makeup of the challenged books. The mosaic chart below was most effective for genre, as this field has a relatively high number of categories. A pie chart or a bar graph likely would have been too cluttered to visualize this information successfully, but the mosaic chart did so elegantly.

Chapter 2: Frequently Challenged Books

For Chapter 2, I focused on frequently challenged books, as that is likely the first question one would ask when met with this dataset: which book is challenged most often? The answer is The Absolutely True Diary of a Part-Time Indian, with 7 challenges. It’s easy to identify this from the bubble map, as the size of each bubble indicates challenge frequency. I added a color dimension using the Author Gender field, as I was curious whether an obvious trend would emerge between men and women writers in the likelihood that their books are challenged. There seemed to be a pretty even gender balance here; the high-frequency titles are not distinctly clustered with one gender.
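
Outside of Tableau, the frequency behind each bubble is just a count of how many times a title appears across the yearly lists. The sketch below shows one way to compute it in Python; the titles are hypothetical placeholders rather than the real list entries.

```python
from collections import Counter

# Hypothetical placeholder data: one entry per appearance on a yearly Top 10 list.
list_appearances = ["Title A", "Title B", "Title A", "Title C", "Title A", "Title B"]

# Count how often each title appears; this count would drive bubble size.
challenge_frequency = Counter(list_appearances)

# The single most frequently listed title and its count.
most_challenged = challenge_frequency.most_common(1)[0]
print(most_challenged)
```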

Chapter 3: Most Controversial Books

This visual was my favorite to design as it required some creativity. I wanted to answer the question of which book is the most controversial, not just which book appears most frequently on these lists. From my perspective, there is a difference between being #10 for five years in a row and being #1 for three years. To quantify this, I made a new variable called ‘Points’ and inverted the rank for each title entry. (For example, if a book was ranked #1 it was given 10 points. A book ranked #10 was given 1 point.) From there, I summed all of the points for every title and used that total to set the size of the bubbles on the bubble map. The Absolutely True Diary of a Part-Time Indian also took first place in the controversial category, closely followed by George. I retained the gender coloring as I enjoyed the color palette and wanted to see if any trends could be observed.
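
To make the scoring concrete, here is a small Python sketch of the Points idea. I am assuming the inversion is linear between the two stated endpoints (rank #1 earns 10 points, rank #10 earns 1 point), which works out to 11 minus the rank. The function names and sample entries are hypothetical; the sample mirrors the comparison above of a book ranked #10 five times against a book ranked #1 three times.

```python
from collections import defaultdict

MAX_RANK = 10  # each yearly list has ten places


def points_for_rank(rank):
    """Invert a Top 10 rank so that #1 earns 10 points and #10 earns 1 point.

    Assumes a linear inversion between those two endpoints.
    """
    return MAX_RANK + 1 - rank


def total_points(entries):
    """Sum inverted-rank points per title across all yearly list entries."""
    totals = defaultdict(int)
    for title, rank in entries:
        totals[title] += points_for_rank(rank)
    return dict(totals)


# Hypothetical placeholder entries as (title, rank) pairs, one per yearly appearance.
sample_entries = [("Book A", 10)] * 5 + [("Book B", 1)] * 3

controversy_scores = total_points(sample_entries)
print(controversy_scores)
```

In this sample, Book A appears more often (five times versus three) yet ends up with far fewer points, which is exactly the distinction between frequency and controversy that this chapter is trying to draw.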

Chapter 4: Rank By Year

This visualization was quite simple but powerful. I chose to use a table as the primary focal point, consisting of Title, Reason for Challenge, and Rank. Additionally, I added a Rank Year filter so that one could step through the Top 10 challenged books for every year within the dashboard. If one chose to filter on All instead, the table would list multiple reasons for books that were challenged multiple times. Finally, as a simple touch at the bottom, I included a smaller version of the mosaic chart from Chapter 1. I found it very interesting to see what kinds of books made it into the Top 10 in a specific year from a genre perspective.

Chapter 5: Length of Challenged Books

This chart was one of the simpler ones: just a bar graph with book length on the y-axis and each bar representing an individual title. The shading indicates how frequently the book was challenged: a dark shade suggests a high frequency, while a paler shade suggests a low frequency. I made sure to include this chart as I was curious whether longer (and likely more mature) books would be challenged more often than shorter ones. This chart, however, shows a pretty even spread, with the largest concentration of challenges occurring in the middle of the graph.

Chapter 6: Publication Years

The publication year visualization organizes the challenged books by the year in which they were published. The chart has a timeline feel, with a slider filter to restrict or expand the publish years in scope. The Holy Bible is an outlier in this dataset as it was published centuries before the other challenged books, so I started the slider in the late 17th century to keep it off screen. As usual, larger dots indicate that books from that year were challenged more times than those represented by smaller dots.

Appendix: More Information on Challenged Books

Finally, I created an appendix page that aimed to provide more information on challenged books in a bare-bones fashion, with a pure focus on tabular data. Unique to this tab is the inclusion of an Amazon link for each title. The link allows users to quickly access more information about any of the challenged books featured, and purchase them if they feel so inclined. Having a simple and all-inclusive spot where one could get an overview of all the challenged books felt like an appropriate appendix element.

UX Research

I was lucky enough to be able to reach out to two former library volunteers (the network is strong) to serve as my in-person testing participants. I wanted my users to be people who had read some of the challenged books and had familiarity with libraries. Both were millennial men who lived in an urban area. Participant 1 used a mobile device for the study while Participant 2 used my laptop. I used a semi-structured think-aloud approach to gather feedback and insights from both of my users. Given that my goal was to design an exploratory dashboard where a user could learn more about challenged books, the think-aloud approach was very effective.

Positive Feedback:

Participants were able to use the functions in Tableau without any questions or support. They found the filters intuitive and were able to navigate between the pages with ease.
Participants enjoyed the “Chapter” framing of the various dashboard pages as it aligned well with the theme and provided strong narrative framing.
Participants felt that they had ample information in the dashboard to develop a baseline understanding of the challenged books and noted that they appreciated the Amazon link in case they wanted to buy or learn more.

Negative Feedback:

While the chapter framing was nice, the descriptors sometimes felt unclear. Participants weren’t always sure what they were looking at when they got to the page. They could click and observe differences, but not necessarily interpret.
Labels were really important to both of my participants, and they commented (with disdain) on variables that I failed to rename or remove prior to sharing.
Some of the legends or filters weren’t necessarily placed in the most intuitive locations on the screen. Participants felt that I could do a better job at making this more obvious.

My Changes:

I shifted the order of the chapters to start with a variety of breakout graphs. This enabled the user to become familiar with the different pieces of data in a static context before having the opportunity to use it for manipulations. Additionally, on every chapter dashboard I wrote a narrative blurb explaining the purpose and how to accurately interpret / interact with the charts present.
I took another pass through the dashboard, removing labels that were superfluous and editing labels that were messy and distracting (e.g., Rank(Copy1) Count).
I shifted most of my legends to be ‘Float’ style rather than fixed and placed them in close proximity to the accompanying graphs. Given these legends weren’t being used as a filter and were meant to inform, I think this was the correct choice of action.

Results

After acting on the feedback, I was proud to present my complete challenged books dashboard. It contains six distinct chapters that allow users to engage with the challenged books dataset in different ways. Users interested in examining the reasons why books are banned can look at the detailed reasons in tabular format. Meanwhile, those who want to explore trends related to author gender, book genre, page count, and more can do so with the more quantitative visualizations.

View The Dashboard

Reflections and Future Directions

Building this dashboard was deceptively challenging. My topic seemed simple in theory, but required a significant volume of manual work just to create the underlying dataset. My design definitely suffered as a result, as I became much more focused on the data and subject matter than on the overall presentation. I question whether I would have been better off sticking with my original PEN America dataset and taking a blended spatial / graphical approach to representing the data in a dashboard. I will be much more mindful going forward about making my own dataset from scratch, as I underestimated the intensive nature of such an activity (and the level of attention to detail required).

With regard to the existing dashboard, I would have loved to add more imagery of the challenged books had I had more time. Professor Sula had shared some helpful resources on adding visuals to Tableau, but after the manual scraping I had already gone through to make my underlying dataset, this felt out of scope. That being said, it would have greatly enhanced the bubble charts or the ranking tables if an image of the book’s cover came up when one hovered over them. It would have been that “wow factor” that makes a dashboard especially memorable. Additionally, I think I could have spent more time on the color palette. Initially, I tried to apply some classmate feedback and use the color red, as the subject matter is bans, but found it to be too visually aggressive for my tastes. I like the purple gradient that appears multiple times throughout the dashboard, but the lack of a cohesive color theme is definitely something to improve in a future iteration.

I’m looking forward to continuing my work in Tableau, and specifically on this project, so that I can have a portfolio-ready deliverable for prospective employers and collaborators in the future.

Information Visualization

Student work at the School of Information, Pratt Institute
