Networking Diversity in Children’s Literature



In recent years, the literary world has acknowledged the underrepresentation of diverse characters in children’s literature and sought to address it. The Cooperative Children’s Book Center (CCBC) at UW Madison documents the scale of the problem: in 1994 it began collecting data on the number of books by and/or about people of color. Because the CCBC receives almost all children’s trade books published in the United States, its data gives a reasonably accurate portrait of diverse representation in U.S. children’s literature. In 1994, only 10% of children’s books contained multicultural content, and by 2017 that figure had risen to just 31% (CCBC, 2017). According to the publisher Lee and Low, roughly 37% of people in the United States are people of color, yet this racial and ethnic diversity is not reflected in children’s literature (Lee and Low, 2018).

As the library assistant at an independent Pre-K through 8th grade school in Brooklyn, I conducted a diversity audit of the library’s fiction collection using Karen Jensen’s Diversity Audit Outline. The fiction collection consists of over 1,000 unique titles aimed at children in grades 3-8. Examples of the diversity types include ability diversity, socioeconomic diversity, body diversity, race/ethnicity diversity, and cultural diversity. I also recorded whether books featuring diverse characters were written by authors writing about their own experiences (the literary world refers to these authors as “Own Voices”).

Categories included:

  • Indigenous/Native American
  • Asian
  • Black/African American
  • Latinx
  • Middle Eastern
  • Refugees/New Immigrants
    • I sometimes included second- or third-generation immigrants when their immigrant status was central to the narrative.
  • Gender Roles/Gender Bias/Nonbinary
  • Family Structures
    • Any family represented that was not a “nuclear” heterosexual married couple with biological children
  • Adoption/Foster Care/*Orphan*
    • I added the term “orphan” to Jensen’s category to include characters whose parents had died but who had not been adopted or placed in foster care. (A significant number of orphans appeared in fantasy novels, where their orphan status was not realistically addressed.)
  • Homelessness
  • Poverty/Socioeconomic Diversity
  • *Ability Diverse*
    • Karen Jensen uses the term “disability,” but I chose “ability diverse” because it is one of the prescribed terms in NoveList.
  • *Neurodevelopmental Challenges/Neurological Disorders*
  • Mental Health
  • Culture and Religion
  • Body Diversity
  • Incarceration
  • Own Voices
    • Like Karen Jensen, I believe Own Voices is an important category to include because it distinguishes books whose authors share the identities or experiences they portray.
  • *Diverse Secondary Characters*
    • I added this category because many books mentioned diverse secondary characters who, in my judgment, were not developed enough to be counted in the overall tally for the main diversity types.

* indicates a category or definition that I changed from Jensen’s model.

With the results of the diversity audit, I hoped to determine the following:

  • percentage of diversity in each category
  • which categories were most and least represented in the collection
  • which diversity categories were most often featured together in books
  • which diversity categories were most connected with Own Voices authors
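Each of these questions reduces to simple counting once every title is recorded with its set of categories. The Python sketch below is not part of the original Google Sheets, RStudio, Gephi, and Tableau workflow. It assumes a hypothetical `books` mapping from made-up titles to category labels and a hypothetical `audit_summary` helper. Percentages are computed over the whole collection, so a title with no categories still counts toward the denominator.

```python
from collections import Counter
from itertools import combinations

# Hypothetical audit data: each (made-up) title maps to its set of category labels.
books = {
    "Book A": {"Black/African American", "Own Voices"},
    "Book B": {"Latinx", "Poverty/Socioeconomic Diversity", "Own Voices"},
    "Book C": {"Mental Health"},
    "Book D": {"Black/African American", "Family Structures", "Own Voices"},
    "Book E": set(),  # no categories, but still part of the collection total
}


def audit_summary(books):
    """Hypothetical helper answering the four audit questions with simple counts."""
    total = len(books)
    counts = Counter(cat for cats in books.values() for cat in cats)

    # 1. Percentage of the whole collection tagged with each category.
    percentages = {cat: 100 * n / total for cat, n in counts.items()}

    # 2. Categories from most to least represented (zero-count categories never appear).
    ranked = counts.most_common()

    # 3. Category pairs that occur in the same title; sorting fixes each pair's order.
    pairs = Counter(
        pair for cats in books.values() for pair in combinations(sorted(cats), 2)
    )

    # 4. Categories that co-occur with Own Voices, excluding Own Voices itself.
    own_voices = Counter(
        cat
        for cats in books.values()
        if "Own Voices" in cats
        for cat in cats
        if cat != "Own Voices"
    )
    return percentages, ranked, pairs.most_common(), own_voices.most_common()


percentages, ranked, pairs, own_voices = audit_summary(books)
print(percentages["Own Voices"])  # 60.0
print(pairs[0])  # (('Black/African American', 'Own Voices'), 2)
print(own_voices[0])  # ('Black/African American', 2)
```

Because categories with no titles never enter the counts, "least represented" here means least represented among categories that appear at least once. An audit would need to list the empty categories separately.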

Process & Rationale


  • Google Sheets: used for collecting data
  • RStudio: used for transforming data
  • Gephi: used for creating the network visualization
  • Tableau: used for creating the statistical visualizations

Transforming Dataset

My first step was to transform my existing dataset into a format Gephi could use. I used RStudio to reformat the dataset so that it had the required “Source,” “Target,” and “Type” columns.
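The reshaping step can be sketched in Python (the original work used R). This sketch assumes a hypothetical wide-format export with one row per title and a 1 in each category column that applies. It treats an edge as a pair of categories that share at least one title, weighted by how many titles they share. Gephi's spreadsheet import recognizes Source, Target, Type, and Weight column headers. The `to_gephi_edges` name and the sample rows are illustrative only.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical wide-format export: one row per title, 1 = category applies.
AUDIT_CSV = """Title,Asian,Latinx,Mental Health,Own Voices
Book A,1,0,0,1
Book B,0,1,1,1
Book C,1,0,1,0
Book D,1,0,0,1
"""


def to_gephi_edges(csv_text):
    """Turn per-title category flags into a weighted, undirected category edge list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    categories = [name for name in reader.fieldnames if name != "Title"]
    weights = Counter()
    for row in reader:
        present = sorted(c for c in categories if row[c].strip() == "1")
        # Every pair of categories in the same title adds 1 to that edge's weight.
        for source, target in combinations(present, 2):
            weights[(source, target)] += 1

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, "Undirected", weight])
    return out.getvalue()


print(to_gephi_edges(AUDIT_CSV))
```

For the sample rows, this prints a header plus five edges. The Asian and Own Voices edge has weight 2 because two titles carry both tags.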

Screenshot of uploaded dataset in Gephi

Using Gephi

After the dataset was transformed and cleaned, I uploaded it to Gephi. I experimented with different layouts but chose Expansion because my network was so compact. I then calculated the Average Weighted Degree, Network Diameter, Modularity, and Average Path Length statistics.

  • Avg. Weighted Degree: 93.333
  • Network Diameter: 2
  • Modularity: 0.142
  • Avg. Path Length: 1.329
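To show what three of these statistics measure, here is a standard-library Python sketch run on a small, hypothetical edge list. It assumes an undirected graph and defines a node's weighted degree as the sum of the weights of its edges. Path lengths are counted in hops with weights ignored, which is my understanding of how Gephi computes diameter and average path length, though Gephi's exact conventions may differ. Modularity is omitted because it depends on a separate community-detection step. The `network_stats` helper and the edge data are illustrative only.

```python
from collections import defaultdict, deque

# Hypothetical weighted, undirected category edges: (Source, Target, Weight).
EDGES = [
    ("Asian", "Own Voices", 2),
    ("Asian", "Mental Health", 1),
    ("Latinx", "Mental Health", 1),
    ("Latinx", "Own Voices", 1),
    ("Mental Health", "Own Voices", 1),
]


def network_stats(edges):
    """Average weighted degree, diameter, and average path length (assumes >= 1 edge)."""
    adjacency = defaultdict(set)
    strength = defaultdict(float)  # weighted degree of each node
    for source, target, weight in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
        strength[source] += weight
        strength[target] += weight
    nodes = list(adjacency)  # only nodes that appear in at least one edge
    avg_weighted_degree = sum(strength.values()) / len(nodes)

    # Breadth-first search from every node; weights are ignored, so distances count hops.
    distances = []
    for start in nodes:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        distances.extend(d for n, d in seen.items() if n != start)

    return {
        "avg_weighted_degree": avg_weighted_degree,
        "diameter": max(distances),
        "avg_path_length": sum(distances) / len(distances),
    }


print(network_stats(EDGES))
```

On this toy graph, only Asian and Latinx lack a direct edge, so the diameter is 2 and the average path length is 7/6. If Gephi follows these same conventions, then the audit's values (a diameter of 2 and an average path length of 1.329) mean that every pair of categories is at most one intermediary apart and most pairs share at least one title directly.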

I added the sum of titles in each category to the Data Laboratory so that I could size each node according to the total number of titles in that category. I also ranked the edges by weight.
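The category sums used for node size could also be prepared ahead of time as a Gephi nodes table. The following Python sketch assumes hypothetical per-title category sets. The `Titles` column name and the `gephi_nodes` helper are my own choices, not part of the original project. The `Id` values must match the Source and Target values in the edges table so that Gephi attaches each count to the right node. Once imported, the numeric column can be used to rank node size.

```python
import csv
import io
from collections import Counter

# Hypothetical audit rows: the set of categories tagged for each title.
BOOKS = [
    {"Asian", "Own Voices"},
    {"Latinx", "Mental Health", "Own Voices"},
    {"Asian", "Mental Health"},
    {"Asian", "Own Voices"},
]


def gephi_nodes(books):
    """Build a nodes CSV with one row per category and its total title count."""
    counts = Counter(cat for cats in books for cat in cats)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Id must match Source/Target in the edges table; Titles is the size attribute.
    writer.writerow(["Id", "Label", "Titles"])
    for category, n in sorted(counts.items()):
        writer.writerow([category, category, n])
    return out.getvalue()


print(gephi_nodes(BOOKS))
```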

Screenshot of overview screen in Gephi

UX Research Methods

I asked three youth librarians to look at my project. All three are knowledgeable about children’s literature and Karen Jensen’s diversity audit process, but they did not necessarily know anything about visualizations. Because my data is meant for youth librarians evaluating their collections, I believe it is most important that these users can understand my visualizations. I conducted my UX research using the think-aloud method: each participant looked at the visualizations for 10 minutes and talked through their analysis and any questions they had. I recorded their commentary.


  • Participant 1: library assistant at independent school
  • Participant 2: library assistant at independent school
  • Participant 3: trainee at NYPL

Network lab visualization with sample dataset

In my previous network lab, I used a sample of the dataset that included both the book titles and the diversity categories. While I liked being able to see the individual books in the network visualization, I decided the titles would be too distracting at a larger scale. To show which categories had the most books, I used the diversity category sums to size the nodes.

UX Feedback

Tableau Dashboard of Final Project

Dashboard Layout

My three UX participants gave me useful feedback. All of them began the think-aloud activity by reading the visualization title and description before looking at the three visualizations. This was what I had hoped they would do, which suggested that the layout worked. All of them also immediately understood that the nodes were sized relative to one another and that the edges represented an “overlap” or “intersection” of diversity categories. Participant 1 connected the size of the nodes in the network to the number of titles in each category by checking the bar chart on the right, which confirmed her initial assumption. I included the two statistical visualizations on the right to provide context for the network visualization, so I was happy that all three participants started with the network and then moved to the bar chart and packed bubbles to better understand it. None of the three, however, initially realized that the bar chart could be scrolled. I would have liked to present the whole chart without making users scroll, but the size of the dashboard prevented this.

Points for Improvement

Feedback #1

More context was needed in the text boxes. Users were confused about the size of the collection and about what the categories described.


I added more information to the description, including the collection size, and added a text box about Own Voices.

Feedback #2

The white text over the smallest light blue nodes was hard to read.


I changed the text to light gray, bolded the font, and added an outline to the text.

Feedback #3

It would be helpful to have definitions of diversity categories included in the tooltips when you hover over the bar chart.


I added the category descriptions as a column in the spreadsheet and was able to include them in the tooltip.

Additional Feedback

Participant 1 thought that the categories described the authors rather than the characters in the books, so I added the title “Number of Books with Diverse Character Representation” to the bar chart to emphasize what the categories measure.

Participant 3 noticed that the packed bubble visualization labeled only three of the bubbles, and that they were not the three largest percentages, which was confusing. I changed the settings so that only the largest percentage, Own Voices, was labeled; users can hover over the other bubbles for more information. This change also emphasized the importance and meaning of Own Voices in the collection, because I placed the text box describing Own Voices next to its percentage bubble.

Overall Findings

Own Voices as Focal Point

I added the Own Voices category almost as an afterthought during data collection, as a way to recognize books that not only featured diversity but were also written by someone who had experienced it firsthand. It nonetheless became the main narrative of my visualization, both because of the physical size of its node in the network and because of the conversations it provoked. All of my participants are youth librarians familiar with discussions about Own Voices authors in the library field, and they were interested to see which diverse character representations were most often written by Own Voices authors. Black/African American, Asian, and Latinx characters were the most connected to Own Voices, with Black/African American having the strongest connection. When the participants looked at which categories had the weakest connections, an interesting question came up:

Are these categories less connected because fewer authors are writing about their own experiences with mental health, body diversity, and LGBTQIA+ identity, or because authors feel less comfortable openly promoting those parts of their identity than their race or ethnicity?

Lack of Representation in Collection or in Publishing?

One thing my participants and I observed was that there were considerably fewer books by Asian and Latinx Own Voices authors than by Black/African American Own Voices authors. This led us to ask whether my library wasn’t collecting enough of these titles or whether publishers weren’t publishing enough books by these authors. My data cannot answer this question, but it has larger implications for the publishing world.

Reflections and Future Directions

I initially created the network visualization to look at which diversity topics were featured together in books, but the inclusion of Own Voices changed the narrative of the visualization. I would be interested to see what my participants would focus on if I removed the Own Voices node. Would they be able to see more clearly how diverse representations are connected within children’s literature?

Another visualization I hoped to make would use the same dataset but would feature the book titles instead of the diversity categories. I would be interested to see which books feature multiple types of diversity and how some books are related based on the types of diversity they include.
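A title-based network could come from the same audit data by flipping the projection. Books become nodes, and two books are linked when they share at least one diversity category. The Python sketch below uses hypothetical titles and tags and weights each edge by the number of shared categories. It also ranks titles by how many categories they include. The `book_network` helper is illustrative only.

```python
from itertools import combinations

# Hypothetical titles and tags, for illustration only.
BOOKS = {
    "Book A": {"Asian", "Own Voices", "Family Structures"},
    "Book B": {"Asian", "Own Voices"},
    "Book C": {"Mental Health"},
    "Book D": {"Family Structures", "Mental Health", "Poverty/Socioeconomic Diversity"},
}


def book_network(books):
    """Rank titles by category count and link titles that share categories."""
    # Titles with the most diversity categories first; ties broken alphabetically.
    breadth = sorted(books, key=lambda title: (-len(books[title]), title))
    # Edge between two titles, weighted by the number of categories they share.
    edges = []
    for a, b in combinations(sorted(books), 2):
        shared = books[a] & books[b]
        if shared:
            edges.append((a, b, len(shared)))
    return breadth, edges


breadth, edges = book_network(BOOKS)
print(breadth)
print(edges)
```

Each edge could be written out with the same Source, Target, Type, and Weight columns used for the category network, so the title network could be loaded into Gephi the same way.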

Broader Implications in Children’s Literature

While it was interesting to see how much representation my library includes, these findings would need a much larger dataset to have broader implications. The Cooperative Children’s Book Center (CCBC) at the University of Wisconsin collects data about diversity in children’s publishing by examining almost all children’s trade books published in the United States. If I used its data to create network and statistical visualizations, I think youth librarians could draw interesting and useful conclusions about the state of diversity in children’s literature.