Introduction
For the past three months, I have been conducting a diversity audit of Brooklyn Heights Montessori School’s library collection. Using Karen Jensen’s Diversity Audit Outline, I am examining each book in the library’s Fiction collection to determine how much of the collection includes diverse experiences and representations. Examples of the diversity types include ability diversity, socioeconomic diversity, and body diversity, as well as race/ethnicity diversity and cultural diversity. I am also recording whether books featuring diverse characters were written by authors who are writing about their own experiences (the literary world refers to these authors as “Own Voices”). My main objective for this audit is to calculate what percentage of the collection features diverse characters and to determine which types of diversity are under-represented, in order to guide future collection development for my library. I was also excited to see whether a network visualization would reveal connections between the books and types of diversity that I hadn’t originally noticed or recognized.
Discussion/Critique of Visualizations
Visualization 1
When looking at network visualizations, I found that the ones that were color coded and used node sizing were the most intuitive. Visualization 1 shows one person’s Facebook network. Orange signifies more friends and blue signifies fewer friends. Node size reflects betweenness centrality: larger nodes more often sit on the shortest paths between other friends. While I think both the color coding and sizing are effective tools, this designer’s choice of colors makes it hard to see the varying degrees. I think a gradient of one color, with a darker tint signifying more friends and a lighter tint signifying fewer friends, would have been more effective.
Visualization 2
I think Visualization 2 does a better job of showing a person’s Facebook network because the nodes are color-coded by type of friend. This helps the user see how the person’s friends from different parts of her life are connected (or not connected). The user can clearly see that many of the designer’s friends from college are connected in some way and that quite a few of them are highly connected with each other. Friends from home and friends made through jobs are not connected to each other at all, and there are only a few connections between college and graduate school friends. The designer based the node sizes on degree centrality and grouped them by modularity. Additionally, they included labels that provide context.
Materials
Software:
- Gephi
- RStudio
- Excel
Datasets:
Methods/Process
Transforming Dataset
My first step was to transform my existing dataset (a sample of about 100 books) into a format that could be used in Gephi. I used RStudio to reformat my dataset so that it would have the required “Source, Target, Type” columns. Unfortunately, I did not separate my dataset by categories, so the transformed dataset has books and diversity types mixed together. This became a problem when I was visualizing the network.
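The reshaping step could look roughly like the sketch below. I did this step in RStudio; the sketch uses Python instead, and it assumes a hypothetical spreadsheet layout with one row per book, a "Title" column, and one column per diversity category whose cell names the specific type (or is left blank). The column names and sample rows are invented for illustration and are not the audit data.

```python
import csv
import io

# Hypothetical audit layout: one row per book, one column per diversity category.
# Column names and titles are placeholders, not the real audit data.
SAMPLE = """Title,Race/Ethnicity,Ability,Own Voices
Book A,Black/African American,,Own Voices
Book B,,Physical disability,
"""

def to_edge_rows(csv_text, title_col="Title"):
    """Turn each (book, category cell) pair into a Source/Target/Type row."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column, value in record.items():
            if column == title_col:
                continue
            rows.append({
                "Source": record[title_col],
                # Blank when the book has nothing recorded for this category.
                "Target": (value or "").strip(),
                "Type": "Undirected",
            })
    return rows

edges = to_edge_rows(SAMPLE)
```

Because every book/category pair becomes a row, books with nothing recorded for a category produce rows with an empty Target.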
Because there were many categories that didn’t have connections, I needed to remove these blank connections. I cleaned up the data using the “Filter” feature in Excel.
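The same cleanup could also be scripted rather than done by hand. Here is a minimal Python sketch, assuming the edge rows are dictionaries with the Source, Target, and Type columns described above; the function name and the sample rows are placeholders.

```python
def drop_blank_edges(rows):
    """Keep only rows where both endpoints are non-empty after trimming whitespace."""
    return [
        row for row in rows
        if row.get("Source", "").strip() and row.get("Target", "").strip()
    ]

# Placeholder rows: the second and third have no usable Target.
rows = [
    {"Source": "Book A", "Target": "Own Voices", "Type": "Undirected"},
    {"Source": "Book A", "Target": "", "Type": "Undirected"},
    {"Source": "Book B", "Target": "  ", "Type": "Undirected"},
]
clean = drop_blank_edges(rows)
```

Trimming whitespace before the check also catches cells that look empty in a spreadsheet but contain stray spaces.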
Using Gephi
After the dataset was transformed and cleaned, I uploaded it to Gephi. I played around with different layouts, but decided to use Force Atlas because it allowed me to make the gravity stronger so that my network would be more compact. I then ran statistical analyses on Average Degree, Modularity, and Average Path Length. Figure 1 shows my network draft in Gephi.
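To make two of these statistics concrete, here is a small Python sketch of how average degree and average path length can be computed for an undirected network. It is an illustration under simple assumptions (no duplicate edges, path lengths averaged only over pairs that can reach each other), not a reproduction of Gephi's reports, and the edge list is a placeholder. Modularity is left out because it depends on a separate community detection step.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected)."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj[target].add(source)
    return adj

def average_degree(adj):
    """Mean number of neighbors per node, i.e. 2 * edges / nodes."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs, via breadth-first search."""
    total = 0
    pairs = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the start node itself
    return total / pairs if pairs else 0.0

# Placeholder network: two books and two diversity types.
adj = build_adjacency([("Book A", "Type X"), ("Book B", "Type X"), ("Book B", "Type Y")])
avg_deg = average_degree(adj)
avg_path = average_path_length(adj)
```

In a book-to-diversity-type network like this one, a short average path length suggests that most books are only a few shared diversity types away from each other.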
I color-coded the networks by the weight of connections, and Figure 2 clearly shows a strong connection between books featuring Black/African American characters and books written by Own Voices authors. There are also connections between books featuring characters dealing with mental health issues and adoption and foster care as well as books with unconventional family structures.
While I liked that the nodes and edges were color-coded by degrees of connection, the dataset I had uploaded made it hard to differentiate between types of diversity and book titles. To make these categories clearer, I decided to color-code by category instead of connection. The diversity types are blue nodes and the book titles are pink nodes. This allowed me to see which diversity types were most highly featured in books. Because my sample dataset is so small, users can easily see which books feature which types of diversity.
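The "most highly featured" diversity types correspond to the diversity-type nodes with the most book connections. A minimal Python sketch of that count follows, assuming the edges are (book, diversity type) pairs; the titles and counts are placeholders, not audit results.

```python
from collections import Counter

def books_per_type(edges):
    """Count how many distinct books are linked to each diversity type."""
    unique_pairs = set(edges)  # ignore duplicate book/type rows
    return Counter(dtype for _, dtype in unique_pairs)

# Placeholder edges, not the real audit data.
edges = [
    ("Book A", "Own Voices"),
    ("Book A", "Black/African American"),
    ("Book B", "Own Voices"),
    ("Book B", "Own Voices"),  # duplicate row, counted once
]
counts = books_per_type(edges)
top_type, top_count = counts.most_common(1)[0]
```

Sorting the diversity types by this count gives the same ranking that node size by degree would show in the visualization.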
Results and Reflection
Creating Connections Between Diversity Types
I think that Gephi is a great way to examine this data in a new way. My primary objective when I began collecting this data was to determine the extent of diversity in our school’s library collection; however, by examining the connectedness between these diversity types and titles, I think youth librarians will be able to better understand how these types of diversity are related in their collections. For example, the network clearly shows that many books featuring Black/African American characters are written by Own Voices authors, but books about refugees and new immigrants are usually not. This is important for librarians to consider when determining the authenticity of the stories being presented about refugees and new immigrants, and it is a possible red flag for biases surrounding them. I think it is also interesting to see which books include multiple types of diversity. This network shows that there are many books that feature characters from multiple ethnic and racial backgrounds.
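The Own Voices comparison can also be checked numerically alongside the picture. The Python sketch below computes, for each diversity type, the share of its books that are also linked to an Own Voices node. It assumes edges are (book, diversity type) pairs and that Own Voices is recorded as a node labeled "Own Voices"; the titles and resulting shares are placeholders, not findings from the audit.

```python
from collections import defaultdict

OWN_VOICES = "Own Voices"  # assumed label of the Own Voices node

def own_voices_share(edges):
    """For each diversity type, the fraction of its books also tagged Own Voices."""
    types_by_book = defaultdict(set)
    for book, dtype in edges:
        types_by_book[book].add(dtype)

    totals = defaultdict(int)
    own = defaultdict(int)
    for types in types_by_book.values():
        is_own_voices = OWN_VOICES in types
        for dtype in types - {OWN_VOICES}:
            totals[dtype] += 1
            if is_own_voices:
                own[dtype] += 1
    return {dtype: own[dtype] / totals[dtype] for dtype in totals}

# Placeholder edges, not the real audit data.
edges = [
    ("Book A", "Black/African American"), ("Book A", OWN_VOICES),
    ("Book B", "Black/African American"), ("Book B", OWN_VOICES),
    ("Book C", "Refugees/New Immigrants"),
    ("Book D", "Black/African American"),
]
shares = own_voices_share(edges)
```

A low share for a diversity type flags the kind of authenticity question raised above, although with a sample of about 100 books the percentages should be read cautiously.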
Improving Dataset for Large Scale Network
While creating this network, I realized that I need to separate the data by diversity types and book titles in order to see the connections more effectively at a larger scale, since my complete dataset will have around 1,500 records. By separating the categories, I will be able to see more clearly which books have the most types of diversity and which diversity types are best connected amongst books.
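One way to do this separation is to keep a nodes table next to the edge list, with a column that marks each node as a book or a diversity type. The Python sketch below builds such a table as CSV text. It assumes edges are (book, diversity type) pairs; "Category" is a column name chosen for illustration, and the titles are placeholders.

```python
import csv
import io

def build_node_rows(edges):
    """One row per node, tagged as a book or a diversity type."""
    categories = {}
    for book, dtype in edges:
        # Assumes no book title is identical to a diversity type name.
        categories.setdefault(book, "Book")
        categories.setdefault(dtype, "Diversity Type")
    return [
        {"Id": name, "Label": name, "Category": category}
        for name, category in categories.items()
    ]

def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder edges, not the real audit data.
edges = [("Book A", "Own Voices"), ("Book B", "Own Voices")]
node_rows = build_node_rows(edges)
nodes_csv = to_csv(node_rows, ["Id", "Label", "Category"])
```

With a category column on every node, coloring or filtering by books versus diversity types no longer depends on telling the two apart by name.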