The Consolidation of Book Publishing in the US: A Network Graph Study

Credit: Jessica Ruscello via Unsplash


In recent years the book publishing industry has seen significant disruption. Amazon became the dominant online retailer for print books, ebooks, and audiobooks, and it has a self-publishing arm as well. Amazon also collects proprietary customer data that it does not share with the publishers who sell through its site.

Traditional publishers’ response to Amazon’s power has been to consolidate through mergers and acquisitions, which is the focus of my analysis. I will look at how the trade book publishing industry has changed over time, focusing on recent years, and assess what these changes could mean for the industry.

According to Elizabeth Harris, writing for The New York Times, “There are general interest publishers and academic publishers, big companies and small ones, as well as people who self-publish. All that makes it difficult to get an accurate read on how dominant any one player is.”

In my years of working in the industry, I haven’t come across this type of analysis before.


In terms of design, I was most inspired by the following visualization from the Network Analysis & Visualization class lecture slides. I like that both size and color are used to distinguish different groups and relationships. Comparing visualizations like this across time periods could reveal important trends.

Slide 27 from Week 8 – Network Analysis & Visualization class lecture slides


The software used for this project is Gephi, a free, open-source application for Windows, Mac, and Linux. It can calculate network statistics, detect clusters, and filter, style, and label graphs. Plugins from open-source developers provide additional features. Once the visualizations are finalized, they can be exported as PDF, PNG, or SVG.
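Gephi computes its statistics internally. The sketch below only illustrates the general idea of finding clusters. It groups nodes into connected components, which is the simplest notion of a cluster, using a union-find structure. This is not Gephi's own method: its Modularity statistic runs a community-detection algorithm instead. The owner/imprint edges are a small hypothetical sample. When every imprint links to its owner, each corporate group forms one component, so a merger shows up as two components collapsing into one.

```python
from collections import defaultdict


def connected_components(edges):
    """Return the connected components of an undirected graph as a list of sets."""
    parent = {}

    def find(node):
        # Register unseen nodes, then follow parent links to the root,
        # halving the path along the way to keep later lookups fast.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        # Linking two nodes merges the groups they belong to.
        parent[find(source)] = find(target)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical sample: each edge links an owner to one of its imprints.
before_merger = [
    ("Penguin", "Viking"),
    ("Penguin", "Riverhead Books"),
    ("Random House", "Knopf"),
    ("Random House", "Crown"),
]
after_merger = before_merger + [
    ("Penguin Random House", "Penguin"),
    ("Penguin Random House", "Random House"),
]

print(len(connected_components(before_merger)))  # 2
print(len(connected_components(after_merger)))   # 1
```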

To define the scope of publishers and imprints in the analysis, I used a list of the top 20 trade book publishers in the United States by units sold. Publishers Weekly published the list in 2017, based on 2016 purchases made through outlets tracked by NPD BookScan.

Here is the table of publishers by rank and the URL where I found most of the imprint names:

Rank  Publisher
1     Penguin Random House
3     Simon & Schuster
4     Hachette Book Group
7     Disney Publishing Worldwide
8     Houghton Mifflin Harcourt
11    John Wiley and Sons
15    W.W. Norton
19    B&H Publishing
20    Tyndale House

I manually generated the CSV files to import by researching lists of the major US publishers and associated imprints. Below is a sample of the template. Since I decided to work in small multiples, in total I used four templates—one for each time period that illustrates major mergers and acquisitions within the publishing industry.
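Files like these could take roughly the following shape. This sketch assumes Gephi's spreadsheet (CSV) import, which uses a node table with Id and Label columns and an edge table with Source, Target, and Type columns. The extra Role column, the helper name write_gephi_tables, and the three ownership pairs are hypothetical illustrations, not the author's actual template.

```python
import csv
import io

# Hypothetical (owner, imprint) pairs for a single time period.
OWNERSHIP = [
    ("Penguin Random House", "Viking"),
    ("Penguin Random House", "Knopf"),
    ("Hachette Book Group", "Little, Brown"),
]


def write_gephi_tables(pairs, nodes_file, edges_file):
    """Write a node table and an edge table as CSV for Gephi's spreadsheet import."""
    owners = {owner for owner, _ in pairs}
    names = sorted(owners | {imprint for _, imprint in pairs})

    nodes = csv.writer(nodes_file)
    nodes.writerow(["Id", "Label", "Role"])  # "Role" is an extra, hypothetical attribute
    for name in names:
        nodes.writerow([name, name, "owner" if name in owners else "imprint"])

    edges = csv.writer(edges_file)
    edges.writerow(["Source", "Target", "Type"])
    for owner, imprint in pairs:
        edges.writerow([owner, imprint, "Directed"])


# In-memory buffers keep the example self-contained; for real files,
# open them with newline="" as the csv module recommends.
nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_tables(OWNERSHIP, nodes_buf, edges_buf)
print(nodes_buf.getvalue())
```

The csv module quotes names that contain commas, such as "Little, Brown", so they survive the import intact. A separate pair of files per time period would match the small-multiples approach.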

Methods & Process

I started with a proof-of-concept draft using a few big publishers before gathering all of the data, to make sure this direction was worth pursuing.


Shows Random House, Penguin, and Simon & Schuster as separate entities.


Post Penguin-Random House merger.


Post Penguin Random House acquisition of Simon & Schuster (pending approval).

When I shared these visualizations in class and with a couple of publishing colleagues, the main feedback was that the central node of each group (i.e., the “owner” of each imprint) needed to be much more prominent to make the clusters easier to understand. The edges also needed heavier line weights and a stronger color for better contrast.
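In Gephi, this kind of emphasis is typically set in the Appearance panel by ranking nodes on an attribute such as degree. The sketch below shows the idea under an assumed simple linear mapping from degree to size. The node_sizes helper and the 10-to-60 size range are hypothetical choices, not Gephi defaults.

```python
from collections import Counter


def node_sizes(edges, min_size=10.0, max_size=60.0):
    """Scale each node's size linearly with its degree between min_size and max_size."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1

    low, high = min(degree.values()), max(degree.values())
    if high == low:
        # Every node has the same degree, so there is nothing to rank.
        return {node: min_size for node in degree}
    return {
        node: min_size + (d - low) / (high - low) * (max_size - min_size)
        for node, d in degree.items()
    }


# Hypothetical star: one owner linked to three of its imprints.
edges = [
    ("Hachette Book Group", "Little, Brown"),
    ("Hachette Book Group", "Grand Central Publishing"),
    ("Hachette Book Group", "Orbit"),
]
sizes = node_sizes(edges)
print(sizes["Hachette Book Group"], sizes["Orbit"])  # 60.0 10.0
```

Owners link to many imprints, while each imprint usually links only to its owner. As a result, the owner nodes come out largest, which is the prominence the feedback asked for.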

My colleagues were familiar with the imprints and structures but couldn’t pinpoint the exact “event” each visualization referred to. This made it clear that every visualization needed labels and some context, since the graphs aren’t easy to interpret on their own.


Pre-2012: Post-recession challenges & Amazon’s increasing power

Before 2012, the major US publishers were known as the Big Six: Random House, Penguin, Hachette, Simon & Schuster, HarperCollins, and Macmillan.

After the 2008 recession, consolidation became more rapid and strategic as Amazon became an increasingly powerful force in the industry.

2012-2016 Penguin-Random House Merger

The Random House–Penguin merger agreement was signed October 29, 2012, and the deal was officially completed July 1, 2013 (NY Times). 

2016-2020 Hachette Acquires Perseus and Worthy

Hachette acquired Perseus in 2016 and Worthy in 2018. Because both acquisitions were relatively small, I kept them in one visualization.

Full disclosure: I worked at Hachette from 2016–2021, which also inspired my interest in this topic!

Post-2020 Penguin Random House to Buy Simon & Schuster (Pending Approval) & Houghton Mifflin Harcourt Sold to HarperCollins

In late November 2020, ViacomCBS agreed to sell Simon & Schuster to Penguin Random House for more than $2 billion, in a deal that would create the first megapublisher (NY Times).

Pending regulatory approval, the deal is expected to close sometime this year (Publishers Weekly).

Additionally, in March 2021, Houghton Mifflin Harcourt entered into a definitive agreement to sell its HMH Books & Media consumer publishing business to HarperCollins for $349 million.

Both of these changes are reflected in the final visualization—the near future of the publishing world.


The Risks of a Shrinking Publishing Industry

For reference, here are the first and last visualizations side by side to show just how much change has occurred over the past eight years. There are still plenty more publishers on the fringes, though, and it’s unclear whether and when the consolidation will stop.



Antitrust Concerns

In January 2021, The Authors Guild, along with five other writers’ groups and the nonprofit Open Markets Institute, sent a letter to the Department of Justice asking the government to block Penguin Random House’s pending acquisition of Simon & Schuster. The letter argues that combining PRH and S&S, currently the largest and third-largest trade book publishers in the US, would diminish competition for authors’ manuscripts and drive down author advances. When these economic forces keep advances low, they influence who can afford to publish a book in an industry that has historically been fairly homogeneous. The letter further states that, since the publishing industry plays an important role in protecting free speech and disseminating ideas, “antimonopoly enforcement is of the most importance.”

Although imprints claim “editorial independence,” it’s unsettling that the majority of US publishers would report to the same corporate owner: Bertelsmann, based in Germany, which also owns a sizable share of the print manufacturing industry.

As publishing becomes more consolidated, and the retailing of books becomes more consolidated, there are fewer and fewer of us who decide what books get wide distribution in America. And the decisions of what to publish, or not publish, or very occasionally what book to pull, we make as individuals—each one of us distinct from the other.

– John Sargent, (former) CEO of Macmillan, 2017 PEN Publisher Honoree’s Acceptance Speech


After my initial lab using Gephi, I was a little nervous because it’s a complicated application. This time around, though, it was more fun, and working with a larger data set helped me advance my understanding of its statistics and styling options.

Gathering and formatting the data manually was the hardest and most time-consuming part of the project, so it helped to cap the list at the 20 largest publishers and their imprints. I tried to find founding dates for as many imprints as possible, but often I couldn’t. I did my best to show only the imprints that existed during each time period and to model the relationships correctly, though some mergers and acquisitions are complicated. These issues likely affect only a small minority of records, so the trends are still clear. I’m glad I took on this project, since it’s a topic I’m interested in and I haven’t seen visualizations like these before.

Some future directions for this project could include:

  1. Animation – I don’t think I have the skill for this yet, but the small-multiples approach with still images worked just as well, since I analyzed changes over an eight-year period. It’s also easier to read the imprint labels in stills.
  2. Include smaller publishers – For the sake of time, I kept the list manageable, but there are many small publishers, and they are usually distributed by one of the publishers on the top 20 list. It would be interesting to include independent publishers, even as a single cohort, for comparison.
  3. Weighting by output – Annual book count, unit sales, or earnings per imprint could measure scale in addition to simple imprint count, because the number of annual publications (and therefore sales and earnings) varies widely among imprints.
  4. Labor impact – Consolidation often eliminates “shared service” positions as a cost-saving “benefit.” It would also be interesting to view this trend from a labor perspective by analyzing changes in employee count before and after mergers or acquisitions.
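Item 3 in the list above could start with something as simple as summing a per-imprint figure for each owner. The sketch below compares imprint count with total annual titles. The owner and imprint names and all of the numbers are hypothetical placeholders, and the title counts could be swapped for unit sales or earnings.

```python
from collections import defaultdict


def owner_scale(ownership, titles_per_imprint):
    """Return two size measures per owner: imprint count and total annual titles."""
    imprint_count = defaultdict(int)
    title_count = defaultdict(int)
    for owner, imprint in ownership:
        imprint_count[owner] += 1
        # Imprints missing from the title table count as 0 rather than raising an error.
        title_count[owner] += titles_per_imprint.get(imprint, 0)
    return dict(imprint_count), dict(title_count)


# Hypothetical placeholder data, not real publishing figures.
ownership = [("Owner X", "Imprint A"), ("Owner X", "Imprint B"), ("Owner Y", "Imprint C")]
titles = {"Imprint A": 20, "Imprint B": 15, "Imprint C": 400}

imprints, total_titles = owner_scale(ownership, titles)
print(imprints, total_titles)  # {'Owner X': 2, 'Owner Y': 1} {'Owner X': 35, 'Owner Y': 400}
```

In this made-up example, Owner X looks bigger by imprint count, but Owner Y looks bigger by output. That mismatch is exactly what sizing nodes by imprint count alone would hide.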

Of course, in the grand scheme of things, many people would consider this consolidation trend unimportant or even expected. However, if the past several years have taught us anything about who controls the distribution of information (or misinformation), it’s that the long-term effects of even fewer people gatekeeping publishing could be risky, and we should be wary.