Gephi lab report by Kyle Palermo, July 7, 2019.
Campaign contributions in the United States are largely a matter of public record, and a variety of web-based tools let us see lists of donors and contributions to individual candidates, along with some basic trends in where campaign cash comes from (see the FEC data portal or OpenSecrets, for example). Macro-level trends are more elusive, so I pulled together a larger dataset to explore relationships among major donors in the 2020 general election.
Gephi is a free, open-source network visualization tool published by a group of French developers. It accepts data formatted as “nodes” (points in a network) and “edges” (lines connecting those points) and lays them out as a two-dimensional network that reveals relationships among the data points. It can also tie node size, edge appearance, and labels to data variables, and it ships with a handful of built-in layout algorithms.
Before beginning my visualization, I reviewed the Gephi quick start guide and examples of network visualization at Visual Complexity. Halfway through the process, I consulted a guide at Dabbling with Data, which helped me better understand how to format data for Gephi.
Results and Discussion
Using the FEC campaign-finance data portal linked above, I exported donors who gave $100 or more to the Biden and Trump campaigns in the 2017-18 and 2019-20 giving cycles. I then exported contributors who gave $5,000 or more to the largest donor organization for each candidate. (I chose both the $100 and $5,000 cutoffs somewhat arbitrarily to stay under the FEC’s 500,000-row cap on filtered exports, so the picture is incomplete.)
To prepare the data for Gephi, I combined these four data sets into a single Tableau union and summed contributions for each unique donor, keeping only contributors with net giving above $10,000. This made the data set more manageable and focused it on high-dollar activity. I then exported four columns (donor/organization name, donor type, recipient name, and gift amount) and used Google Sheets to convert the resulting 2,017 rows into Gephi’s nodes/edges format. My nodes table assigns a unique ID to each donor and recipient and records the display label for that ID, the cumulative giving in the race, and the type of entity (e.g., political action committee versus individual). My edges table contains one row per contribution, with the ID of the donor and the ID of the recipient the edge should point to.
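The Tableau and Google Sheets steps above can be approximated in a few lines of Python. This is a minimal sketch, not the workflow used for this report: it assumes the unioned export is a list of rows with hypothetical keys `donor`, `donor_type`, `recipient`, and `amount`, keeps donors whose summed giving is above $10,000, and builds a nodes table with `Id`/`Label` columns and an edges table with `Source`/`Target`/`Type`/`Weight` columns, the names Gephi's spreadsheet (CSV) importer recognizes. Unlike the original edges table, it merges repeat gifts from one donor to the same recipient into a single weighted edge.

```python
import csv
from collections import defaultdict


def build_gephi_tables(rows, min_total=10_000):
    """Return (nodes, edges) as lists of dicts for Gephi's CSV importer.

    rows: iterable of dicts with hypothetical keys 'donor', 'donor_type',
    'recipient', and 'amount' (one row per contribution).
    """
    rows = list(rows)  # we iterate twice
    totals = defaultdict(float)  # summed giving per donor name
    donor_types = {}             # entity type per donor name
    for r in rows:
        totals[r["donor"]] += float(r["amount"])
        donor_types[r["donor"]] = r["donor_type"]

    # Keep only high-dollar donors (net giving strictly above the cutoff).
    kept = {name for name, total in totals.items() if total > min_total}

    # Merge kept contributions into one weighted edge per donor -> recipient pair.
    pair_totals = defaultdict(float)
    for r in rows:
        if r["donor"] in kept:
            pair_totals[(r["donor"], r["recipient"])] += float(r["amount"])

    # One numeric ID per entity name, whether it gives, receives, or both
    # (e.g. a conduit such as ActBlue appears on both sides).
    ids = {}
    for donor, recipient in pair_totals:
        for name in (donor, recipient):
            ids.setdefault(name, len(ids) + 1)

    nodes = [
        {"Id": i, "Label": name, "total": totals.get(name, 0.0),
         "donor_type": donor_types.get(name, "recipient")}
        for name, i in ids.items()
    ]
    edges = [
        {"Source": ids[d], "Target": ids[r], "Type": "Directed", "Weight": amt}
        for (d, r), amt in pair_totals.items()
    ]
    return nodes, edges


def write_csv(path, records):
    """Write a list of dicts to CSV, using the first record's keys as the header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_csv("nodes.csv", nodes)` and `write_csv("edges.csv", edges)` produces two files that can be loaded one after the other through Gephi's spreadsheet import. Keying IDs on the entity name means an organization that both receives and passes on money becomes a single node.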
Formatting my data properly took some trial and error, but after that the process was largely straightforward. Gephi automatically generates a skeleton of your network, with uniform black nodes and edges placed arbitrarily on the canvas. I experimented with different layout algorithms and found that the Yifan Hu layout produced a nice radial effect around each major node. I ran the Noverlap algorithm on top of Yifan Hu to keep individual nodes from overlapping, sized nodes and labels by dollar amount, and colored nodes and edges by organization type. I then adjusted typefaces and appearance in the Preview tab, exported to SVG, re-added the background color in Adobe Illustrator (Gephi drops it during export), added a title, and exported again as a final image.
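Sizing nodes by dollar amount means mapping each node's giving onto a size range. The sketch below shows a plain linear version of that kind of mapping, with an optional log scale; the 10 to 100 size range is a hypothetical choice, and this is an illustration of the idea rather than Gephi's own ranking code, which also offers non-linear interpolation curves not modeled here.

```python
import math


def scale_sizes(values, min_size=10.0, max_size=100.0, log=False):
    """Map each value onto [min_size, max_size].

    The smallest value gets min_size and the largest gets max_size.
    With log=True values are compared on a log10 scale (all must be > 0).
    If every value is equal, every node gets min_size.
    """
    xs = [math.log10(v) for v in values] if log else list(values)
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [min_size for _ in xs]
    span = hi - lo
    return [min_size + (x - lo) / span * (max_size - min_size) for x in xs]


print(scale_sizes([12000, 50000, 31000]))  # [10.0, 100.0, 55.0]
```

Campaign giving is heavily skewed, so on a linear scale a few giant conduits can shrink every other node toward the minimum size; the `log=True` option is one way to keep mid-sized donors distinguishable.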
My Gephi network visualization reveals some interesting patterns, for instance, a dense cluster of donors to the pro-Biden donation platform ActBlue and a much less dense cluster around the pro-Trump platform WinRed. This may imply that Biden enjoys a broader base of support than Trump; that Trump’s donor base is more bottom-heavy, relying on low-dollar donors who were filtered out by my $5,000-and-up cutoff; or something else entirely. I am hesitant to draw conclusions from incomplete data, and a stronger future iteration would need to include all giving in the initial data pull. Note also that a handful of seemingly bipartisan donors just happen to have very common names, suggesting that different people who share a name were merged into a single node: a gap in my data cleanup.
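One way to narrow that gap is to identify donors by more than their name. The sketch below assumes each raw row also carries a ZIP code; the keys `name` and `zip` are hypothetical placeholders for whatever the export's columns are actually called. It normalizes names, keys donors on name plus 5-digit ZIP, and lists names that appear with more than one ZIP so they can be reviewed before summing.

```python
from collections import defaultdict


def normalize_name(name):
    """Uppercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def donor_key(row):
    """Identify a donor by normalized name plus 5-digit ZIP, not name alone."""
    return (normalize_name(row["name"]), str(row["zip"])[:5])


def ambiguous_names(rows):
    """Return normalized names that occur with more than one ZIP code."""
    zips = defaultdict(set)
    for row in rows:
        name, zip5 = donor_key(row)
        zips[name].add(zip5)
    return sorted(name for name, z in zips.items() if len(z) > 1)
```

Name plus ZIP is still a heuristic: it splits one donor who moved between cycles and merges two same-named people in one ZIP, so flagged names are best checked by hand (employer or occupation fields can help) rather than split automatically.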
Gephi requires more up-front data prep than other tools I’m accustomed to, but now that I understand the inputs it expects, I could invest more time building and cleaning a large data set with confidence that it will import smoothly. This visualization would also be more meaningful with annotations, such as a key explaining the node colors, and Gephi could be improved by adding basic key/legend options, which I believe users currently must add manually in a second program. Still, despite the limitations of my data set and minimal post-Gephi design work, this visualization paints an interesting picture of the direction money flows in presidential elections and who some of the major bettors in the race are.