For my Gephi lab, I opted to create my own network dataset that would examine relationships between Boards of Trustees at several top colleges and universities. I was interested to see whether any Boards of Trustees share members, and more specifically, whether Boards of Trustees show overlap based on the companies their members represent. As an additional area for analysis, I thought it would be worthwhile to see whether any “types” of companies (finance, law firms, technology companies, etc.) tend to dominate membership on Boards of Trustees. Some further questions I had were: Which schools have the most co-occurrences of either individual members or companies represented by individual members? What companies act as the most significant “bridges” between distinct Boards of Trustees? How are companies in the finance sector represented on Boards of Trustees, and is there a significant amount of Board of Trustees overlap within the finance sector? Most importantly: where does the power lie?
A deeper analysis may consider how the companies represented on university Boards of Trustees affect the fiscal and political decisions each school makes. For example, how does a school’s Board of Trustees membership affect institutional decisions such as divesting from fossil fuels, or funneling money into STEM facilities? These latter questions likely will not be covered in this lab report, but further efforts may help develop an understanding of how a school’s Board of Trustees membership affects students, staff, and faculty at an institution.
My work here was largely inspired by a project called They Rule, which visualizes many of the relationships of the US Ruling Class. According to the website, the project “takes as its focus the boards of some of the most powerful U.S. companies, which share many of the same directors. Some individuals sit on 5, 6 or 7 of the top 1000 companies. It allows users to browse through these interlocking directories and run searches on the boards and companies…They Rule is a starting point for research about these powerful individuals and corporations.” While They Rule is a multi-modal visualization that displays both companies and individuals as nodes, I decided to limit my focus to either companies or individuals. As referenced in our class lecture, multi-modal visualizations can create unnecessary barriers between nodes, which is something I hoped to avoid because my dataset is relatively small.
The Data
Collecting my data was a straightforward process. Most colleges and universities list Board of Trustees members, their degrees, and the companies they work for on the official school website. I simply copied and pasted Board of Trustees member information from Yale, Princeton, Harvard, Columbia, Stanford, University of Chicago and Amherst College, then organized the data into columns by “Name” (name of the Board member), “Company” (company they work for), and “School” (institution where they sit on the Board of Trustees). While I was formatting, I noticed that there was little overlap in individual Board members, but a significant amount of “Company” overlap. Thus, I decided that my visualizations would focus on the relationships between companies rather than individuals. I formatted the data in Excel, then did further cleaning and transposing in Google Refine.
After I made sure the data contained “Source,” “Target,” and “Type” values, I imported my .csv into Gephi as an edge table. I’ll note that I also included the “School” value in my edge table so I could later indicate “School” by edge color. The resulting import showed 3356 Edges and 189 Nodes. I then went into the Gephi-generated Nodes table and added a column called “Type,” which categorizes each company by focus. Because I knew I wanted to color nodes by type, I tried to limit the number of categories I chose. This ended up being a difficult feat given the variety of companies represented on Boards, but I pared it down the best I could into 10 categories: nine named ones (Finance, Academic, Foundation/Non-Profit, Media & Communications, Technology, Public Sector, Law Firm, Real Estate, Manufacturer) plus “Other,” which aggregates the companies that fell outside those nine. Because I pared the categories down this far, the categorization is a likely source of error within my visualizations.
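For anyone who would rather script the edge table than build it by hand, here is a minimal Python sketch. It assumes that two companies are linked whenever they are represented on the same school’s Board, which is my reading of the setup rather than a record of the exact steps taken in Excel and Google Refine. The rows, company names, and school names are hypothetical placeholders.

```python
import csv
import io
from itertools import combinations

# Hypothetical sample rows in the Name / Company / School layout.
rows = [
    {"Name": "Trustee A", "Company": "Company X", "School": "School 1"},
    {"Name": "Trustee B", "Company": "Company Y", "School": "School 1"},
    {"Name": "Trustee C", "Company": "Company Z", "School": "School 1"},
    {"Name": "Trustee D", "Company": "Company X", "School": "School 2"},
    {"Name": "Trustee E", "Company": "Company W", "School": "School 2"},
]


def build_edges(rows):
    """Link every pair of companies represented on the same school's Board.

    Assumption: co-membership on a Board is what defines an edge.
    """
    companies_by_school = {}
    for row in rows:
        companies_by_school.setdefault(row["School"], set()).add(row["Company"])

    edges = []
    for school, companies in sorted(companies_by_school.items()):
        # combinations() yields each unordered pair exactly once.
        for source, target in combinations(sorted(companies), 2):
            edges.append({
                "Source": source,
                "Target": target,
                "Type": "Undirected",
                "School": school,  # extra column, usable later for edge color
            })
    return edges


edges = build_edges(rows)

# Write the edge table as CSV text with the Source/Target/Type columns.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Type", "School"])
writer.writeheader()
writer.writerows(edges)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Keeping “School” as an extra column means the same file carries the attribute used to color edges, so no second table is needed for that step.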
Viz #1 – Visualizing clusters
After ensuring my data was ready to go, I moved to the Overview window and ran “Average Degree,” “Graph Density,” and “Modularity.” Running modularity at a resolution of 1.0 resulted in seven modularity classes, which corresponded directly with the number of schools I analyzed. This makes sense, given that the number and strength of relationships within each Board likely bind its companies into a tight cluster. For my first “Force Atlas 2” graph, I used modularity class to color each node, then used “School” to color the edges. Because the modularity classes aligned almost perfectly with each school’s Board of Trustees edges, I used the same colors for both the edges and nodes, which was effective for visually distinguishing each cluster. In addition, I sized nodes by “degree.” The automatically generated range worked well to clearly differentiate between high and low degree nodes.
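As a rough check on what “Average Degree” and “Graph Density” measure, the sketch below applies the textbook formulas for an undirected graph to the node and edge counts from my import. It assumes a simple undirected graph with no parallel edges, so the figures Gephi reports may differ if it treats repeated company pairs differently.

```python
def average_degree(node_count, edge_count):
    """Average degree of an undirected graph: each edge adds 2 to the degree total."""
    return 2 * edge_count / node_count


def graph_density(node_count, edge_count):
    """Share of all possible undirected node pairs that are actually connected."""
    return 2 * edge_count / (node_count * (node_count - 1))


# Counts shown by Gephi at import time.
node_count, edge_count = 189, 3356

avg_degree = average_degree(node_count, edge_count)
density = graph_density(node_count, edge_count)

print(f"average degree: {avg_degree:.2f}")
print(f"graph density:  {density:.3f}")
```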
The visualization shows a large blue University of Chicago cluster to the left, which connects with the purple Princeton cluster by way of two large bridge nodes (Citadel Investment and Capital Research). University of Chicago’s cluster is both large and relatively isolated from the rest of the school Boards, which look more interconnected. Here, I noticed that bridge nodes tended to fall into the modularity class of whichever cluster contains larger “degree” nodes. With more time, I’d like to recolor those nodes, or possibly annotate them to acknowledge that company members are present on multiple Boards. According to this visualization, Citadel, Capital Research, Goldman Sachs, Simpson Thacher & Bartlett, U.S. Second Circuit Court, Memorial Sloan Kettering, Google, and Bain Capital are all prominent bridges between the Boards of Trustees, meaning that they have a dominant presence within my group of selected Boards/Schools.
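Bridge companies can also be listed directly from the raw table, without relying on the layout. The sketch below is one possible approach, not something built into Gephi: it groups schools by company and keeps any company that appears on more than one Board. The company and school names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (company, school) pairs taken from the Company and School columns.
rows = [
    ("Company X", "School 1"),
    ("Company X", "School 2"),
    ("Company X", "School 3"),
    ("Company Y", "School 1"),
    ("Company Z", "School 2"),
    ("Company Z", "School 3"),
    ("Company W", "School 3"),
]


def find_bridges(rows):
    """Return companies represented on more than one Board, widest reach first."""
    schools_by_company = defaultdict(set)
    for company, school in rows:
        schools_by_company[company].add(school)

    bridges = {
        company: sorted(schools)
        for company, schools in schools_by_company.items()
        if len(schools) > 1
    }
    # Sort by number of Boards (descending), then by company name.
    return dict(sorted(bridges.items(), key=lambda item: (-len(item[1]), item[0])))


bridges = find_bridges(rows)
for company, schools in bridges.items():
    print(company, "->", ", ".join(schools))
```

A list like this could also feed the annotation idea above, since it records every Board a bridge company sits on rather than only the modularity class it was assigned.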
In a similar vein, I thought it might be useful to rank nodes by color in addition to size to better indicate higher degree nodes. The edges remain colored by school as they were in the previous visualization.
Viz #3 – Visualizing Dominant Company “Types”
Next, I wanted to see which types of companies, if any, are more prevalent in my selected Boards of Trustees. I recolored nodes by “Type” and removed color from the edges, which resulted in a confusing, carnival-like visualization with too many colors to tell the nodes apart. Although this visualization is oversaturated with color, I was able to glean several important takeaways. First, the large University of Chicago cluster on the left contains a visibly large number of finance companies. U Chicago also has far fewer Board members from academic institutions than the other schools I analyzed. Similarly, the Stanford cluster in the bottom right corner of the viz contains a large proportion of Board members from finance companies and few from academic institutions.
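Since the colors were hard to read, a plain tally is another way to compare company types across Boards. The following is a minimal sketch that computes each type’s share of a school’s Board; the rows are hypothetical placeholders, not my actual counts.

```python
from collections import Counter, defaultdict

# Hypothetical (school, company type) pairs, one per Board member.
rows = [
    ("School 1", "Finance"),
    ("School 1", "Finance"),
    ("School 1", "Finance"),
    ("School 1", "Academic"),
    ("School 2", "Academic"),
    ("School 2", "Academic"),
    ("School 2", "Finance"),
    ("School 2", "Law Firm"),
]


def type_shares(rows):
    """For each school, return the fraction of Board members per company type."""
    counts = defaultdict(Counter)
    for school, company_type in rows:
        counts[school][company_type] += 1

    shares = {}
    for school, counter in counts.items():
        total = sum(counter.values())
        # most_common() lists the largest types first.
        shares[school] = {t: n / total for t, n in counter.most_common()}
    return shares


shares = type_shares(rows)
for school, by_type in shares.items():
    print(school, by_type)
```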
Viz #4 – Fruchterman Reingold by Modularity Class/School
Overall, I thought the Fruchterman Reingold layout was most effective for visualizing the size of clusters and nodes relative to others. From this visualization, one is able to see that University of Chicago, as a large research institution, has a much larger Board of Trustees than Amherst, a small liberal arts school. While the size of a Board obviously affects the number of relationships it can have, I think it’s interesting that small liberal arts schools and large universities share some common ground based on the companies represented on their boards. For example, although Amherst is a much smaller school than both Yale and Columbia, all three schools are connected by Bain Capital (Mitt Romney’s company).
Viz #5 – Fruchterman Reingold & YifanHu’s Multilevel as a Second Attempt at Analyzing Board of Trustees by Company Type
Because it is easier for me to analyze clusters in the Fruchterman Reingold layout, I once again decided to try coloring nodes by company type. Taking a different approach, I colored only the top four “types” (excluding “Other,” which aggregates multiple smaller categories), and then made the remainder of the categories white. The resulting visualization affirmed what I had noticed earlier: companies in the finance sector tend to be the most prevalent on Boards of Trustees, especially at University of Chicago, Princeton, Stanford and Columbia. The other company types display trends, but they’re significantly less pronounced. For example, it’s clear from this visualization that most Boards of Trustees contain members from academic institutions, which is no surprise given the inherent purpose of a Board of Trustees. Below my Fruchterman Reingold visualization, I tried out a different view with the YifanHu Multilevel layout to see if I could identify any trends about the types of companies that form bridges. Unfortunately, I could not glean much additional analysis from the YifanHu layout.
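The “top four types, everything else white” rule can be written down as a small color-mapping step. This is only a sketch of the logic; in Gephi I applied the colors by hand. The type counts and hex colors below are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical node "Type" values, one per company node.
node_types = (
    ["Other"] * 6
    + ["Finance"] * 5
    + ["Academic"] * 4
    + ["Technology"] * 3
    + ["Law Firm"] * 2
    + ["Real Estate"] * 1
)

# Hypothetical palette: one color per highlighted type.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def top_type_colors(node_types, palette, skip=("Other",), fallback="#FFFFFF"):
    """Color the most common types (ignoring `skip`); all others get `fallback`."""
    counts = Counter(t for t in node_types if t not in skip)
    top_types = [t for t, _ in counts.most_common(len(palette))]
    highlighted = dict(zip(top_types, palette))
    return {t: highlighted.get(t, fallback) for t in set(node_types)}


colors = top_type_colors(node_types, palette)
for company_type in sorted(colors):
    print(company_type, colors[company_type])
```

Skipping “Other” before ranking matters here: it is the largest group in this sample, and without the filter it would take one of the four highlighted colors.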
Conclusion
While I thought it was interesting to identify overlap in companies’ representation on several university Boards of Trustees, I thought the small sample size I chose imposed some limitations on the visual aspect of the project. Next time, I may want to add more school boards to my analysis so my findings reflect a broader picture – there is still so much more to learn! Additionally, I think it would work to my benefit to be more precise with company categorization, or even come up with a different categorization scheme altogether. My imprecise categorization really detracted from the impact of my visualizations, and the type-based views ultimately fell flat. Perhaps, with more boards to analyze, I could even filter the nodes by company “type” to see how companies within the same category are connected through university boards.
Another future direction for this project may be to export the visualizations using Sigma.js. I had some difficulties with color and edge visibility, so I opted to refrain from publishing any of my current visualizations with this plugin. That said, the plugin would be extremely useful for taking a more in-depth look at the relationships present within my visualizations.