Weak Ties: Network Analysis of Twitch Data


Networks, Visualization

For my network analysis I chose to look at Twitch user data. The dataset captures the connections between Twitch users with English-language accounts as of 2018, and it highlights the strength of weak connections between users. Most users were connected to no more than five other users, with 93.65% of users falling into this category. As the number of connections rises, fewer and fewer users reach it. The highest count was 420, and only a single user had that many connections. Most Twitch users know only a few other people, and very few are widely connected: 98% of users have 70 or fewer connections.
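Percentages like the ones above can be computed directly from an edge list. The following is a minimal sketch, assuming the data is available as pairs of user IDs; the function names and the toy edge list are hypothetical and are not taken from the Twitch dataset. Users with no connections at all do not appear in an edge list, so they are not counted here.

```python
from collections import Counter

def degree_counts(edges):
    """Return a Counter mapping each node to its number of distinct neighbors."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return Counter({node: len(nbrs) for node, nbrs in neighbors.items()})

def share_with_at_most(degrees, k):
    """Fraction of nodes whose degree is at most k."""
    if not degrees:
        return 0.0
    return sum(1 for d in degrees.values() if d <= k) / len(degrees)

# Toy edge list (hypothetical, not the Twitch data): one hub and a few leaves.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (5, 0)]
toy_degrees = degree_counts(toy_edges)

# Share of toy users with at most 2 connections.
toy_share = share_with_at_most(toy_degrees, 2)
print(f"{toy_share:.2%} of toy users have at most 2 connections")
```

On the real data, the same two calls with k = 5 and k = 70 would give the kind of shares quoted above, and `max(toy_degrees.values())` would give the highest connection count.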

The methodology required a few different steps. First, the Twitch data was obtained from the Stanford Large Network Dataset Collection and filtered down to English-language accounts only to prevent data overload. This data was then imported into Gephi, and the Force Atlas 2 algorithm was run for approximately 10 minutes until the network had settled into a slightly more shapely form. At this point, however, every connection between users was still represented, so some filtering was necessary. The density function of Gephi was used, and after viewing the statistics a number of cutoffs were selected to best explain what the data had to say about connections between Twitch users. Because the mean was 5, with a standard deviation of 22.73, the values 5, 10, 28, 70, and 150 degrees were selected to represent the different levels of connectedness between users.
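The filtering step can be approximated outside Gephi. The sketch below is one possible reading of it, not the exact Gephi procedure: it measures each user's degree once on the full graph and keeps only the connections whose two endpoints both meet a minimum degree. The function names and the toy edge list are hypothetical; only the threshold values come from the text.

```python
def degrees_from_edges(edges):
    """Map each node to its number of distinct neighbors."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

def filter_min_degree(edges, min_degree):
    """Keep edges whose endpoints both have degree >= min_degree in the full graph."""
    deg = degrees_from_edges(edges)
    keep = {node for node, d in deg.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]

# Thresholds used in the text.
THRESHOLDS = [5, 10, 28, 70, 150]

# Toy graph (hypothetical): hub 0 linked to users 1..6, plus a triangle 1-2-3.
toy_edges = [(0, n) for n in range(1, 7)] + [(1, 2), (2, 3), (1, 3)]

# With a cutoff of 3, only the hub and the triangle members survive.
toy_filtered = filter_min_degree(toy_edges, 3)
print(len(toy_filtered), "edges remain at a minimum degree of 3")

# On the real edge list, each threshold would produce one filtered view.
views = {k: filter_min_degree(toy_edges, k) for k in THRESHOLDS}
```

Computing degree once on the full graph is a design choice: a user with many connections stays visible even if most of their neighbors are filtered out, which matches the idea of showing "users with at least k connections".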

Below is a network visualization of Twitch users with at least 5 connections. At this level it is hard to discern distinct groupings due to the sheer size of the network.

Twitch users with at least 5 connections

Twitch users with at least 10 connections

The change from 10 to 28 is not very significant, although it is perceptible: a number of groupings begin to emerge, a sizeable amount of noise notwithstanding. A cutoff of 28 connections was selected because it is approximately one standard deviation above the mean (5 + 22.73 = 27.73).

Twitch users with at least 28 connections

The greatest change comes in going from 28 connections to 70 connections. At 70 connections there are several distinct groups, each centered around a few nodes of users. These users likely use the live-stream functionality of the platform together in order to maximize their audience by reaching new users.

Twitch users with at least 70 connections

One facet missing from the dataset is how users are utilizing their accounts. On Twitch anyone can live-stream themselves, but the vast majority of users do not make use of this function. In the data I worked with, I would venture that users begin to utilize the streaming function somewhere around 30 connections. Streaming is the premier way to gain new connections with other users. When users are "live," the user interface changes so that it is much easier for other users to learn more about the user streaming (the streamer) and to connect with them via a large follow button.

Thus what we see from the data is a divide between those with multitudes of connections and those with a smaller network. Twitch is dominated by small communities and streamers who do not enjoy the large audiences of the famous streamers. This structure is further reinforced by the distribution of payments between smaller and larger streamers. A recent data leak from Twitch showed that 90% of payments made to streamers went to the top 100 streamers.

Twitch users with at least 150 connections