Stack Overflow Tags – NETWORK ANALYSIS


Developer Story Concept by Stack Overflow

Introduction

Stack Overflow is a question-and-answer online community created in 2008. It aims to help developers and programmers across the globe share their programming knowledge and build their careers.

A developer story is a way to showcase the different aspects of a developer’s identity and to give a complete picture of who they are. To understand how technologies relate to each other, Stack Overflow analyzed the ‘tags’ that developers use in their developer stories.

To visualize these relationships, I used Gephi to build a network in which the technologies are represented as nodes and the relationships between them as edges.

Inspiration

To get started with Gephi, I looked at the Les Miserables network, a sample that ships with the software. To explore the basic functionality, I watched a tutorial video on YouTube that covered layouts, editing nodes and edges, and some basic statistics.

What I specifically liked about this network was the use of the Force Atlas layout to create small clusters of closely related nodes.

The Tutorial video referred to for inspiration

Choosing a topic

Having completed my bachelor’s in computer engineering, I have relied heavily on Stack Overflow for debugging. I thought it would be interesting to analyze and visualize the technologies used by fellow developers.

A snippet of the Developer Story. Source: Stack Overflow Meta

Tools

Kaggle

After choosing a topic, I looked for datasets on Kaggle, an online community for data scientists and machine learning practitioners. There I came across the CSV file for the Stack Overflow Tag Network.

MS Excel

I used MS Excel to import the dataset. The data comprised 115 nodes and 490 edges and did not require any refinement.
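The node and edge counts can also be checked outside a spreadsheet. The sketch below assumes the edges are stored as a CSV edge list with `source` and `target` columns; those column names and the sample rows are assumptions made for illustration, not taken from the actual file.

```python
import csv
import io

# Hypothetical excerpt in the shape of an edge-list CSV. The column names
# and the placeholder weights are assumptions, not values from the dataset.
SAMPLE_EDGES = """source,target,value
html,css,1.0
css,html,1.0
jquery,javascript,1.0
javascript,jquery,1.0
jquery,css,1.0
"""


def summarize_edge_list(csv_text):
    """Return (node_count, edge_count) for an edge-list CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes = set()
    edge_count = 0
    for row in reader:
        # Every endpoint that appears in any row counts as a node.
        nodes.add(row["source"])
        nodes.add(row["target"])
        edge_count += 1
    return len(nodes), edge_count


node_count, edge_count = summarize_edge_list(SAMPLE_EDGES)
print(node_count, "nodes,", edge_count, "edges")
```

For a real file, the same function could be fed the contents of the downloaded CSV instead of the sample string.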

The dataset

Gephi

After importing the dataset, a network was created using Gephi.

Methodology

After importing the dataset as a spreadsheet into the nodes table, I had 115 nodes and 490 edges. The initial graph is shown below:

Initial Graph

Choosing the layout

For the layout, I chose Force Atlas so that closely associated nodes would cluster together, setting the Repulsion Strength to 200 and the Attraction Strength to 5.
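To give a feel for what repulsion and attraction strengths do, here is a minimal force-directed layout sketch. It is not Gephi's Force Atlas implementation: the force formulas, the `step` and `max_move` parameters, and the sample graph are all simplifying assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, edges, repulsion=200.0, attraction=5.0,
                 iterations=300, step=0.01, max_move=1.0, seed=0):
    """Toy force-directed layout (an illustration, not Gephi's algorithm).

    Every pair of nodes repels with force repulsion / distance, and every
    edge pulls its endpoints together with force attraction * distance.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along every edge.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node, capping the displacement to keep the run stable.
        for n in nodes:
            mx = step * force[n][0]
            my = step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx *= max_move / length
                my *= max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos


# Hypothetical sample graph: two triangles joined by a single edge.
sample_nodes = ["html", "css", "jquery", "python", "django", "flask"]
sample_edges = [("html", "css"), ("css", "jquery"), ("html", "jquery"),
                ("python", "django"), ("django", "flask"), ("python", "flask"),
                ("jquery", "python")]
for name, (x, y) in force_layout(sample_nodes, sample_edges).items():
    print(name, round(x, 2), round(y, 2))
```

In this toy model, two linked nodes settle where the two forces balance, at a distance of sqrt(repulsion / attraction), which is about 6.3 for the values 200 and 5. Raising the repulsion spreads the graph out, while raising the attraction tightens the clusters.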

Appearance

For the appearance, I varied the size and color of each node based on its degree, that is, the number of edges emerging from or directed towards it.

I chose a color palette that makes the variation in degree easy to read while keeping the darker-colored labels legible without straining the eyes.
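The degree-based styling can be sketched in a few lines. This is only an illustration of the idea: the size range, the two RGB colors, and the sample edges are hypothetical and do not reproduce the settings or palette used in Gephi.

```python
from collections import Counter


def degree_counts(edges):
    """Count the edges touching each node (incoming plus outgoing)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def style_nodes(edges, min_size=10.0, max_size=50.0,
                light=(255, 245, 200), dark=(200, 60, 20)):
    """Give each node a size and a hex color that grow with its degree."""
    degree = degree_counts(edges)
    if not degree:
        return {}
    lo, hi = min(degree.values()), max(degree.values())
    styles = {}
    for node, d in degree.items():
        size = scale(d, lo, hi, min_size, max_size)
        # Interpolate each RGB channel between the light and dark colors.
        rgb = tuple(round(scale(d, lo, hi, c0, c1))
                    for c0, c1 in zip(light, dark))
        styles[node] = {"degree": d, "size": size,
                        "color": "#%02x%02x%02x" % rgb}
    return styles


# Hypothetical edges for illustration only.
sample_edges = [("jquery", "javascript"), ("jquery", "css"),
                ("jquery", "html"), ("css", "html")]
for node, style in style_nodes(sample_edges).items():
    print(node, style)
```

Here the best-connected node gets the largest size and the darkest color, and the least-connected node gets the smallest size and the lightest color.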

Final Network

Analysis

Once the network was rendered, the nodes separated into smaller clusters. For example, a web development cluster formed, containing technologies such as CSS, HTML, JavaScript, MySQL, and jQuery. Of these, jQuery had the most connections.

Reflections

During the project, I initially found it difficult to navigate the software and needed a tutorial to understand its functionality. However, importing data into it is extremely easy and intuitive.

Going forward in network analysis, I would like to explore social network analysis.

Sources

Gephi
Wikipedia
Stack Overflow
YouTube