Introduction
Stack Overflow is a question-and-answer online community created in 2008. It aims to help developers and programmers across the globe share their programming knowledge and build their careers.
A developer story is a way to showcase the different aspects of a developer’s identity. It aims to provide a complete picture of who you are. To understand how technologies are related to each other, Stack Overflow analyzed several developer stories, in which developers make use of ‘tags’.
To visualize this relationship better, I used Gephi to create a network in which the technologies are represented as nodes and the relationships between them as edges.
Inspiration
To get started with Gephi, I looked at the Les Miserables network, a sample that is available in the software itself. To explore the basic functionality, I watched the tutorial video for it on YouTube, which covered layouts, editing nodes and edges, and some basic statistics.
What I specifically liked about this network was the use of the Force Atlas layout to create small clusters of closely related nodes.
Choosing a topic
Having completed my bachelor’s in computer engineering, I have relied on Stack Overflow heavily for a lot of debugging problems. I thought it was an interesting idea to analyze and visualize the technologies used by fellow developers.
Tools
Kaggle
After choosing a topic, I looked for datasets to work on using Kaggle, an online community for data scientists and machine learning practitioners. I came across the CSV file for the Stack Overflow Tag Network.
MS Excel
To import the dataset, I used MS Excel. The data comprised 115 nodes and 490 edges and did not require any refinement.
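A quick way to confirm node and edge counts like these before importing is to read the edge list directly. The snippet below is a minimal sketch, assuming the edge file has `source` and `target` columns; the column names and the sample rows are placeholders for illustration, not the actual Kaggle data.

```python
import csv
import io

# Hypothetical sample standing in for the edge CSV. The column names
# (source, target, value) and every row here are assumptions for illustration.
SAMPLE_EDGES = """source,target,value
html,css,1.0
css,html,1.0
jquery,javascript,1.0
javascript,jquery,1.0
jquery,css,1.0
"""


def summarize_edges(csv_text):
    """Return (node_count, edge_count) for an edge list given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes = set()
    edge_count = 0
    for row in reader:
        # Every endpoint of an edge is a node, whether or not it appears elsewhere.
        nodes.add(row["source"])
        nodes.add(row["target"])
        edge_count += 1
    return len(nodes), edge_count


node_count, edge_count = summarize_edges(SAMPLE_EDGES)
print(f"{node_count} nodes, {edge_count} edges")
```

For a real file, the same function could be called with the contents of the CSV read from disk.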
Gephi
After importing the dataset, a network was created using Gephi.
Methodology
After importing the available dataset as a spreadsheet in the nodes table, I had 115 nodes and 490 edges. The initial graph looked as shown below:
Choosing the layout
For the layout, I decided to choose Force Atlas to have the closely associated nodes clustered together, setting the Repulsion Strength to 200 and the Attraction Strength to 5.
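To give a feel for what the two settings do, here is a minimal sketch of one step of a generic force-directed layout. It is not Gephi's Force Atlas implementation: the force formulas, the function name `layout_step` and the `step` size are assumptions chosen for illustration, and only the default values 200 and 5 come from the settings above.

```python
import math


def layout_step(positions, edges, repulsion=200.0, attraction=5.0, step=0.01):
    """Apply one iteration of a simple force-directed layout.

    Illustrative only: every pair of nodes repels with a force that shrinks
    with distance, and every edge pulls its endpoints together with a force
    that grows with distance.
    """
    forces = {node: [0.0, 0.0] for node in positions}
    nodes = list(positions)

    # Repulsion between every pair of nodes.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = positions[a][0] - positions[b][0]
            dy = positions[a][1] - positions[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # guard against division by zero
            push = repulsion / dist
            fx, fy = push * dx / dist, push * dy / dist
            forces[a][0] += fx
            forces[a][1] += fy
            forces[b][0] -= fx
            forces[b][1] -= fy

    # Attraction along each edge.
    for a, b in edges:
        dx = positions[a][0] - positions[b][0]
        dy = positions[a][1] - positions[b][1]
        forces[a][0] -= attraction * dx
        forces[a][1] -= attraction * dy
        forces[b][0] += attraction * dx
        forces[b][1] += attraction * dy

    # Move each node a small distance in the direction of its net force.
    return {
        node: (positions[node][0] + step * fx, positions[node][1] + step * fy)
        for node, (fx, fy) in forces.items()
    }


# Two connected tags placed far apart are pulled towards each other.
start = {"html": (0.0, 0.0), "css": (100.0, 0.0)}
moved = layout_step(start, [("html", "css")])
print(moved)
```

Repeating the step many times lets connected nodes settle close together while unconnected ones drift apart, which is the clustering effect I was after.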
Appearance
For the appearance, I decided to alter the size and color of each node based on the number of edges emerging from or directed towards it.
I chose the color palette so that the variation between nodes can be understood properly, while making sure that the darker colored labels can still be read without much strain on the eyes.
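The edge count used for sizing is the degree of a node. As a rough sketch of the idea, the snippet below counts degrees from an edge list and maps them linearly onto a size range. The sample edges and the 10 to 50 size range are assumptions for illustration and are not the values I used in Gephi.

```python
from collections import Counter


def degree_counts(edges):
    """Count the edges touching each node (incoming plus outgoing)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def scale_sizes(degree, min_size=10.0, max_size=50.0):
    """Map each node's degree linearly onto the range [min_size, max_size]."""
    low, high = min(degree.values()), max(degree.values())
    span = (high - low) or 1  # avoid division by zero when all degrees are equal
    return {
        node: min_size + (count - low) / span * (max_size - min_size)
        for node, count in degree.items()
    }


# Hypothetical edges for illustration only.
edges = [
    ("jquery", "javascript"),
    ("jquery", "css"),
    ("jquery", "html"),
    ("html", "css"),
]
degree = degree_counts(edges)
sizes = scale_sizes(degree)
print(sizes)
```

The same degree values could drive the color ramp as well, so that the largest nodes are also the darkest.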
Analysis
Once the network had been rendered, the nodes were split into smaller clusters. For example, a cluster for web development was formed, grouping web development technologies such as CSS, HTML, JavaScript, MySQL and jQuery. Of these, jQuery had the most connections.
Reflections
During the project, I initially found it difficult to navigate my way around the software and needed the help of a tutorial to understand its functionality. However, importing data into it is extremely easy and intuitive.
Going forward in Network Analysis, I would like to explore Social Network Analysis.