A Network Analysis of Human Diseases


Lab Reports, Networks

Introduction

For the network analysis lab, I chose to focus on a network of human diseases and gene types. I wanted not only to explore the visualization but also to discover specific gene types and their relationship to diseases. I also wanted to understand the links between neurological diseases because several of my relatives have been diagnosed with them, specifically Alzheimer’s and dementia.

Materials

When I first started this lab, I knew that I wanted to find a dataset that related specifically to human diseases, including neurological diseases. Over the past few years, I have had several relatives who were diagnosed with Alzheimer’s and dementia. I wanted not only to understand these diseases but also to see how they link to other diseases. I started the project by doing a Google search using the keywords “network human diseases”. I found a network visualization of human diseases on the Exploring Data website, which focuses on interactive data visualizations built with open-source tools.

Fig. 1 Exploring Data network visualization of human diseases

Next, I went to the Gephi Wiki, which houses a variety of datasets on GitHub, a development platform. I then chose the Diseasome dataset and downloaded it as a zip file.

Fig. 2 Gephi Wiki page

I then downloaded Gephi, an open-source visualization and exploration software for networks and graphs.

Fig. 3 Gephi website

Process

I started the lab by opening the Diseasome file in Gephi. I inspected the data to make sure there weren’t any issues. I then wanted to take a look at the nodes in the dataset, so I clicked on the Data Laboratory button and then clicked on Nodes. I saw that there was a label column and a type column that specified either the disease or gene type.

Fig. 4 Nodes data table
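
The same kind of inspection could also be done outside Gephi. The following is a minimal sketch, assuming the node table has been exported to rows with id, label, and type fields. The rows shown are hypothetical placeholders, not real records from the Diseasome dataset, and the type values "disease" and "gene" are assumed spellings.

```python
from collections import Counter

# Hypothetical rows standing in for the Data Laboratory node table.
# The ids, labels, and type values are placeholders, not real records.
nodes = [
    {"id": "n1", "label": "Disease 1", "type": "disease"},
    {"id": "n2", "label": "Disease 2", "type": "disease"},
    {"id": "n3", "label": "Gene 1", "type": "gene"},
]


def count_by_type(rows):
    """Count how many nodes fall under each value of the type column."""
    return Counter(row["type"] for row in rows)


def missing_labels(rows):
    """Return the ids of nodes whose label is empty or absent."""
    return [row["id"] for row in rows if not row.get("label")]


type_counts = count_by_type(nodes)
unlabeled = missing_labels(nodes)
print(dict(type_counts), unlabeled)
```

A count per type and a list of unlabeled nodes are two quick checks that mirror scanning the table by eye.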

Next, I began to experiment with the overall visualization of the dataset. In the Appearance section, I clicked on Nodes, then Size. I wanted to adjust the size of the nodes so that they would stand out more. I clicked on Ranking and changed the attribute to “Betweenness Centrality”, then adjusted the minimum size to 25 and the maximum size to 400. I then changed the layout of the visualization by clicking on “Force Atlas” under the Layout section. I adjusted the Repulsion strength to 10,000.
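
To make the size ranking more concrete, here is a rough sketch of what ranking nodes by betweenness centrality involves. It computes unnormalized betweenness with Brandes' algorithm on a small made-up graph (nodes A to E are placeholders, not Diseasome data) and then maps the scores onto the 25 to 400 size range. The linear mapping is an assumption for illustration and may not match how Gephi interpolates sizes.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


def rank_sizes(values, min_size=25, max_size=400):
    """Linearly map scores onto [min_size, max_size] (assumed mapping)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {v: float(min_size) for v in values}
    return {
        v: min_size + (x - lo) * (max_size - min_size) / span
        for v, x in values.items()
    }


# Placeholder graph: C bridges most pairs, so it gets the largest size.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
scores = betweenness(adj)
sizes = rank_sizes(scores)
print(scores, sizes)
```

Nodes that sit on many shortest paths between other nodes receive the largest sizes, which is why bridging nodes stand out in the ranked visualization.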

Fig. 5 Appearance and Layout change
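
The effect of the repulsion setting can be illustrated with a toy force-directed model. This is a generic spring sketch, not Gephi's Force Atlas implementation: the force formulas, the attraction constant, and the step size are all assumptions chosen for simplicity. Every pair of nodes pushes apart with a force of repulsion divided by distance, and every edge pulls its endpoints together in proportion to its length.

```python
import math


def layout_step(pos, edges, repulsion, attraction=1.0, step=0.01):
    """One update of a generic spring layout (not Gephi's own code)."""
    force = {v: [0.0, 0.0] for v in pos}
    names = list(pos)
    # Every pair of nodes pushes apart with magnitude repulsion / distance.
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / dist
            force[u][0] += push * dx / dist
            force[u][1] += push * dy / dist
            force[v][0] -= push * dx / dist
            force[v][1] -= push * dy / dist
    # Every edge pulls its endpoints together in proportion to its length.
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        force[u][0] += attraction * dx
        force[u][1] += attraction * dy
        force[v][0] -= attraction * dx
        force[v][1] -= attraction * dy
    return {
        v: (pos[v][0] + step * force[v][0], pos[v][1] + step * force[v][1])
        for v in pos
    }


def settled_distance(repulsion, iterations=1000):
    """Distance between two linked nodes after the layout settles."""
    pos = {"a": (0.0, 0.0), "b": (1.0, 0.0)}
    for _ in range(iterations):
        pos = layout_step(pos, [("a", "b")], repulsion)
    return math.hypot(pos["a"][0] - pos["b"][0], pos["a"][1] - pos["b"][1])


low = settled_distance(100)
high = settled_distance(10000)
print(low, high)
```

In this toy model, two linked nodes settle where push and pull balance, at a distance equal to the square root of repulsion divided by attraction. Raising the repulsion therefore spreads the nodes farther apart, which is the general effect a higher Repulsion strength has on a crowded graph.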

I then added text labels to the visualization and adjusted the text size for the nodes by clicking on the drop-down field at the far right of the bar and changing the Size field to “Node size.”

Fig. 6 Text label size adjustment

Next, I took a look at the overall visualization and made adjustments by moving specific nodes around so that the text labels would stand out prominently. The following image (Fig. 7) shows the final result of the visualization.

Fig. 7 Final Visualization

I then wanted to see which diseases were linked to Alzheimer’s, so I clicked on the respective node. I saw that dementia and amyloidosis were both linked to Alzheimer’s disease. However, I was surprised to discover that schizophrenia was also linked to Alzheimer’s.

Fig. 8 Alzheimer’s disease node
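
Clicking a node to see its links amounts to a neighbor lookup in the graph. Below is a minimal sketch, assuming the edges are available as pairs of labels. It includes only the three links described above plus one hypothetical unrelated edge ("Disease X" to "Disease Y") for contrast, and the exact label spellings in the dataset may differ.

```python
from collections import defaultdict

# The three links reported in the text for the Alzheimer's node, plus one
# hypothetical unrelated edge for contrast. Label spellings are assumed.
edges = [
    ("Alzheimer's disease", "Dementia"),
    ("Alzheimer's disease", "Amyloidosis"),
    ("Alzheimer's disease", "Schizophrenia"),
    ("Disease X", "Disease Y"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map from pairs of node labels."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighbors(adjacency, node):
    """Return the labels directly linked to a node, sorted by name."""
    return sorted(adjacency.get(node, ()))


adjacency = build_adjacency(edges)
linked = neighbors(adjacency, "Alzheimer's disease")
print(linked)
```

The lookup returns only direct links, so nodes that are two or more steps away would need a further traversal.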

Finally, I wanted to see the overall visualization with a black background, similar to the Exploring Data visualization. So, I clicked on the background color button and changed it to black.

Fig. 9 Visualization with black background

Reflections

Overall, I found this lab to be quite challenging as well as informative. I was not familiar with Gephi before this lab assignment and did not find the software particularly user-friendly. It took me several attempts to create the visualization that I wanted. I did like the ability to adjust the layout and appearance of the visualization with a variety of properties. I also liked being able to drill down on a particular disease to see what was linked to it, and I was surprised by some of the findings.

In the future, I would like to see what other layouts I could use for this dataset. I think it would be interesting to see how I could display the networks of diseases in a variety of ways. I would also like to use Gephi to explore other datasets to see what insights I could derive. I also want to explore other network visualization software, particularly tools that are more user-friendly and easier to navigate.

References

  1. Exploring Data Website
  2. Gephi Wiki
  3. Gephi