Introduction
I enjoy researching topics related to human diseases, which are worth understanding so that we can take precautions. For lab 3, I am therefore also designing a visualization of human diseases and disorders. Based on a network dataset of disorders and diseases that I found on the Gephi wiki, these are the two questions I would like to answer:
- What is the relationship between diseases, disorders, and genes in the network?
- Which disease or disorder has the biggest community?
Inspiration
While searching for related disease network maps, I found the following visualization, which I think is based on the same dataset I found on the Gephi wiki. It gives a very clear picture of the relationships between diseases, disorders, and genes. It colors the diseases by class and labels them with their names. It also visualizes the small groups and puts them at the bottom of the picture, perhaps because of limited space or to keep the picture clear. The visualization also shows clearly which disease has the highest degree: the more connections a disease has, the bigger its dot. We could also interpret a disease with a bigger dot as possibly more likely to occur in humans, because it has more potential factors (a higher degree). It is a good model and direction for my own visualization.
Materials
- Gephi wiki – A free resource where you can find network datasets.
- Gephi software – An application for creating network visualizations.
Methods
1. Find the ideal dataset
I spent a lot of time searching for a dataset that I would be interested in working on. I tried several resources, such as CASOS and SNAP, but could barely find a suitable dataset. Most of the time, this was simply because I could not understand the dataset itself; for some, it was hard to tell what the data was even about. Fortunately, the section on biological networks on the Gephi wiki caught my interest, and in the end I found my favorite topic: the human disease network.
2. Clean the dataset
I found a type column that did not help me classify the diseases, disorders, or genes, so I deleted that column directly in the Data Laboratory in Gephi.
Figure 1. The original dataset
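For readers who prefer to clean the table outside Gephi, the same step can be sketched in a few lines of Python. This is only an illustration, assuming the node table has been exported as CSV; the column names and rows below are hypothetical placeholders, not the real dataset.

```python
import csv
import io

# Toy node table standing in for an exported node list.
# Column names and rows are hypothetical placeholders.
RAW = """Id,Label,type,disease_class
1,Node A,x,Cancer
2,Node B,x,Neurological
"""

def drop_column(csv_text, column):
    """Return the CSV text with one named column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the column we are dropping.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

cleaned = drop_column(RAW, "type")
print(cleaned)
```

The cleaned text keeps every other column untouched, so it could be imported back as a node table.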
3. Visualize the dataset
I wanted my visualization to resemble the picture I found, so I first colored the nodes by disease type, such as cancer, neurological, and respiratory, by choosing the node partition in the Appearance panel. Then I used the palette to generate the colors I wanted, labeled the nodes with the disease names, and adjusted the edge width. The following is what I created initially.
Figure 2. The initial visualization
I also used the Layout panel to arrange the graph. I tried different layouts and found that Fruchterman Reingold, which produces a round shape, gave the layout I wanted. However, the dots were clustered together seemingly at random and looked messy. I therefore ran the Modularity statistic and recolored the nodes by modularity class in the Appearance panel, which looked better. To make the result more distinct, I also changed the background color to black and used white fonts for the labels.
Figure 2. The finished visualization
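To make the idea behind the modularity class more concrete, here is a minimal sketch of the standard modularity score for an undirected, unweighted graph: for each community, the fraction of edges that fall inside it minus the fraction expected if edges were placed at random with the same degrees. The modularity class is the community label each node receives. The toy graph and both partitions are assumptions for illustration only; this is not Gephi's own implementation, and it does not search for the best partition.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# One partition follows the triangles, the other mixes them up.
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
mixed = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

print(modularity(edges, by_triangle))
print(modularity(edges, mixed))
```

The partition that follows the two triangles scores higher than the mixed one, which is why coloring by modularity class tends to group densely connected nodes under one color.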
Interpretation
1. To answer my first question, what is the relationship between diseases, disorders, and genes in the network, I colored the nodes by disease type so that diseases in the same class share the same color. This way, I could see how different types of diseases are related to each other when hovering over a specific disease. For example, when I hovered over Alzheimer disease, I could see five types of diseases, disorders, or genes that are related to the neurological Alzheimer disease.
Figure 3. The diseases related to Alzheimer disease
2. By applying the modularity class, I could clearly see which community has the densest connections among its diseases. Therefore, I think I can interpret the colon cancer group as the largest community in the disease network. The difference between Figures 3 and 4 is that Figure 3 shows which diseases, disorders, or genes are related to a specific disease, whereas Figure 4 shows which diseases, disorders, or genes have dense connections (high degree). We may interpret colon cancer as the disease humans are potentially most likely to get, because it has the most potential factors (the highest degree).
Figure 4. The biggest community in the disease network
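The two readings above can be sketched on a toy graph: grouping the neighbors of one node by class (what hovering over a node shows), counting the nodes per modularity class, and finding the node with the highest degree. All node names, class labels, and community ids below are hypothetical placeholders, not values from the disease dataset. The sketch also shows that the largest community and the highest-degree node are separate measures that need not coincide.

```python
from collections import Counter, defaultdict

def neighbors_by_class(edges, class_of, node):
    """Group the direct neighbors of `node` by their class label."""
    groups = defaultdict(list)
    for u, v in edges:
        if u == node:
            groups[class_of[v]].append(v)
        elif v == node:
            groups[class_of[u]].append(u)
    return dict(groups)

def largest_community(community_of):
    """Return (community id, node count) for the community with the most nodes."""
    return Counter(community_of.values()).most_common(1)[0]

def highest_degree_node(edges):
    """Return (node, degree) for the node that touches the most edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree.most_common(1)[0]

# Toy graph (hypothetical): a 5-node chain linked to a hub with 3 leaves.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5"),
         ("h", "l1"), ("h", "l2"), ("h", "l3"), ("p5", "h")]
community_of = {"p1": 0, "p2": 0, "p3": 0, "p4": 0, "p5": 0,
                "h": 1, "l1": 1, "l2": 1, "l3": 1}
class_of = {"p1": "class_x", "p2": "class_x", "p3": "class_x",
            "p4": "class_x", "p5": "class_x",
            "h": "class_y", "l1": "class_y", "l2": "class_y", "l3": "class_z"}

print(neighbors_by_class(edges, class_of, "h"))
print(largest_community(community_of))
print(highest_degree_node(edges))
```

In this toy case the chain forms the largest community by node count, while the hub in the smaller community has the highest degree, so both measures are worth checking before naming one group the biggest.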
Reflection
While building the visualization, I was confused about what the modularity class means, even though it gave me the best visual impact. I could not fully understand the effect of coloring the nodes by modularity class; my guess was that it simply helps reveal the biggest community in the disease network. I hope to have more time to figure out how to use Gephi to visualize network datasets, because there are several functions that I have not touched yet and am not sure how to use. As a future direction, I would look further into manipulating edges. My visualization of the disease network looks fine, but the edges between nodes overlap heavily and look messy. I hope to make the relationships between diseases much clearer after manipulating the edges.