We live in an era of advanced science and technology, yet more and more complex diseases are emerging. When I was a little girl, several of my family members died of different cancers at a young age. They were all practitioners with professional medical backgrounds, but they still suffered from these diseases. It made me wonder: is cancer related to people’s own genes? Is cancer the largest community of gene-originated diseases? Which kinds of human diseases form the larger groups that share a common genetic origin? My childhood memories and curiosity led me to design a visualization of the relationship between human disorders and disease genes.
This visualization shows the human genome with a focus on genes that are implicated in various diseases. Humans have roughly 20,000 protein-coding genes, and mis-regulation or mutation of many of them can contribute to disease.
- Gephi wiki: A source of free datasets. I found my dataset, “Diseasome”, in the biological networks category.
- Gephi software: Free software that I used to create the network visualization.
At first, I spent a lot of time looking for the dataset I needed. After finding a suitable dataset on the Gephi wiki, I downloaded it and renamed its columns so that it could be imported into Gephi.
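The renaming step can be sketched in a few lines of Python. This is only an illustration under assumptions: the original column names (`disease`, `gene`) and the sample rows are made up, and the target names `Source` and `Target` are the headers Gephi's spreadsheet importer looks for in an edge table.

```python
import csv
import io

# Hypothetical mapping: the left-hand names stand in for whatever headers
# the downloaded file uses; the right-hand names are the ones Gephi's
# spreadsheet importer expects in an edge table.
RENAME = {"disease": "Source", "gene": "Target"}

def rename_columns(src, dst, mapping):
    """Copy CSV rows from src to dst, renaming header cells via mapping."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Headers not listed in the mapping are kept unchanged.
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)

# Tiny made-up sample, not rows from the real dataset.
raw = io.StringIO("disease,gene\nColon cancer,GENE_A\nDeafness,GENE_B\n")
out = io.StringIO()
rename_columns(raw, out, RENAME)
result = out.getvalue()
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.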
After the import, Gephi displayed the nodes at random positions in different colors. I wanted clear and beautiful visualizations for my project, so I tried several layouts and finally decided to use “Force Atlas 2”, “Fruchterman-Reingold”, and “YifanHu Multilevel”.
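To give a feel for what a force-directed layout does, here is a simplified, textbook-style sketch of the Fruchterman-Reingold idea: all nodes push each other apart, edges pull their endpoints together, and a cooling "temperature" limits how far nodes move in each round. It is not Gephi's implementation; the function name, parameters, default values, and toy node names are assumptions made for illustration.

```python
import math
import random

def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Simplified Fruchterman-Reingold layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    # Start from random positions, as Gephi does right after an import.
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10.0                  # caps movement per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes: k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction along each edge: distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node by at most `temperature` and keep it in the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))

        temperature *= 0.95  # cool down so the layout settles

    return {n: (x, y) for n, (x, y) in pos.items()}

# Made-up placeholder names, not entries from the real dataset.
nodes = ["disease_1", "disease_2", "gene_a", "gene_b"]
edges = [("disease_1", "gene_a"), ("disease_2", "gene_b")]
layout = fruchterman_reingold(nodes, edges)
```

The repulsion step compares every pair of nodes, so the work per iteration grows roughly with the square of the node count, which is one reason force-directed layouts get slow on large networks.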
Using the “Statistics” feature, I calculated the average path length of the network and set the nodes’ colors and sizes by degree. This step improves the legibility of my visualization. Finally, I went back to the “Layout” function and used “Label Adjust” to keep the label texts from overlapping each other.
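The two statistics mentioned here can be illustrated on a toy network. The sketch below is a minimal version, assuming an undirected graph and made-up node names; Gephi's own calculation may differ in details. Degree is the number of neighbors of a node, and the average path length is the mean shortest-path distance over all pairs of nodes that can reach each other.

```python
from collections import deque

# Toy undirected network with made-up placeholder names.
edges = [
    ("disease_1", "gene_a"),
    ("disease_1", "gene_b"),
    ("disease_2", "gene_b"),
    ("disease_3", "gene_b"),
]

def build_adjacency(edge_list):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degrees(adj):
    """Degree of a node = number of neighbors."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def bfs_distances(adj, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs of distinct nodes."""
    total = 0
    pairs = 0
    for node in adj:
        for other, d in bfs_distances(adj, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

adj = build_adjacency(edges)
deg = degrees(adj)
apl = average_path_length(adj)
```

In this toy network `gene_b` has the highest degree, so it would be drawn as the largest node when node size is set by degree.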
To answer my first question, “Is cancer related to people’s own genes?”, I used Gephi to create my first visualization with only two colors to show the relationship (Figures 1.1 and 1.2).
Pink nodes represent genes, and green nodes represent diseases. These images show that the area around “Colon cancer” has the highest density, and that other cancers are located nearby. The highest-density area also contains more pink nodes than other areas. Combined with the motivation for this study, we might therefore assume that the human diseases in this area are closely related to the mis-regulation and mutation of genes.
In the image above (Figure 2), the sizes of the text labels show that “Colon cancer”, “Leukemia”, and “Deafness” have larger groups than other diseases. The orange color, which represents the cancer group, also marks the largest community of human diseases.
At the beginning, my dataset was too large to run on my computer, especially when I applied the “Force Atlas 2” layout: it took a long time to process the data and ended up shutting down. This was the first time I realized how big my dataset was. In addition, I found that the final visualization takes on a different appearance if I apply another layout on top of the original one. I am very curious about the algorithm behind each layout. In the future, I would like to do some research on that and create better-looking visualizations in Gephi.