When and Where Did Cholera Outbreaks Have the Greatest Impact?


Charts & Graphs, Lab Reports, Visualization

Introduction

Cholera, a waterborne disease caused by the bacterium Vibrio cholerae, has been a persistent global health concern for centuries. Its ability to grow swiftly from a localized outbreak into a full-blown epidemic has left a lasting mark on public health history. To understand the extent of its impact and the progress made in combating this deadly disease over the years, I analyzed cholera outbreaks from 1949 to 2016 through data visualization.

Through the process of data visualization, my objective was to gain insights into the time and locations where Cholera had its most significant impact. This endeavor aimed to enhance our comprehension of the pandemic’s diverse trends, enabling us to tailor our healthcare systems to address them more effectively.

While studying this dataset, I had two research questions in mind:

1. Where did the highest number of cases occur, and in which location was the death rate the highest?

2. In which year did the maximum number of deaths occur?

By answering these questions through data visualization, we can uncover valuable patterns that shed light on how this epidemic spread across different regions and on its potential underlying causes. We can discern whether it predominantly afflicted densely populated areas, pinpoint the critical transition from an epidemic to a pandemic, and assess the extent of its impact on human societies.

My motivation for studying historical pandemic data, even years after the events, is to gain insights into recurring patterns across different pandemics. This analysis allows for meaningful comparisons with the COVID-19 pandemic, highlighting both commonalities and distinctions between these health crises. Notably, densely populated regions often bear the brunt of pandemics, yet disparities in data emerge due to variations in healthcare quality and the specific policies enacted by individual countries during those particular periods.

Datasets

For this data visualization, I gathered raw data from Kaggle in CSV format. The dataset is named “Cholera Dataset : No. of cases from different countries from 1949”. Its time dimension is the year, with readings from 1949 through 2016. Because it records Cholera by country, it also includes geographic fields. The original data had 2490 observation rows. I then imported the data into Tableau Public to build the visualizations.

Process

After a thorough examination of the dataset, I found that it was mostly clean and well-structured. However, there were a few cells with missing values, which I intentionally chose to fill with zeros instead of deleting the corresponding rows.
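The cleaning itself was done in Tableau Public. As a rough illustration only, the zero-fill choice could be sketched in plain Python as follows. The sample rows, the field order, and the helper name `to_count` are hypothetical and are not taken from the Kaggle file.

```python
def to_count(cell: str) -> int:
    """Convert a raw CSV cell to an integer count, treating blank cells as zero."""
    text = cell.strip()
    if text == "":
        return 0  # missing value: keep the row, record a zero
    return int(text)

# Hypothetical raw rows: (country, year, cases, deaths), all read as strings.
raw_rows = [
    ("Country A", "1953", "1200", "600"),
    ("Country B", "1953", "", "15"),   # missing case count
    ("Country C", "1954", "40", ""),   # missing death count
]

# Every row survives; only the blank cells change.
cleaned = [
    (country, int(year), to_count(cases), to_count(deaths))
    for country, year, cases, deaths in raw_rows
]
```

One consequence of this choice is that a filled-in zero cannot be told apart from a genuinely reported zero, which matters when reading rates or averages.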

During my data analysis, I noticed that the dataset was thoughtfully organized into six WHO regions, with further subdivisions into individual countries. This dataset contained information on reported Cholera cases and deaths, from which I calculated the Cholera death rate, a critical metric for understanding the pandemic’s severity.
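The exact formula used for the death rate is not spelled out here. A minimal sketch, assuming it is deaths divided by reported cases and expressed as a percentage, might look like this in plain Python. The function name `death_rate` and the zero-case handling are assumptions for illustration.

```python
def death_rate(cases: int, deaths: int) -> float:
    """Deaths per 100 reported cases; returns 0.0 when no cases were reported."""
    if cases == 0:
        return 0.0  # assumption: avoid dividing by zero for zero-filled rows
    return 100.0 * deaths / cases

# Example: 10 deaths among 200 reported cases.
example_rate = death_rate(200, 10)
```

Guarding the zero-case situation matters in particular when missing case counts have been filled with zeros, since those rows would otherwise raise a division error.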

To enhance the clarity of my data visualizations, I decided to use color coding for the six WHO regions, making it easier for viewers to identify the specific region being discussed. I also received valuable feedback from colleagues, who suggested that I sequence the narrative of my visualizations so that it starts with the number of registered cases. They also found that Tree maps were effective for presenting the primary information, giving a clear and concise overview of the ten most affected countries in terms of Cholera cases or deaths.

Results

As I delved deeper into the research, I began addressing the research questions. I broke the first question, which focuses on where Cholera had the most significant impact, into finer details. I started by visualizing the overall recorded cases versus overall deaths from 1949 to 2016. However, it became evident that the answer to this question wasn’t one-dimensional. Additional questions emerged, including the origins of the outbreak and the specific regions most affected in different years. To address these questions comprehensively, I used a range of visualizations, including Tree maps, bar graphs, and line diagrams.

To construct these Tree maps, I aggregated the total number of cases and deaths for each country, with the WHO region serving as the color parameter. To keep the focus on heavily affected areas, I filtered for countries with 1000 or more cases and 500 or more deaths. I also added labels showing the country name alongside the actual count of cases or deaths. This made the Tree maps more detailed and easier to read.
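The aggregation and filtering were configured in Tableau Public. The logic can be approximated in plain Python as sketched below. The records, region names, and function names are hypothetical, and the sketch assumes that the 1000-case threshold applies to the cases Tree map and the 500-death threshold to the deaths Tree map.

```python
from collections import defaultdict

# Hypothetical records: (country, who_region, year, cases, deaths).
records = [
    ("Country A", "Region X", 1953, 900, 400),
    ("Country A", "Region X", 1954, 600, 300),
    ("Country B", "Region Y", 1953, 2000, 100),
    ("Country C", "Region Y", 1960, 50, 5),
]

def country_totals(rows):
    """Sum cases and deaths per country across all years, keeping the WHO region."""
    totals = defaultdict(lambda: {"region": None, "cases": 0, "deaths": 0})
    for country, region, _year, cases, deaths in rows:
        entry = totals[country]
        entry["region"] = region
        entry["cases"] += cases
        entry["deaths"] += deaths
    return dict(totals)

def treemap_items(totals, metric, minimum, top_n=10):
    """Return (country, region, value) tiles at or above the threshold, largest first."""
    kept = [
        (country, data["region"], data[metric])
        for country, data in totals.items()
        if data[metric] >= minimum
    ]
    return sorted(kept, key=lambda item: item[2], reverse=True)[:top_n]

totals = country_totals(records)
cases_tiles = treemap_items(totals, "cases", 1000)   # tile size = total cases
deaths_tiles = treemap_items(totals, "deaths", 500)  # tile size = total deaths
```

Each tuple corresponds to one tile: the value sets the tile size, the region sets the color, and the country name with its count forms the label.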

Comparative tree maps

https://public.tableau.com/app/profile/nivedita.thakurdesai/viz/Choleracases-Dashboard4/Dashboard4?publish=yes

While the previous visualizations helped us gain an overall understanding of the most affected regions and the pandemic’s progression over the years, it’s essential to delve deeper into how this crisis evolved and how healthcare interventions impacted the death rates. To address these aspects, I have created the following line graph.

Line diagram comparing cases vs. deaths

https://public.tableau.com/app/profile/nivedita.thakurdesai/viz/Choleracases-Comparisionlinediagram/Sheet5?publish=yes

I created this Tree Map to facilitate an understanding of the Death Rate within each region. It employs color coding and labels to distinguish between various regions, making it straightforward to identify the regions most heavily impacted by Cholera from 1949 to 2016. This visualization provides a clear and concise overview of Cholera’s impact across different WHO regions.

https://public.tableau.com/app/profile/nivedita.thakurdesai/viz/Choleracases-TotalDeathsBarGraph/Sheet1?publish=yes

To consolidate all the findings, I opted for a Bar Graph that displays years on the X-axis and the number of deaths on the Y-axis. The graph incorporates color coding for various WHO regions and provides specific country names and their respective counts when hovering over them. This comprehensive visualization offers an overview of the year with the highest number of deaths or death rate and highlights which region or country was most severely affected.

https://public.tableau.com/app/profile/nivedita.thakurdesai/viz/Choleracases-TotalDeathsBarGraph/Sheet4?publish=yes
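The bar graph above was built in Tableau Public. As an illustrative sketch only, the grouping behind it (deaths per year, split by WHO region) and the lookup of the year with the most deaths could be written in plain Python as follows. The records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (year, who_region, deaths).
records = [
    (1953, "Region X", 700),
    (1953, "Region Y", 100),
    (1954, "Region X", 300),
    (1960, "Region Y", 5),
]

def deaths_by_year_and_region(rows):
    """Group deaths by year, then by region: one stacked bar per year."""
    stacked = defaultdict(lambda: defaultdict(int))
    for year, region, deaths in rows:
        stacked[year][region] += deaths
    return {year: dict(regions) for year, regions in stacked.items()}

def peak_year(stacked):
    """Return the year whose deaths, summed over all regions, are highest."""
    return max(stacked, key=lambda year: sum(stacked[year].values()))

stacked = deaths_by_year_and_region(records)
worst = peak_year(stacked)
```

Here each year maps to one bar on the X-axis, and the per-region values are the colored segments stacked on the Y-axis.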

In conclusion, my analysis indicates that the highest number of Cholera deaths occurred in the years 1952-1956, primarily in the South-East Asia region. During this period, India was affected more severely than any other country. While the epidemic spread widely in African countries, the death rate there was lower than in the South-East Asia region. The visualizations suggest that factors such as high population density, limited healthcare facilities, and a lack of vaccines for disease prevention may have contributed to the rapid spread of this pandemic.

Reflections

I found that Tree maps were effective for conveying essential information, such as identifying the region with the highest number of cases or deaths. These Tree maps used color coding for the six regions, but they did not rely on color alone; they also included clear labels and varying border thickness to help tell the regions apart.

In contrast, the bar and line graphs, which display more detailed information, could have been refined further. I intended to customize these diagrams to emphasize specific data points, but I had difficulty achieving this level of detail.

References

Unsplash. “CHOLERA Pictures | Download Free Images on Unsplash.” Accessed October 3, 2023. https://unsplash.com/s/photos/cholera.

“Kaggle: Your Machine Learning and Data Science Community.” Accessed October 3, 2023. https://www.kaggle.com/datasets/imdevskp/cholera-dataset.

“Discover | Tableau Public.” Accessed October 3, 2023. https://public.tableau.com/app/discover.