Cervical cancer screenings and mortality

Introduction

The word “vaccination” has become far more common since 2020. While the COVID-19 vaccine is unquestionably urgent, it is also a reminder of the many other preventable illnesses for which vaccines are readily available.

In 2017, cervical cancer was responsible for an estimated 4000 deaths in the United States. However, cervical cancer screening techniques have proven considerably successful in detecting human papillomavirus (HPV), which, if left untreated, is the leading cause of cervical cancer. Additionally, the HPV vaccine, introduced in 2006, now reduces the formation of cervical cancer cells by 70-90%.

To screen for cervical cancer, the American Cancer Society recommends two methods:

Pap test to detect abnormal cell changes that may act as a precursor for cervical cancer
HPV test to detect human papillomavirus, which is responsible for cell changes

However, screenings and vaccinations are not readily accessible for those who are uninsured. Despite increased access to free or low-cost Pap smears, many women still go without them because information about these services does not reach those who don’t know where to look.

For this project, I knew I wanted to focus on “silent killers” in the medical world. I initially wanted to explore the effect of UV on skin cancer, but I felt that this topic, while important, was already well explored: it is well established that skin cancer mostly affects people with fair skin.

I instead chose to explore cervical cancer disparities and how lack of insurance relates to screening and mortality outcomes across the US. My final areas of interest were:

How does insurance vary between states?
Is there a relationship between insurance and screening prevalence?
Are there hotspots for mortality by cervical cancer?

Process

Data sourcing and cleaning

The Centers for Disease Control and Prevention (CDC) maintains a searchable database of chronic disease and health promotion data and indicators. PLACES is a project that leverages local data to better inform policymakers and nonprofits about the health status of residents. PLACES has replaced the 500 Cities Project, which covered only the 500 largest cities in the United States. While I had originally used 500 Cities Project data, I opted to scrap it in favor of PLACES data to allow better connections with the death-related data I sourced elsewhere. The PLACES data consists of 1.5 million rows of city- and state-level breakdowns of multiple prevention measures, health outcomes, and behaviors, reported as crude and age-adjusted prevalences. This was tricky to work with, but luckily the CDC allows data to be filtered before downloading as a CSV file.

Before exporting the data, I filtered out everything that was not related to insurance or cervical screenings. I also used age-adjusted prevalences (%) instead of crude figures to account for age differences among city populations. For example, since cervical screenings are not necessary for those over 65, using age-adjusted rates keeps cities with particularly old populations from skewing the outcomes.

Filtering from 1.5 million to 56,968 rows
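
The filtering itself was done in the CDC's download interface, but the same step can be sketched in code. The following is a minimal illustration only: the column names (measure, value_type), the measure labels, and the sample rows are all hypothetical and may not match the real PLACES export.

```python
import csv
import io

# Hypothetical measure labels and column names, used only for illustration.
WANTED_MEASURES = {"Cervical cancer screening", "Lack of health insurance"}
WANTED_VALUE_TYPE = "Age-adjusted prevalence"


def keep_row(row):
    """Return True for rows about the two measures, age-adjusted only."""
    return (
        row["measure"] in WANTED_MEASURES
        and row["value_type"] == WANTED_VALUE_TYPE
    )


def filter_places(csv_file):
    """Read a CSV file object and return only the rows worth keeping."""
    return [row for row in csv.DictReader(csv_file) if keep_row(row)]


# Tiny made-up sample standing in for the 1.5 million row export.
SAMPLE = io.StringIO(
    "city,state,measure,value_type,value\n"
    "City A,TX,Cervical cancer screening,Age-adjusted prevalence,80.0\n"
    "City A,TX,Cervical cancer screening,Crude prevalence,78.5\n"
    "City A,TX,Lack of health insurance,Age-adjusted prevalence,30.0\n"
    "City A,TX,Current smoking,Age-adjusted prevalence,15.0\n"
)

kept = filter_places(SAMPLE)
print(len(kept))  # 2
```

Only the two age-adjusted rows for screening and insurance survive; the crude row and the unrelated measure are dropped.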

The screening-related measure was described as the age-adjusted rate of cervical cancer screening among adult women aged 21-65 years in 2018. This included women who had received a screening within the past 5 years.
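
To illustrate why age adjustment matters, here is a small sketch of direct standardization, one common way of age-adjusting a rate. It is not necessarily how the CDC computes the PLACES figures, and the age groups, rates, populations, and weights below are invented for the example.

```python
def crude_rate(rates_by_age, population_by_age):
    """Overall rate weighted by the city's own age distribution."""
    total = sum(population_by_age.values())
    return sum(rates_by_age[g] * population_by_age[g] for g in rates_by_age) / total


def age_adjusted_rate(rates_by_age, standard_weights):
    """Overall rate weighted by a shared standard age distribution."""
    if set(rates_by_age) != set(standard_weights):
        raise ValueError("age groups must match")
    total = sum(standard_weights.values())
    return sum(rates_by_age[g] * standard_weights[g] for g in rates_by_age) / total


# Hypothetical screening rates (%) that are identical in both cities.
rates = {"21-44": 85.0, "45-65": 75.0}

# Hypothetical populations: one younger city, one older city.
young_city = {"21-44": 8000, "45-65": 2000}
old_city = {"21-44": 2000, "45-65": 8000}

# Hypothetical standard population weights shared by every city.
standard = {"21-44": 0.5, "45-65": 0.5}

print(crude_rate(rates, young_city))       # 83.0
print(crude_rate(rates, old_city))         # 77.0
print(age_adjusted_rate(rates, standard))  # 80.0
```

The two cities screen each age group at the same rate, yet their crude rates differ because of age mix alone. The age-adjusted rate is the same for both, which is what makes cities comparable.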

A second dataset was sourced for cervical cancer-related deaths. This dataset was much less granular, reported at the state level only. The data included both death count and age-adjusted rate. I chose to use the age-adjusted rate to be consistent with the PLACES rates. Data was suppressed for several states, so I planned to account for this during the development of the visualization. The final dataset was a US shapefile of state boundaries, used to aggregate information at a later stage.

Importing into Carto

Carto is a powerful spatial analysis tool that allows users to build maps without substantial previous coding experience. After initially experimenting in Tableau, I found that a map alone would be the most powerful way to communicate the dataset visually, and that several dimensions could be integrated and interacted with to provide a more meaningful story.

I began by importing the screening/uninsured dataset and generating a map from it. I ran Geocoding on the screening/uninsured set and used Administrative Regions to geocode the cities. However, this produced inaccurate mapping, as several cities in different states share the same name (see right).

This ambiguity was corrected by using longitude and latitude data rather than city names. The single column containing both values was split into two in Google Sheets using =LEFT and =MID.

Leveraging latitude and longitude to define cities more accurately
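
The split was done with spreadsheet formulas, but the same idea can be sketched in code. This example assumes the combined column holds text such as "(26.4187, -81.4173)" with latitude first; that format, the function name, and the coordinates are assumptions for illustration, not the documented layout of the export.

```python
def split_geolocation(value):
    """Split a combined "(lat, lon)" string into two floats.

    The "(lat, lon)" layout is an assumed format for this sketch.
    """
    cleaned = value.strip().strip("()")
    lat_text, lon_text = cleaned.split(",")
    return float(lat_text), float(lon_text)


# Two made-up cities that share a name but sit in different places.
rows = [
    {"city": "Springfield", "geolocation": "(39.80, -89.64)"},
    {"city": "Springfield", "geolocation": "(42.10, -72.59)"},
]

for row in rows:
    row["lat"], row["lon"] = split_geolocation(row["geolocation"])

# The names collide, but the coordinates do not.
print(rows[0]["city"] == rows[1]["city"])  # True
print((rows[0]["lat"], rows[0]["lon"]) == (rows[1]["lat"], rows[1]["lon"]))  # False
```

Because each row now carries its own coordinates, the map no longer depends on a city name being unique.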

The state-wide death data sourced from the National Cancer Institute was added as a new layer, along with the state shapefile.

UX testing

Rather than conduct UX research in the final stages, I chose to incorporate it early on to allow for iteration. Two participants were asked a mix of open- and closed-ended questions in order to review the initial map. Follow-up questions were asked as necessary to draw out each participant’s rationale. The initial questions were as follows:

What is your initial understanding of what this map represents?
What type of location data are the points based on?
What do the point colors represent? How about the sizing?
What would be your interpretation of a point with X color and Y size?
How is this different from what the state colors represent?
What inferences can you draw about the differences between state X and state Y?

The two participants were selected for their relevant experience:

An obstetrician-gynecologist who provides cervical screenings to women.
A psychiatrist with past experience conducting research on potential interventions for cervical cancer disparities within communities.

Rationale + Insights

This section is organized by the rationale for original design choices, UX research findings, the rationale for revised design choices, and final visualization findings.

Original design rationale (Pre-research)

The city-level data was styled using point size and point color based on value, which provided a color and size gradient. This would allow for the dimensions of uninsured and screened women to be defined within a single point.

The mortality dataset only provided state-wide rates; I was unable to find a city-level list of cervical cancer-related deaths. Several states also had data suppressed, which almost led me to discard the whole dataset. However, it felt important to include death rates, as substituting a dataset of cervical cancer rates would not capture the uninsured population. Rather than aggregate city data to the state level, I chose to keep the mortality layer in the background, as the suppressed states would otherwise have seriously affected the visualization, making it look incomplete in those areas.

The initial design leveraged opacity to overlay a cool-toned scale for the rate of women who have had cervical screenings on a warm-toned scale representing mortality due to cervical cancer. I chose to represent the riskier or “dangerous” ends of the spectrum as darker, more saturated colors; a high death rate would be true red, while a low level of screenings would be true blue. My initial rationale was that a city with a low rate of screenings and a high rate of cervical cancer mortality would be represented by a saturated purple point, which would be easily spotted by the viewer. The opposite case, seen as less urgent, would be represented by a de-saturated (almost white) point. This would allow areas with less urgent points to remain in the user’s periphery. Uninsured rates felt best denoted by point size, once again to draw viewers to areas in need of attention. This would help inform policymakers and medical professionals of areas that may need local or county-wide interventions.

UX findings

General

There was no information to differentiate whether the points were geolocated by county or city without clicking on datapoints to view a pop-up.

The copy “seeking cervical screenings” on the legend and pop-up suggested intention only, and that some women might not have followed through with making and attending appointments.

Users wished they could see overall state rankings without having to click on each city.

Color-specific

The transparency and overlaying of point colors were confusing, as participants could not find some of the resulting colors on the legend (e.g., purple or light green points).

Example of how the original high death rate and low screening rate produced the color purple.
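
The purple can be reproduced with a simple sketch of alpha compositing, where a semi-transparent color is drawn over an opaque one. This assumes the standard "source over" blend and a 50% opacity; the map tool's actual rendering and the opacity used in the original design may differ, and the function name is hypothetical.

```python
def blend_over(top, bottom, alpha):
    """Composite an RGB `top` color over `bottom` at the given opacity (0 to 1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(round(alpha * t + (1 - alpha) * b) for t, b in zip(top, bottom))


TRUE_BLUE = (0, 0, 255)  # low screening rate on the point layer
TRUE_RED = (255, 0, 0)   # high death rate on the state layer

blended = blend_over(TRUE_BLUE, TRUE_RED, alpha=0.5)
print(blended)  # (128, 0, 128)
```

The result is a purple that appears on neither scale, which is consistent with participants being unable to find it on the legend.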

The death rate colors were confusing because the states where data was suppressed appeared in the same color as the minimum death rate on the legend (light yellow).

Original colors of states where death rate and count were suppressed.

Users tried to read % Uninsured from point color, because the legend used green to present the size scale.

Original uninsured point size legend using green.

Darker colors accurately signified hotspots of concern to users. Average rates on the legend were also easy to spot.

Users thought state-level colors were based on death count and not rate, as state-level pop-ups included this information.

Alaska and Hawaii were excluded from consideration, as most users tended to zoom into the contiguous states rather than view the country as a whole.

Final design rationale (Post-research)

Original vs. revised legend (differences in rates between the legends are due to a data correction made after the original screenshots were taken)

From the UX research, it was clear that the legend copy needed major adjustment, as well as the color scales of the map itself. To arrive at the final legend iteration on the right, I took into account the UX research insights and problems I anticipated arising. I implemented the following major design changes:

Provided increased contrast between point colors and blue ocean on the background map by changing the color values of screening data to a red-yellow scale.

Included terminology in the legend to distinguish city- and state-level data.
Changed screening copy from “% Seeking Cervical Screenings” to “Population Recently Screened”.
Removed color entirely from % uninsured to discourage viewers from searching for matching point colors on the map.

Changed the value scale for death rate to greyscale to create layer contrast by saturation rather than hue, and removed the death count from the pop-up to maintain focus on the death rate (however, this also removed the description of Data Suppressed values from the pop-up).
Added widgets to allow state data to be aggregated numerically; I wanted to keep this information separate from the map as I believe local data is important for policy change.
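
The kind of aggregation the widgets perform can be sketched as a per-state average of city-level values. This is an illustration only: the field names and figures are hypothetical, and a plain unweighted mean is an assumption that may differ from what the widgets actually compute.

```python
from collections import defaultdict
from statistics import mean


def state_averages(city_rows, field):
    """Group city rows by state and average one numeric field per state."""
    by_state = defaultdict(list)
    for row in city_rows:
        by_state[row["state"]].append(row[field])
    return {state: mean(values) for state, values in by_state.items()}


# Made-up city rows; a real analysis might weight cities by population.
cities = [
    {"city": "City A", "state": "TX", "uninsured_pct": 30.0},
    {"city": "City B", "state": "TX", "uninsured_pct": 20.0},
    {"city": "City C", "state": "FL", "uninsured_pct": 10.0},
]

averages = state_averages(cities, "uninsured_pct")
print(averages["TX"])  # 25.0
print(averages["FL"])  # 10.0
```

Keeping this summary outside the map leaves the city points visible for readers who care about local detail.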

The final visualization can be seen below or accessed here:

Visualization findings

Overall, the visualization communicated a story, both to me and to the users, about the relationship between screenings, insurance, and death. Texas had the highest uninsured rate at 29.7%, roughly twice the national average of 14.9%. Texas also had a lower-than-average cervical screening rate of 81.2% and a higher-than-average age-adjusted death rate of 3 per 100,000. Out of all cities in the US, those with the highest lack of insurance (50.2-62.1%) were remarkably all located at the border of Texas and Mexico, with the exception of Immokalee, Florida.

This suggests that lack of insurance correlates with a tendency to go without screenings, which in turn increases the risk of death from advanced cervical cancer.

Texas was also an interesting case, as cities along the border with Mexico showed consistently lower rates of screenings. Many of these populations are majority Hispanic or Latino, such as El Cenizo, which is 99% Hispanic and has a city population of just 3000 people. Poverty rates exceed 50% in many of these rural communities, pointing to a strong relationship between income and insurance.

Regional differences were also striking. The southeast compared to the northeast showed stark differences in insurance rates as well. I provide further information on how these regional findings can be elaborated in the following section.

Recommendations for further development

While I think the map communicates what I intended, there are still many gaps to fill. Further data on cervical cancer rates by state could be incorporated; however, I originally didn’t use this data as those without insurance might not be accounted for in these statistics, and uninsured women could have cervical cancer without knowing.

This offers compelling opportunities for further investigation. With a premium Carto account, the visualization could be layered further to explore the relationship between the median annual income of cities and their insurance levels.

More interestingly, does state abstinence education coincide with a decrease in rates of cervical screenings? In retrospect, this is the story I would have liked to depict further.

HPV causes 91% of cervical cancers. Safe-sex education that informs students of HPV and the vaccine has a proven effect on increasing HPV vaccination rates.

Notably, abstinence-only states lacking HPV information, such as Texas, Florida, and Mississippi, had among the lowest rates of women seeking cervical cancer screenings and the highest rates of death by cervical cancer, as shown below.

Abstinence-only education states in 2021 (Kaiser Family Foundation, 2021)

Region of dark states representing high rate of death by cervical cancer.

The relationship is interesting, as abstinence-only education does not provide students with information about HPV or the importance of future cervical screenings. As expected, those states that reject Title V funding have high vaccination rates as well.

This visualization could be taken in this direction, toward sex education and death by cervical cancer. If safe-sex education is de-emphasized, students know less about the HPV vaccines available to protect themselves and future partners from HPV, the leading cause of cervical cancer. This could prove to be a compelling visualization, and users would not feel initially overwhelmed by the large quantity of city-level information.

Information Visualization

Student work at the School of Information, Pratt Institute