What We Could Know From Food-borne Illness

Final Projects, Visualization


I have found that food-borne illness does not receive the attention it deserves, either within regions or across countries. Many hospitals have only an allergy or infection department to deal with food-borne diseases. The number of illness records, however, is huge and varies widely by region (and not only because of population density). I interviewed three relevant people and then decided on the goal of this information visualization.

I present four related visualizations. The first gives an overview of the foodborne illness situation in the U.S. The second compares Florida with other states and emphasizes 1998 as a foodborne illness outbreak year. The third displays one food issue in Florida in 1998. The last uses a network map to show which food ingredients are associated with the main bacteria species.

User testing was carried out in three separate sessions with three people.


The dataset comes from Kaggle.

Academic Research Articles:

  • Food Protection: The Mission May Be Hazardous to Your Health
  • Foodborne Illness Surveillance and Investigation Annual Report, Florida, 1998
  • Mysterious Outbreaks of Gastrointestinal Illness Associated with Burritos Supplied through School Lunch Programs
  • Outbreaks of Gastrointestinal Illness of Unknown Etiology Associated with Eating Burritos — the United States, October 1997-October 1998
  • Vibrio illness in Florida, 1998–2007

Software: OpenRefine; Excel; Tableau; Gephi; Sketch.

Rationale for Visual Design

1. Topic Choice

The year 1998 continued to be active for food and waterborne outbreak reporting and investigation. A total of 4,449 foodborne illness complaints were reported to counties in 1998. A total of 315 outbreaks with 3,290 cases were reported, compared to 439 outbreaks and 2,744 cases for 1997, and 305 outbreaks and 2,777 cases for 1996. Investigators were able to laboratory confirm 40 of the outbreaks (including 14 V. vulnificus) associated with 567 cases. Staphylococcus, Norwalk, and Salmonella were identified in the largest percentage of the total reported outbreaks (6% each). Norwalk was identified in the largest percentage of cases in total reported outbreaks (22.2%), followed by B. cereus (4.6%). Restaurants were the source site in 75% of the outbreaks reported and in 48.8% of the cases. Chicken was the main vehicle implicated in outbreaks. The months with the largest percentage of outbreaks reported were March and August. (CDC, 1998)

Food protection, as a basic public health issue, ranks among the higher environmental health and protection priorities. Because it has a direct impact on people, its public health priority is much higher than that of most other environmental health and protection problems. However, public health agencies may have taken such responsibilities for granted and stopped giving food quality protection the effort it requires as an essential public health activity as they turned their attention to other public health, and even health care, issues. (Gordon, Larry J, 1994)

Through this information visualization, my poster aims to give the public a preliminary understanding of the sources of foodborne illnesses, their geographical distribution, their seasonal patterns, and historic events.

2. Goal & Target Group

My goal is to increase government agencies’ attention to, and investigation of, food-borne diseases, and to give the public a basic understanding of what causes them.

3. The Choice of Information Visualization Graph Type

I want to tell a story about foodborne illness along several dimensions, so I chose the following chart types:

Choropleth Map

The Choropleth Map is a basic and common map type. My first chart shows the total number of records in the U.S. from 1998 to 2015. Its aim is simply to show each state’s foodborne illness situation (the number of records) and its percentage of the national total, which provides a basis for the closer look at Florida.
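
The per-state record count and share of the total behind the map can be sketched as follows. This is only a minimal illustration in Python, not the Tableau calculation I actually used; the column name "State" and the sample rows are assumptions.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset has one row per illness record.
# The "State" and "Year" column names are assumed for illustration.
records = [
    {"State": "Florida", "Year": 1998},
    {"State": "Florida", "Year": 2003},
    {"State": "Ohio", "Year": 2001},
    {"State": "California", "Year": 2015},
]

def state_shares(rows):
    """Return {state: (record_count, percent_of_total)} for the given rows."""
    counts = Counter(row["State"] for row in rows)
    total = sum(counts.values())
    return {
        state: (n, round(100 * n / total, 1))
        for state, n in counts.items()
    }

shares = state_shares(records)
# Print states from the most to the fewest records.
for state, (n, pct) in sorted(shares.items(), key=lambda kv: -kv[1][0]):
    print(f"{state}: {n} records ({pct}% of total)")
```

The count would drive the color of each state, and the percentage is the figure shown in the annotation.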

The advantage of a Choropleth Map is that it offers audiences the familiarity of their own country or state, which increases their interest in learning more. Also, because a Choropleth Map carries no complicated overlapping information, users can read it directly.

However, as the bad example of misleading election maps showed me, the biggest problem with Choropleth Maps is the mismatch between data distribution and geographic size. Often large amounts of data are concentrated in densely populated areas, while sparsely populated areas occupy most of the screen space. As a result, users may misread the data and cannot accurately distinguish and compare the values of all the regions on the map. I still use this map type because my data is not tied to area or to a region's share of the population: what it mainly shows is each state's proportion of foodborne illness records.

Stacked Graph

My second chart shows the monthly change in the number of records for six selected states in the U.S. and, at the same time, compares 1998 with 2015. This is a time-series chart: having just shown the overview map of the U.S., in this step I want to select a few specific regions to compare.

Furthermore, in the pre-interviews, physicians told me that more foodborne illness cases break out in summer, so I wanted to show the monthly change as well. Ironically, the final graph shows that foodborne illness is more frequent in spring than in summer. I chose the Stacked Graph to emphasize Florida as a main foodborne illness outbreak region in both 1998 and 2015, and also to illustrate the major change not only in Florida but in all U.S. regions.
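
The monthly series behind the stacked graph can be sketched like this. It is a rough Python illustration under assumptions: the column names "Year", "Month" and "State" and all sample rows are hypothetical, and the real chart was built in Tableau.

```python
from collections import defaultdict

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical sample rows; column names are assumed for illustration.
records = [
    {"Year": 1998, "Month": "March", "State": "Florida"},
    {"Year": 1998, "Month": "March", "State": "Florida"},
    {"Year": 1998, "Month": "August", "State": "Florida"},
    {"Year": 1998, "Month": "March", "State": "Ohio"},
    {"Year": 2015, "Month": "July", "State": "Florida"},
    {"Year": 2004, "Month": "May", "State": "Florida"},  # dropped: year not selected
    {"Year": 2015, "Month": "July", "State": "Texas"},   # dropped: state not selected
]

def monthly_counts(rows, states, years):
    """Count records per (year, state) as a 12-value list, January first."""
    wanted_states, wanted_years = set(states), set(years)
    table = defaultdict(lambda: [0] * 12)
    for row in rows:
        if row["State"] in wanted_states and row["Year"] in wanted_years:
            table[(row["Year"], row["State"])][MONTHS.index(row["Month"])] += 1
    return dict(table)

def peak_month(series):
    """Name of the first month with the highest count in a 12-value series."""
    return MONTHS[series.index(max(series))]

counts = monthly_counts(records, states=["Florida", "Ohio"], years=[1998, 2015])
for (year, state), series in sorted(counts.items()):
    print(year, state, series, "peak:", peak_month(series))
```

Each 12-value list is one layer of the stack, and comparing peak months is a quick way to check the spring versus summer question.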

The flaws in this graph are the unexplained choice of these two years and the overlapping colors. Unless people use the interactive Tableau version and click on the area of each state, the states are difficult to compare. I am also not sure whether the parallel form of the comparison is valid.

Bubble Chart

My third graph tells people which foods caused illness in Florida in 1998. This Bubble Chart encodes two dimensions. The size of each bubble is based on the total number of 1998 Florida records for the ingredient that caused illness; the color of each bubble depends on the number of illnesses in a single record. We call such an incident an “outbreak” (an incident in which two or more persons have the same disease, have similar symptoms, or excrete the same pathogens; and there is a time, place, and/or person association between these persons) (cite). The chart shows a major outbreak associated with burritos.
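
The two encodings described above can be sketched as a small aggregation. This is a hedged Python illustration, not the Tableau setup itself: the column names "Ingredient" and "Illnesses", the sample values, and the choice of the largest single record as the color value are all assumptions.

```python
# Hypothetical 1998 Florida rows; column names and values are assumed.
records = [
    {"Ingredient": "Burrito", "Illnesses": 120},
    {"Ingredient": "Chicken", "Illnesses": 4},
    {"Ingredient": "Chicken", "Illnesses": 12},
    {"Ingredient": "Chicken", "Illnesses": 3},
    {"Ingredient": "Oysters", "Illnesses": 2},
]

def bubble_values(rows):
    """Per ingredient: number of records (size) and largest single record (color)."""
    out = {}
    for row in rows:
        entry = out.setdefault(row["Ingredient"],
                               {"records": 0, "max_illnesses": 0})
        entry["records"] += 1
        entry["max_illnesses"] = max(entry["max_illnesses"], row["Illnesses"])
    return out

bubbles = bubble_values(records)
for ingredient, values in bubbles.items():
    print(ingredient, values)
```

In this sample, "Chicken" gets the biggest circle because it has the most records, while "Burrito" gets the strongest color because one record alone has many illnesses.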

A Bubble Chart can show a single serious event using two dimensions encoded in the circles. The limitation of my graph is that these two dimensions need annotation to be understood. At first, I did not describe the circle size on the dashboard, and my users told me they were confused.

In future, I will consider using a scatter plot, which adds quantitative axes.

Network Map


The public needs to know why food makes us sick and which ingredients to watch out for. I found a chart, Foodborne Illness-Causing Organisms in the U.S., which contains the kind of information I want to show in the network. More specifically, I map the relationship between food ingredients and bacteria species in Florida’s foodborne illness records.

The Usage of Color

I use warm colors (red, yellow, and orange) as the main hues to display information about Florida, and I vary the color value according to severity. However, the other states inevitably use many contrasting colors, so the result looks messy.

For Gephi, node color is assigned by degree in the network; bacteria species, mainly those with high connection values, are displayed in grey. Edge color is assigned by edge label (the food ingredient).

Gephi automatically gives distinct colors to the most frequent ingredients and leaves the others grey.

Pre-User Experience Research

The purpose of the primary user research for this project was to understand the background of foodborne disease and how different healthcare institutions treat it. I chose three people with different backgrounds who either knew the treatment process for food-caused disease or had personal experience of it in the U.S.

From the interviews, I learned that foodborne illness has various causes: most cases are acute enteritis caused by poor-quality food ingredients, alongside diet-related hyperlipidemia and food poisoning. Season and location strongly influence the number of people who fall ill. Also, most hospitals have only an allergy and an infection department. Raw seafood and meat are significant causes of foodborne illness. Normally, most cases can be cured within one week to one month, with relatively low risk and low lethality.

Creation Methods

  • Data Clean:

OpenRefine: remove blank columns and the leading and trailing white space in each column; Edit column: split into several columns (by “,”), then remove redundant food and cause species values; Cluster & edit column “food1”: select all similar values and merge them (Block Chars: 3).
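
The cleaning steps above can be approximated in a few lines of Python. This is only a sketch under assumptions: the sample cells are invented, and the grouping uses a simple fingerprint-style key (lowercase, punctuation removed, unique tokens sorted) rather than the clustering settings I used in OpenRefine, so it will not catch misspellings.

```python
import re
from collections import defaultdict

def split_values(cell):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(",") if part.strip()]

def fingerprint(value):
    """Normalised key: lowercase, drop punctuation, sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values whose keys collide; keep only groups with several spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}

# Hypothetical raw cells from a food column.
cells = ["  Burrito, Chicken ", "burrito,Rice", "Raw Oysters", "oysters  raw "]
values = [v for cell in cells for v in split_values(cell)]
clusters = cluster(values)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each printed group is a candidate for merging into one spelling, which is the same decision the Cluster & edit dialog asks for.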

I split the data into three datasets. The first is the cleaned U.S. foodborne disease dataset, and the second is the cleaned 1998 Florida foodborne disease dataset.

For the third, I took the Florida foodborne illness dataset, selected the bacteria species column and the food ingredients column, and built a network edge dataset by creating a PivotTable in Excel.
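
The PivotTable step amounts to counting how often each ingredient and species appear together. A minimal Python sketch of that count is below; the column names "Ingredient" and "Species", the sample rows, and the edge direction (ingredient as source, species as target) are assumptions, and the Source/Target/Weight header follows the column names Gephi's spreadsheet import expects for edge tables.

```python
import csv
import io
from collections import Counter

# Hypothetical Florida rows; column names and values are assumed.
rows = [
    {"Ingredient": "Oysters", "Species": "Vibrio vulnificus"},
    {"Ingredient": "Oysters", "Species": "Vibrio vulnificus"},
    {"Ingredient": "Chicken", "Species": "Salmonella"},
    {"Ingredient": "Eggs", "Species": "Salmonella"},
    {"Ingredient": "Rice", "Species": ""},  # skipped: no species recorded
]

def build_edges(records):
    """Count how often each (ingredient, species) pair occurs."""
    pairs = Counter()
    for row in records:
        if row["Ingredient"] and row["Species"]:
            pairs[(row["Ingredient"], row["Species"])] += 1
    return pairs

def edges_to_csv(pairs):
    """Serialise the pair counts as an edge table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (ingredient, species), weight in sorted(pairs.items()):
        writer.writerow([ingredient, species, weight])
    return buffer.getvalue()

edges = build_edges(rows)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Saving `csv_text` to a file would give an edge list that can be loaded into the network tool in place of the PivotTable export.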

I created the Choropleth Map in Tableau using the state names from Excel. The color gradient depends on the number of illness records. I added a point annotation to describe the Florida situation and analyzed the data by percentage (I need to rethink the rationale of this step).

In the Bubble Chart, each circle’s size is determined by the number of records caused by each ingredient (the bigger the circle, the more illness records this ingredient caused in 1998). Each circle’s color is determined by the severity range of each record’s illnesses (red means an “outbreak” of foodborne disease). As before, I added an annotation for the “Burrito Outbreak Issue”.

According to the CDC’s mouth-wateringly named Foodborne Outbreak Online Database (FOOD), there were four burrito-related outbreaks resulting in 83 illnesses reported last year. That’s about 0.5 percent of all cases of foodborne illness recorded by the CDC. While those numbers are relatively low, there have been big burrito outbreaks in the past. In August 1998, a “burrito associated outbreak” in Hillsborough County, Florida, caused 644 elementary school kids in 66 schools to lose their lunches after eating lunch. (Not to be confused with the Hillsborough County Schools Soft Taco Outbreak of October 1998.) The culprits were identified as frozen beef and bean burritos made in Chicago. That incident was one of 16 gastrointestinal outbreaks “associated with eating burritos” between 1997 and 1998. A 2006 article, titled “Mysterious outbreaks of gastrointestinal illness associated with burritos supplied through school lunch programs,” ruled out “mass psychogenic illness” as the cause of these types of unfortunate events. (FDH, 1998)

To compare 1998 with 2015, I created a stacked graph. I selected seven main states for foodborne disease and compared them month by month.

In Gephi, I used the ForceAtlas 2 layout with Dissuade Hubs, LinLog mode, and Prevent Overlap selected. Node size and color are set by degree, and bacteria species are mainly displayed in grey. Edge color is set by edge label (the food ingredient type); the higher the out-degree, the thicker the edge.
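
Sizing nodes by degree can be illustrated outside Gephi as well. The sketch below is an assumption-laden Python example: the edge list is invented, degree is counted as the number of edges touching a node, and the linear scaling to a size range is my own illustration rather than Gephi's exact ranking function.

```python
from collections import Counter

# Hypothetical (ingredient, species) edges.
edges = [
    ("Oysters", "Vibrio vulnificus"),
    ("Chicken", "Salmonella"),
    ("Eggs", "Salmonella"),
    ("Chicken", "Staphylococcus"),
]

def degrees(edge_list):
    """Degree of each node: the number of edges that touch it."""
    deg = Counter()
    for source, target in edge_list:
        deg[source] += 1
        deg[target] += 1
    return deg

def scale_sizes(deg, min_size=10.0, max_size=50.0):
    """Map degrees linearly onto [min_size, max_size]."""
    lo, hi = min(deg.values()), max(deg.values())
    span = hi - lo
    return {
        node: min_size if span == 0
        else min_size + (d - lo) / span * (max_size - min_size)
        for node, d in deg.items()
    }

deg = degrees(edges)
sizes = scale_sizes(deg)
for node in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{node}: degree {deg[node]}, size {sizes[node]}")
```

Nodes that connect to several ingredients or species end up largest, which is what makes the main species stand out in the layout.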

Because it is hard to see the relationship between ingredients and species in the Gephi output, I interacted with the graph myself on the graph page.

Associated with Consumption of Raw Oysters Issue— Florida 

During November 20-30, 1993, four county public health units (CPHUs) of the Florida Department of Health and Rehabilitative Services (HRS) in northwestern Florida conducted preliminary investigations of seven separate outbreaks of foodborne illness following consumption of raw oysters. On December 1, the HRS State Health Office initiated an investigation to characterize the illness, examine risk factors for oyster-associated gastroenteritis, and quantify the dose-response relation.

A case was defined as sudden onset of nausea, vomiting, diarrhea, or abdominal cramps within 72 hours of eating raw oysters. Twenty-five additional cases of gastroenteritis associated with eating raw oysters were detected.

Paired serum specimens from 10 patients were tested for antibody to the Norwalk-like virus by enzyme immunoassay (Monroe SS); three pairs demonstrated a fourfold or greater rise in titer. Seven stool specimens were examined by electron microscopy (EM) and reverse transcription-polymerase chain reaction (RT-PCR). In four specimens, small round-structured viruses were detected by EM; in one specimen, a Norwalk-like genome was confirmed by RT-PCR (Moe CL, 1996). This Norwalk-like virus strain had a nucleotide sequence distinct from similar viruses in nearly simultaneous outbreaks associated with consumption of oysters harvested along the Louisiana coast (CDC, 1993).

User Testing

I carried out formative tests using the “thinking aloud” method, combined with observation and interviews.

Users were asked to express what they could get from the visualizations: their first impression, the confusing parts, and general questions. At the same time, I observed how they experienced the interface.

I chose three people with different backgrounds: a design management student, a user experience designer, and a graphic designer.


At first, I annotated only the number of records (2049) on my Choropleth Map, and the first user believed it was a year. She also corrected my descriptions of the tables. The first version of my Gephi network map showed no food ingredients because the font size was too small, so she could not understand its content.


After these changes, I tested with my second user, who had taken a visualization class before. She found that my dashboard was not interactive. I wanted to rebuild the dashboard on the overall dataset; however, it was difficult or impossible to display the 1998 Florida graph that way, since I need the illness numbers for a specific state and year (they would not filter in this way).


The third user told me to reduce the number of colors. When she tried to touch the nodes of the network, she was disappointed that it was not interactive. She wanted to know which ingredients she should pay most attention to. She was also confused about the meaning of bubble size in the third graph, since I had not added an annotation at first.

Feedback Redesign and Future Direction

Interaction features that run counter to expectations:

I use three different datasets in this project, so the Tableau charts do not interact with each other. I found this difficult to change: when I tried to use one integrated dataset and create a new group for Florida alone, I could not control the number of illnesses and records for just one region, so I could not display the 1998 outbreak in Florida. Moreover, I combined Tableau with Gephi, and the two programs cannot connect. During user testing, I found that users did not proactively interact with Tableau, and I found Gephi to be more interactive, even though its final preview is static. I had to interact with Gephi myself on the graph page and take a screenshot for people to understand, which is really inconvenient.

Language Use:

From the feedback of the three users, I found that my descriptions on the dashboard had problems: they were not correct, and I had only just realized it.


I am not satisfied with the colors of the visualization as a whole, and I need to give this part more thought.


Technically, I want to build interactive data visualizations more systematically, which may require more programming skills. 3D visualization also attracts me a lot.

I believe descriptions sometimes interrupt people’s view of the graph. Considering storytelling through visual language, I will try to present information visualization with little or no text.




Centers for Disease Control and Prevention. Foodborne Outbreak Tracking and Reporting. https://www.cdc.gov/foodsafety/fdoss/data/annual-summaries/index.html

Gordon, Larry J. “Food Protection: The Mission May Be Hazardous to Your Health.” Journal of public health policy 15.4 (1994): 393-396.

Monroe SS, Stine SE, Jiang XI, Estes MK, Glass RI. Detection of antibody to recombinant Norwalk virus antigen in specimens from outbreaks of gastroenteritis. J Clin Microbiol 1993; 31:2866-72.

Moe CL, Gentsch J, Ando T, et al. Application of PCR to detect Norwalk virus in fecal specimens from outbreaks of gastroenteritis. J Clin Microbiol 1994; 32:642-

CDC. Multistate outbreak of viral gastroenteritis related to consumption of oysters -- Louisiana, Maryland, Mississippi, and North Carolina, 1993. MMWR 1