Exploring and visualizing U.S. food expenditure development


Visualization

For my first data visualization project in Tableau, I am working with food data, more precisely food expenditure data from the USDA Economic Research Service “Quarterly Food-at-Home Price Database”. The data sets are divided into four general food categories: “Fats, Beverages, and Prepared Foods”, “Fruits and Vegetables”, “Grains and Dairy”, and “Meats, Nuts, and Eggs”. For this viz, I chose to work only with “Grains and Dairy”.

My approach to the data was exploratory; I am very interested in the food industry and trends in food consumption, but had no concrete expectations for what the data might teach me, so my goal was to get an overview, familiarize myself with the data, and begin to understand the overall picture.

Prior to developing my own vizzes, I drew insights from three data visualizations that all deal with food data in various ways. The first is the USDA Food Expenditure: Interactive Chart, which shows food expenditure data on interactive line graphs in relation to other major economic indicators, such as GDP, healthcare spending, and housing/utilities spending. It provides an overview of major trends, for example a steady increase in the share of food spending away from home from 1953 to 2013, but it does not allow us to learn about the types of food that are being consumed.

The second example is the Global Nutrition Report: Countries with Overlapping Under-five Stunting, Anemia in Women of Reproductive Age, and Adult Overweight. This is an attempt to assess global issues of malnutrition, using a map to show which countries have the most public health nutrition problems, and a bar graph for each country to show where these problems lie. Again, this graph provides an overview, this time on a global scale, but doesn’t allow for more detailed exploration of food consumption trends.

As a third example, I looked at the World Food Programme: Ebola Stricken Countries Market Prices. This viz puts food prices into a very serious context, and also allows for very detailed breakdowns of the price changes. However, the viz contains so much information that it is difficult to keep track of the overall story of the data.

Based on the three example vizzes, my goal for my own visualization is to provide more detailed insight into the distribution of food expenditures across different food groups, without losing sight of the bigger picture. I hope that this approach will lead me to discoveries that can drive further exploration of food data.

My first step is to normalize the data, which comes as an Excel sheet. I use Numbers to do so. First, I learn that my data on “Grains and Dairy” is divided into 12 sub-categories of food, each of which has its own table in the Excel file. To be able to compare food categories, I want to include all this data in one table. I create a “foodgroup” column and paste the data from each table together, so that the foodgroup codes now provide the separation that the individual tables provided before.
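
I did this step by hand in Numbers, but the same idea can be sketched in a few lines of Python. This is only an illustration: the column names ("year", "quarter", "aggweight") and the values are made-up placeholders, not the actual headers or figures from the USDA file.

```python
# Hypothetical stand-ins for two of the per-food-group tables.
# Column names and values are placeholders, not taken from the USDA file.
tables = {
    17: [{"year": 1999, "quarter": 1, "aggweight": 10.0}],
    18: [
        {"year": 1999, "quarter": 1, "aggweight": 4.0},
        {"year": 1999, "quarter": 2, "aggweight": 5.0},
    ],
}


def combine_tables(tables):
    """Stack the per-group tables into one list of rows, tagging each row
    with a 'foodgroup' code so the groups stay distinguishable."""
    combined = []
    for code, rows in sorted(tables.items()):
        for row in rows:
            combined.append({"foodgroup": code, **row})
    return combined


combined = combine_tables(tables)
for row in combined:
    print(row)
```

The result is one long table in which the "foodgroup" column carries the information that used to be implied by which table a row lived in.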

The dataset that I am working with contains both time-based and geographical data, but since the time-based data is noted as years (e.g. 2007) and quarters (1, 2, 3, or 4) and the geographical data is recorded using region codes (e.g. Division: 1 = New England), Tableau only recognizes this data as numbers or strings. The geographical data is not organized in a way that allows for easy translation into something Tableau can recognize (e.g. states), so I decide to focus mainly on the time-based aspect.
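
One possible way to make the year and quarter columns readable as dates is to convert each pair into the first day of that quarter before importing the data. This is a sketch of that idea, not a step I actually took in this project, and the function name is my own.

```python
from datetime import date


def quarter_start(year, quarter):
    """Return the first day of the given quarter as a date.

    Quarter 1 starts in January, 2 in April, 3 in July, 4 in October.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter!r}")
    first_month = 3 * (quarter - 1) + 1
    return date(year, first_month, 1)


# A date in ISO format is something most tools can parse as a date column.
print(quarter_start(2007, 3).isoformat())
```

Writing the result out in ISO format (year-month-day) gives a single column that can be treated as a date rather than as two separate numbers.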

I notice that there are quite a few missing records in the data set. To find out more, I import a CSV file of the data into OpenRefine. By using the ‘Facet’ function ‘Facet by blank’, I learn that 2,429 out of the total 11,869 rows (about 20%) only have records for the aggregated weight; price and total expenditure have not been recorded. To understand what these missing records will mean for my visualization and analysis of the data, I examine the rows further. I learn that the missing records seem fairly evenly distributed across years, quarters, and regions, but not across food groups. Food group 17 (“Whole grains flour and mixes”) is missing a large number of records; only 127 out of 883 rows (about 14%) have recorded price and expenditure data. Group 18 (“Whole grain frozen/ready to cook”) has close to no entries besides “aggregated weight”, and Group 27 (“Whole and 2% yogurt & other dairy”) is also missing close to 50% of its expenditure data entries. I will keep this in mind when visualizing and interpreting the data in Tableau.
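
The per-food-group check I did with facets can also be sketched in plain Python. This is a rough equivalent, not what OpenRefine does internally; the column names ("price", "expend", "aggweight") and the sample rows are invented for illustration.

```python
from collections import Counter

# Made-up sample rows; blank strings stand for missing records.
rows = [
    {"foodgroup": 17, "aggweight": "12.5", "price": "", "expend": ""},
    {"foodgroup": 17, "aggweight": "11.0", "price": "0.21", "expend": "2.31"},
    {"foodgroup": 18, "aggweight": "3.2", "price": "", "expend": ""},
    {"foodgroup": 27, "aggweight": "8.0", "price": "0.30", "expend": "2.40"},
]


def is_blank(value):
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def blank_counts(rows, column, key="foodgroup"):
    """Return {group: (blank_rows, total_rows)} for the given column."""
    total = Counter()
    blank = Counter()
    for row in rows:
        total[row[key]] += 1
        if is_blank(row.get(column)):
            blank[row[key]] += 1
    return {group: (blank[group], total[group]) for group in total}


counts = blank_counts(rows, "price")
for group, (missing, total) in sorted(counts.items()):
    print(f"group {group}: {missing} of {total} rows missing price")
```

Counting blanks per group, rather than for the whole file, is what reveals that the gaps are concentrated in a few food groups.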

The first visualization I created (shown below) is a bar graph showing total expenditure in Grains and Dairy per year (1999-2006), using colors to show the different food groups. Even though the x-axis represents continuous time, I chose a bar graph over a line or an area graph, as the bar graph allows for better year-to-year comparison. To make the graph easier to read, I’ve further categorized the food groups using color gradients: food groups related to grains are colored green, cheeses are orange/yellow, and other dairy is in the blue range, with more blue-green/turquoise for yogurts (and for all dairy categories, a darker color signifies a whole milk product, whereas a lighter color is used for low fat). These visual categories are sorted so that the largest one is at the bottom, which further improves readability. Overall, dairy expenditure is greater than expenditure on grains; however, as a single category, non-wholegrain bread, rolls, rice, pasta, and cereal is the largest.

[Figure: U.S. food expenditure in Grains and Dairy]

There are many ways of categorizing the data, and each supports a different interpretive focus. Using the grouping function in Tableau, I create new food group categories, reducing the total number to four: Low fat dairy, Whole milk dairy, Whole grains, and Other grains. This makes it easy to see that non-whole grains and whole milk dairy together represent by far the majority of food expenditure in Grains and Dairy. See the viz below.
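
The regrouping itself is just a lookup from food group code to a broader category, followed by a sum. Here is a minimal sketch of that logic in Python. The mapping covers only the three group codes named earlier in this post, the category I assign to each is my own assumption, and the expenditure values are invented.

```python
# Illustrative mapping only: codes 17, 18 and 27 are mentioned in this post,
# but the category assigned to each here is an assumption.
GROUP_TO_CATEGORY = {
    17: "Whole grains",
    18: "Whole grains",
    27: "Whole milk dairy",
}

# Made-up expenditure values for demonstration.
rows = [
    {"foodgroup": 17, "expend": 2.5},
    {"foodgroup": 18, "expend": 1.5},
    {"foodgroup": 27, "expend": 4.0},
    {"foodgroup": 99, "expend": 1.0},  # a code that is not in the mapping
]


def recategorize(rows, mapping):
    """Sum expenditure per broad category; unknown codes go to 'Unmapped'."""
    totals = {}
    for row in rows:
        category = mapping.get(row["foodgroup"], "Unmapped")
        totals[category] = totals.get(category, 0.0) + row["expend"]
    return totals


totals = recategorize(rows, GROUP_TO_CATEGORY)
print(totals)
```

Keeping an explicit "Unmapped" bucket makes it obvious when a food group has been left out of the new categories.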

 

[Figure: Recategorized U.S. food expenditure in Grains and Dairy]

Looking at expenditure alone, however, doesn’t tell us anything about how much of each food group is purchased. To learn this, we need to include the price per weight and/or the aggregated weight data, both of which have columns in the data set. I thus created an area graph that shows the aggregated weight over time and a line graph showing the average price development for food groups over time (both shown below). In terms of aggregated weight, all the food groups are very even. The average prices are fairly steady, except for cheese prices, which increase over time, especially from 2003 to 2004. But this is where the effect of the missing data entries becomes important to keep in mind: since the aggregated weight data is based on many more data entries than the price data, conclusions drawn from comparing the two would not be reliable.
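
One way to make the two measures more comparable would be to restrict both to the rows where every measure is recorded. I did not do this in my vizzes; the sketch below only illustrates the idea, with assumed column names and invented values.

```python
# Made-up rows: the third one has a weight but no price or expenditure.
rows = [
    {"aggweight": "10", "price": "0.20", "expend": "2.00"},
    {"aggweight": "20", "price": "0.25", "expend": "5.00"},
    {"aggweight": "30", "price": "", "expend": ""},
]


def complete_rows(rows, required=("aggweight", "price", "expend")):
    """Keep only rows in which every required column has a value."""
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]


def column_total(rows, column):
    """Sum a numeric column stored as strings."""
    return sum(float(row[column]) for row in rows)


all_weight = column_total(rows, "aggweight")
comparable_weight = column_total(complete_rows(rows), "aggweight")
print(all_weight, comparable_weight)
```

In this toy example half of the total weight comes from a row with no price, which is exactly the kind of mismatch that makes a side-by-side reading of the weight and price graphs unreliable.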

 

[Figure: U.S. food-at-home consumption in Grains and Dairy]

[Figure: U.S. average prices in Grains and Dairy]

Finally, I looked at the expenditure per quarter. The bar graph (shown below) suggests a pattern in which spending is generally highest in the first quarter of each year.
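
The aggregation behind this kind of chart can be sketched as a sum per (year, quarter) pair, followed by a check of which quarter is largest in each year. The rows below are invented and do not reproduce the USDA figures; the column names are assumptions.

```python
from collections import defaultdict

# Invented sample rows; None stands for a missing expenditure record.
rows = [
    {"year": 1999, "quarter": 1, "expend": 5.0},
    {"year": 1999, "quarter": 1, "expend": 2.0},
    {"year": 1999, "quarter": 2, "expend": 6.0},
    {"year": 2000, "quarter": 1, "expend": 4.0},
    {"year": 2000, "quarter": 3, "expend": None},
    {"year": 2000, "quarter": 4, "expend": 3.0},
]


def expenditure_by_quarter(rows):
    """Sum expenditure per (year, quarter), skipping missing records."""
    totals = defaultdict(float)
    for row in rows:
        if row["expend"] is None:
            continue
        totals[(row["year"], row["quarter"])] += row["expend"]
    return dict(totals)


def peak_quarter_per_year(totals):
    """Return {year: quarter with the highest total expenditure}."""
    best = {}
    for (year, quarter), value in totals.items():
        if year not in best or value > best[year][1]:
            best[year] = (quarter, value)
    return {year: quarter for year, (quarter, _) in best.items()}


totals = expenditure_by_quarter(rows)
peaks = peak_quarter_per_year(totals)
print(totals)
print(peaks)
```

Note that quarters with many missing records will look smaller than they are, so a check like this inherits the same caveat about incomplete data.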

[Figure: U.S. food expenditure in Grains and Dairy by Quarter]

Moving forward, I would like to add to the number of food categories, looking at more than just Grains and Dairy. However, the data sets are large and incomplete, making meaningful analysis rather complicated. To improve the results of this project, I would spend more time understanding the reasons behind the missing data entries and finding the best ways to work around them in terms of interpretation and visualization. Still, this exploratory approach has provided me with an overall understanding of the distribution of expenditure across the selected food categories, opening up many possibilities for further analysis and exploration.