A nutritional network of good protein-rich combinations

Context
The network chart is part of a data visualization project exploring the environmental impact of protein-rich foods. The project aims to support more holistic decisions about the food we eat by relating nutritional value to the environmental effects of food production, including greenhouse gas emissions, wasted water, and land use. For this specific visualization, a curated list of the foods studied in the project was used to generate classic and nutritionally meaningful pairings, with a focus on well-balanced, high-protein meals.
Data
The project's nutritional data come from the U.S. Department of Agriculture's FoodData Central, which is available through an API. After cleaning and merging with the environmental data, the resulting dataset includes 50 foods, primarily raw ingredients, categorized as animal, dairy, and plant-based sources of protein. For the network chart, a second dataset was derived from this one. The list of foods, along with their categories, was given to an artificial intelligence model, ChatGPT 4o, which was asked to pair the foods with one another into good gastronomic combinations.
The prompt, submitted with the list, asked for three levels of combination: 1. Strong, covering classical, widely accepted culinary matches, such as egg and bacon; 2. Moderate, covering pairings with a good level of flavor synergy or common usage; and 3. Acceptable, covering pairings with complementary nutrition that are less common in recipes. The sources the model cited were practical cooking guides, such as The Flavor Bible, and flavor science, along with numerous recipes compiled from the internet to reflect real cooking practice.
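The three levels can be turned into an edge table that a network tool can read. The following is a minimal sketch, not the project's actual pipeline: the numeric weights (3, 2, 1), the sample pairings other than egg and bacon, and the function names are assumptions made for illustration. The Source, Target, and Weight column names follow the convention Gephi uses when importing an edge spreadsheet.

```python
import csv
import io

# Assumed mapping from pairing level to edge weight (not taken from the project).
LEVEL_WEIGHT = {"strong": 3, "moderate": 2, "acceptable": 1}

# Hypothetical pairings; only "egg and bacon" as a strong match comes from the text.
pairings = [
    ("egg", "bacon", "strong"),
    ("chicken", "lentils", "moderate"),
    ("tofu", "cheese", "acceptable"),
]


def build_edges(pairs):
    """Return undirected, de-duplicated edge rows with a numeric weight."""
    seen = set()
    rows = []
    for food_a, food_b, level in pairs:
        key = tuple(sorted((food_a, food_b)))
        if food_a == food_b or key in seen:
            continue  # skip self-pairs and repeated pairs
        seen.add(key)
        rows.append({
            "Source": key[0],
            "Target": key[1],
            "Weight": LEVEL_WEIGHT[level],
            "Level": level,
        })
    return rows


def edges_to_csv(rows):
    """Serialize edge rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Source", "Target", "Weight", "Level"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


edges = build_edges(pairings)
print(edges_to_csv(edges))
```

Sorting each pair before storing it means that "egg, bacon" and "bacon, egg" count as the same undirected edge, which is useful when a generated list repeats a pairing in both directions.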
The chart


The network connects pairs of nodes that represent foods. Each node takes the color of its category and is sized by the amount of protein in a 100 g serving. Each edge is colored by mixing the colors of the two nodes it connects, and its thickness is proportional to the level of connection: strong, moderate, or acceptable. Since the number of nodes and edges is relatively small for a network, it was possible to plan the position of each node.
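The edge coloring described above can be sketched as a per-channel average of the two category colors. This is only one way to mix colors and may differ from the blend the visualization tool applied; the palette values and function names below are hypothetical.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def mix_colors(color_a, color_b):
    """Average two colors channel by channel (integer division)."""
    rgb_a, rgb_b = hex_to_rgb(color_a), hex_to_rgb(color_b)
    return rgb_to_hex(tuple((a + b) // 2 for a, b in zip(rgb_a, rgb_b)))


# Hypothetical category palette, for illustration only.
CATEGORY_COLOR = {
    "animal": "#d9534f",
    "dairy": "#f0ad4e",
    "plant": "#5cb85c",
}

# Color of an edge linking an animal food to a plant food.
edge_color = mix_colors(CATEGORY_COLOR["animal"], CATEGORY_COLOR["plant"])
print(edge_color)
```

An edge between two foods of the same category keeps that category's color, since averaging a color with itself returns it unchanged.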
The decision was to group the nodes by category, ordered by the amount of protein, with the three groups arranged in a circle. This arrangement made the path of each edge clearer and kept edges from crossing over nodes. To achieve it, a plugin was used to lay out the nodes as on a map, using x and y coordinates calculated for each node in a spreadsheet. The Equirectangular projection was adopted so that the positions were not distorted.
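The coordinates were calculated in a spreadsheet, but the same idea can be sketched in code. The example below is an assumption about how such a layout could be computed, not the formulas actually used: each category receives one third of a circle, and its foods are spread along that arc from highest to lowest protein. The food names, protein values (other than butter's 1 g), radius, and gap are hypothetical. The radius is kept below 90 so that the y values would also remain valid if a plugin reads them as latitudes.

```python
import math

# Hypothetical foods; protein in g per 100 g. Only butter's 1 g comes from the text.
foods = [
    {"name": "chicken", "category": "animal", "protein": 27.0},
    {"name": "bacon", "category": "animal", "protein": 13.0},
    {"name": "cheese", "category": "dairy", "protein": 25.0},
    {"name": "butter", "category": "dairy", "protein": 1.0},
    {"name": "lentils", "category": "plant", "protein": 9.0},
    {"name": "tofu", "category": "plant", "protein": 8.0},
]


def circular_positions(items, categories=("animal", "dairy", "plant"),
                       radius=80.0, gap_deg=20.0):
    """Place each category on its own arc, ordered by protein (descending)."""
    positions = {}
    sector = 360.0 / len(categories)
    for index, category in enumerate(categories):
        group = sorted(
            (item for item in items if item["category"] == category),
            key=lambda item: item["protein"],
            reverse=True,
        )
        start = index * sector + gap_deg / 2  # leave a gap between groups
        span = sector - gap_deg
        for rank, item in enumerate(group):
            # Spread nodes evenly along the arc; a lone node sits mid-arc.
            fraction = rank / (len(group) - 1) if len(group) > 1 else 0.5
            angle = math.radians(start + fraction * span)
            positions[item["name"]] = (
                round(radius * math.cos(angle), 2),
                round(radius * math.sin(angle), 2),
            )
    return positions


positions = circular_positions(foods)
for name, (x, y) in positions.items():
    print(f"{name}: x={x}, y={y}")
```

The resulting x and y columns could then be pasted next to each node in the spreadsheet that feeds the layout.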
Variations



Another option for a network representing the same data was a simple listing of the three categories. In this version, a label showing the summed protein of the two connected nodes was tested at the center of each edge. However, these labels could cause the variation in edge thickness to be misread.
Edge color was another design decision. The options were a mixed color that blends the two categories, a gray gradient that expresses the level of combination, or a solid gray that highlights the nodes. Since the levels were already represented by another visual variable, the colorful option was chosen, as it reveals new information within a harmonious palette.
Another direction would be to adopt the Fruchterman-Reingold layout, which could visually relate to the radar charts previously produced for this same project. In that variation, the focus would fall on the foods with the most pairings, and butter, along with bacon, would sit at the center of the circle. However, as the research seeks to show several relationships between protein-rich foods, the coordinate-based layout was more appropriate. Butter, for instance, is among the foods with the least protein, containing only 1 g per 100 g serving, so a central position would overstate its relevance.
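Force-directed layouts such as Fruchterman-Reingold tend to pull highly connected nodes toward the center, so the number of pairings per food gives a rough idea of which nodes would end up there. The sketch below counts pairings and sets them beside protein content. The edge list and the protein values are hypothetical, except for butter's 1 g per 100 g, which comes from the text.

```python
from collections import Counter


def degree_by_food(edge_pairs):
    """Count how many pairings each food takes part in."""
    degree = Counter()
    for source, target in edge_pairs:
        degree[source] += 1
        degree[target] += 1
    return degree


# Hypothetical pairings, for illustration only.
edge_pairs = [
    ("butter", "egg"),
    ("butter", "bacon"),
    ("butter", "salmon"),
    ("bacon", "egg"),
    ("bacon", "cheese"),
    ("lentils", "rice"),
]

# Protein in g per 100 g; only butter's value is taken from the text.
protein = {"butter": 1.0, "bacon": 13.0, "egg": 13.0}

degree = degree_by_food(edge_pairs)
for food, count in degree.most_common(3):
    print(f"{food}: {count} pairings, {protein[food]} g protein per 100 g")
```

In this made-up sample, butter ties for the most pairings while having the least protein, which is the mismatch that made the coordinate-based layout preferable.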
Graphic project
Because the network chart belongs to a project with charts already produced, one challenge was integrating its graphic elements into the visual identity shared by the other graphs. The solution was to export the resulting network from Gephi to Adobe Illustrator to polish the final visualization. In this second step, the legends were repositioned, the colors were made consistent, and the edges were refined. The outcome is an organized network chart in which nodes and edges remain distinguishable.
Next steps
Interactivity
A potential next step for this chart would be to explore the interactivity offered by Gephi and to develop a tooltip shown on mouse-over, which could include the category, sub-category, and total and partial amounts of protein, among other details. Another idea would be to create a protein calculator: as nodes are selected, a box would add up the total amount of the nutrient, supporting a balanced decision when composing a meal.
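The calculator idea can be sketched as a function that adds up the protein of the selected nodes. This is an illustration only: it assumes one 100 g serving per selected food, matching the unit used for node size, and the protein values other than butter's 1 g are hypothetical.

```python
# Protein in g per 100 g serving; hypothetical values except butter (from the text).
PROTEIN_PER_100G = {
    "butter": 1.0,
    "egg": 13.0,
    "lentils": 9.0,
    "chicken": 27.0,
}


def total_protein(selected, table=PROTEIN_PER_100G):
    """Sum the protein of the selected foods, one 100 g serving each."""
    unknown = [food for food in selected if food not in table]
    if unknown:
        raise KeyError(f"No protein data for: {', '.join(unknown)}")
    return sum(table[food] for food in selected)


meal = ["egg", "butter", "lentils"]
print(f"Total protein: {total_protein(meal)} g")
```

An interactive version would call such a function each time a node is selected or deselected and show the result in the box beside the chart.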
Pictures
Food, the theme of this project, is part of people's daily lives. Using actual images of the foods could help bring the user closer to the content represented. One consideration for this approach would be selecting photographs that accurately depict the amount of food expressed in each visualization.
References
Dornenburg, Andrew, and Karen Page. The Flavor Bible: The Essential Guide to Culinary Creativity, Based on the Wisdom of America’s Most Imaginative Chefs. New York: Little, Brown and Company, 2008.
Ritchie, Hannah. “Deforestation and Forest Loss.” Our World in Data. 2021. https://ourworldindata.org/deforestation.
U.S. Department of Agriculture, Agricultural Research Service. FoodData Central. 2025. https://fdc.nal.usda.gov/download-datasets.