A Data Comparison Between Two F1 Legends

Overview

In the Formula 1 (F1) community, there is a recurring debate about who is the best driver of all time. Until recently, most fans held that the title belonged to Michael Schumacher, since he had won the most Drivers’ Championships (seven). That view was broadly accepted until Lewis Hamilton won his seventh World Drivers’ Championship in the 2020 season, equalling Schumacher’s record. After that win the debate intensified, with Hamilton’s fans declaring him the best driver of all time. Formula 1 is more nuanced than a title count, however, and although statistics don’t tell the whole story, I set out to compare the statistical records of Schumacher and Hamilton.

Process

The Data Problem

Before I walk you through my data and how I used it to create the visualizations shown below, I should explain for readers unfamiliar with F1 that it is a complicated and ever-changing sport. Every participating team (known as a constructor) builds its own car, which changes every single season, and some teams also build their own engines. Because the regulations change constantly, no two seasons are alike. This fluidity makes it difficult to answer my question purely with data. Even so, data can give us a better understanding of each driver’s achievements and let us compare their results.

Learning curve

As a novice user of Tableau, or of any data visualization tool, I approached this project with ambition but little sense of the scope of my research. The data set I used is publicly available on Kaggle: https://www.kaggle.com/. It contains 14 tables, ranging from circuit names to driver standings and everything in between, and it spans the whole history of F1 (1950-2025). The sheer size of these tables complicated my initial approach. It took several tries to learn how to connect the tables with the right relationships, and even more to realize that some of the data was not directly compatible because of how the tables are structured. After a few failed attempts to create a multi-line chart, I decided to build the visualizations one driver at a time.
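The Python sketch below illustrates what connecting the tables involves. It is not a description of how Tableau works internally. A driver_standings row stores only a raceId, so the season year and round number have to be looked up in the races table. The column names (raceId, driverId, position, year, round) are assumed from the Kaggle CSV headers, and the rows are made-up placeholders rather than real standings.

```python
# Sketch of joining driver_standings to races on the shared raceId key.
# Assumption: Ergast-style column names; the rows are illustrative placeholders.

races = [
    {"raceId": 101, "year": 2020, "round": 1},
    {"raceId": 102, "year": 2020, "round": 2},
]

driver_standings = [
    {"raceId": 101, "driverId": 1, "position": 2},
    {"raceId": 102, "driverId": 1, "position": 1},
]


def join_standings_to_races(standings, races):
    """Attach each standing row's season year and round via raceId (inner join)."""
    race_by_id = {race["raceId"]: race for race in races}  # index races by key
    joined = []
    for row in standings:
        race = race_by_id.get(row["raceId"])
        if race is None:
            continue  # no matching race, so the row is dropped as in an inner join
        joined.append({**row, "year": race["year"], "round": race["round"]})
    return joined


if __name__ == "__main__":
    for row in join_standings_to_races(driver_standings, races):
        print(row)
```

After the join, every standings row carries a year, which is what a "final position per season" chart needs.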

Data Visualizations

The first two visualizations were built to gain familiarity with the tool and the data set. Fig. 1 is a line chart showing Lewis Hamilton’s final championship position from 2007 to 2025. I then applied the same chart to Schumacher’s positions across his F1 career (1991-2006 and 2010-2012).

I arrived at these line charts only after several iterations and further research. While iterating, I realized that the driver_standings table was producing values far higher than expected. Only after going back to the original table did I understand that it records each driver’s standing after every race, not just the final standing of the season. Because of these extra rows, my chart was summing the standing positions across each season, which gave me incorrect values. Some research on YouTube introduced me to calculated fields, but I could not work out which formula I needed to make the visualization work. This is where I decided to lean on Claude.ai. To figure out which formula I needed, I typed the following prompt:
Hey Claude, I’m working on a data visualization project in Tableau with the data sets that I attached. The visualization should show the final standing position for the driver Lewis Hamilton in all the seasons he competed in. However, when I added Year to Columns and Position to Rows and created a filter with driver id: 1, which is Lewis Hamilton’s driver id, the line chart shows values that are way off what I know to be the correct answer. I think there is a problem with the driver_standings.csv sheet; it seems to be logging way too many values for each season. How do I get Tableau to use just the final standing position for the viz? Would calculated fields help?

The LLM explained that Tableau could use only each driver’s final standing position if I created a calculated field with the following formula: [Round] = { FIXED [Year] : MAX([Round]) }. The FIXED expression finds the highest round number (that is, the last race) in each year. The comparison then returns True only for rows whose round matches that number, which are the standings recorded after the final race of the season. When I added this field to the filters and kept only True, Tableau displayed just the final championship position for each year.
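The filter logic can be reproduced outside Tableau. The minimal Python sketch below uses hypothetical rows that have already been joined to their year and round, with placeholder values rather than real results. It contrasts the inflated per-season sum with the "last round only" filter. The sketch mirrors Tableau's order of operations, in which FIXED expressions are computed before ordinary dimension filters such as driverId = 1. For that reason the maximum round is taken over every driver's rows for the year.

```python
from collections import defaultdict

# Hypothetical joined standings rows; positions are placeholders, not real data.
rows = [
    {"year": 2020, "round": 1, "driverId": 1, "position": 3},
    {"year": 2020, "round": 2, "driverId": 1, "position": 1},
    {"year": 2020, "round": 3, "driverId": 1, "position": 1},
    {"year": 2020, "round": 3, "driverId": 2, "position": 2},
    {"year": 2021, "round": 1, "driverId": 1, "position": 1},
    {"year": 2021, "round": 2, "driverId": 1, "position": 2},
    {"year": 2021, "round": 2, "driverId": 2, "position": 1},
]


def summed_positions(rows, driver_id):
    """What SUM(position) per year gives: every round's standing added together."""
    totals = defaultdict(int)
    for r in rows:
        if r["driverId"] == driver_id:
            totals[r["year"]] += r["position"]
    return dict(totals)


def final_positions(rows, driver_id):
    """Mimic the filter [Round] = { FIXED [Year] : MAX([Round]) } kept at True."""
    # FIXED [Year] : MAX([Round]), computed over all drivers (before the driver filter).
    last_round = {}
    for r in rows:
        last_round[r["year"]] = max(last_round.get(r["year"], 0), r["round"])
    # Keep only this driver's rows from the last round of each year.
    return {
        r["year"]: r["position"]
        for r in rows
        if r["driverId"] == driver_id and r["round"] == last_round[r["year"]]
    }


if __name__ == "__main__":
    print(summed_positions(rows, 1))  # inflated totals, one per year
    print(final_positions(rows, 1))   # final standing per year
```

With these placeholder rows, driver 1's summed "position" for 2020 is 5, while the filtered value is the final standing of 1. The same mismatch appeared in my original chart.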

This approach went beyond the scope of the assignment; however, I would not have been able to reach the result I needed without it.

Because I had spent so much time on the first visualization alone, I changed the goal of my project to focus on simpler charts. The following chart departs from the original goal, but it taught me how to create a bar chart and customize its colors. It shows Lewis Hamilton’s finishing position at the Monza GP (the Italian Grand Prix) throughout his career. Monza was chosen because it is the longest-standing track on the F1 calendar and is renowned as the “Temple of Speed”, a circuit where drivers must launch cleanly out of tight corners onto long straights that show off their engines’ power.

In the visualization above, we can see that Lewis Hamilton won at Monza 5 times, finished second twice, and finished third once, for a total of 8 podiums. This puts him level with Schumacher, as the chart below shows.

Michael Schumacher also won at Monza 5 times and has the same number of podium finishes as Hamilton. Both are considered masters of this circuit, and the data backs this up.

I followed these visualizations with pie charts showing the number of first-, second-, and third-place finishes for each driver across their careers.
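The counts behind pie charts like these come down to tallying finishing positions 1, 2, and 3 per driver in the results table. The Python sketch below shows that tally. It assumes columns named raceId, driverId, and position, and it assumes that unclassified finishes appear as the string "\N", as in Ergast-style CSV exports. The optional race_ids parameter is a hypothetical addition. It restricts the tally to a chosen set of races, such as one circuit's races, which is the kind of filter a single-circuit chart like the Monza one needs. The rows are placeholders, not real results.

```python
from collections import Counter
from typing import Optional, Set

# Hypothetical results rows; "\\N" mimics an unclassified finish in the CSV.
results = [
    {"raceId": 10, "driverId": 1, "position": "1"},
    {"raceId": 11, "driverId": 1, "position": "3"},
    {"raceId": 12, "driverId": 1, "position": "\\N"},
    {"raceId": 13, "driverId": 1, "position": "1"},
    {"raceId": 13, "driverId": 2, "position": "2"},
]


def podium_counts(results, driver_id, race_ids: Optional[Set[int]] = None):
    """Count a driver's 1st, 2nd and 3rd places, optionally within given races."""
    counts = Counter()
    for row in results:
        if row["driverId"] != driver_id:
            continue
        if race_ids is not None and row["raceId"] not in race_ids:
            continue  # outside the chosen races (e.g. a single circuit)
        if row["position"] in ("1", "2", "3"):  # skips "\N" and lower places
            counts[int(row["position"])] += 1
    return {place: counts[place] for place in (1, 2, 3)}


if __name__ == "__main__":
    counts = podium_counts(results, 1)
    print(counts, "total podiums:", sum(counts.values()))
```

Summing the three slices gives the total podium count, the same arithmetic as 5 wins + 2 seconds + 1 third = 8 podiums at Monza.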

These pie charts bring us back to the original scope of the project. Many of Hamilton’s supporters point to the fact that he has more wins and more podium finishes than Schumacher as proof that he is the better driver. However, we must remember that the two drivers competed in different eras of F1. Schumacher raced mainly through the 1990s and 2000s, when cars were considerably less safe and less reliable than they are today. Seasons also had fewer races, which made it harder to accumulate wins and podiums.

Who is the best driver?

Given all the factors that need to be taken into account, it remains a genuinely subjective question. The answer depends on whether you value raw numbers or context more, and we’ll likely keep having this debate for the rest of the sport’s history.

Reflections

Overall, I am pleased with the progress I made with Tableau. If I had known then what I know now about the data set I picked, I would probably have chosen a different one, which would have left more time to customize the visualizations and refine their look. Going forward, I have a better understanding of how to read and interpret data sets, so I can foresee potential issues before creating visualizations. Given that this was my first time handling a data set and using Tableau, I consider this project a partial success in learning a new tool.

I hope my struggles help future students be better prepared for this project. If you made it this far, thank you for reading, and good luck with your Tableau project!

Information Visualization

Student work at the School of Information, Pratt Institute
