World’s Billionaires



Introduction

The World’s Billionaires is an annual ranking by documented net worth of the richest billionaires in the world. This list is compiled and published by the American business magazine Forbes, and it was first published in March 1987. The total net worth of each individual on the list is estimated and is cited in United States dollars, based on their documented assets and accounting for debt and other factors.

Forbes released its 35th annual World’s Billionaires list in April 2021. It included 2,755 billionaires, 493 of them newcomers. In total, they are worth $13.1 trillion, up from $8 trillion in the previous year’s list.

Inspiration

This Forbes World’s Billionaires List dataset caught my attention when I browsed through Kaggle. Though the pandemic hit the world economy hard in many places, it did not hurt the wealth of the super-rich: Jeff Bezos is again the richest man on Earth, worth $177 billion, while Elon Musk rocketed into the number two spot with $151 billion. I wanted to analyze this dataset to see whether it holds any interesting patterns.

Tools & Process

Datasets: Kaggle

My dataset came from Kaggle, an online community where users can find and publish datasets. The publisher used Python and web scraping to extract the data from the Forbes World’s Billionaires List.

Because this dataset is fairly simple and cannot support more complicated analysis on its own, I also downloaded GDP per capita for 2020 from the World Bank website. I am interested in the relationship between the number of billionaires and GDP per capita.

Data Cleanup & Visualization: R & RStudio

R is a programming language and free software environment for statistical computing and graphics. RStudio is an Integrated Development Environment (IDE) for R.

At first, I was not able to make the visualization I wanted. I asked Prof. Adams for help and found that my dataset did not match the data structure the plot required. Prof. Adams helped me clean the data and transform its structure. For the GDP dataset, I did not know how to extract the 10 regions I needed, so I did it manually before loading the data into RStudio.

Original Dataset
Data Cleaning in the NetWorth Column
Transform Data Structure
GDP per capita Dataset
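
The NetWorth cleanup can be sketched in a few lines. The example below is written in Python rather than the R used for this project, and it assumes the raw values are strings such as "$177 B"; the exact format in the Kaggle file is an assumption. `parse_net_worth` is a hypothetical helper, not the code from the project.

```python
import re


def parse_net_worth(text):
    """Convert a string such as "$177 B" into a float measured in billions.

    Assumed format: optional dollar sign, a number, and an optional unit
    letter (B for billions, M for millions). A missing unit is read as B.
    """
    pattern = r"\s*\$?\s*([\d,]*\.?\d+)\s*([BM])?\s*"
    match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized net worth: {text!r}")
    number = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "B").upper()
    # Millions are rescaled so that every value is expressed in billions.
    return number if unit == "B" else number / 1000.0


if __name__ == "__main__":
    for raw in ["$177 B", "$151 B", "$3.3 B"]:
        print(raw, "->", parse_net_worth(raw))
```

Once the column is numeric, it can be sorted, summed, or plotted on a continuous axis, which is not possible while the values are still text.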

Learning to draw different plots in R was challenging. For example, it took me some time to figure out how to sort the bars of the bar chart in descending order. In Tableau this is a single button, but in R I had to search the internet and try code written by others over and over again. In terms of visual style, I also found R less convenient than Tableau. At first I only knew how to change the main color. To change the size of the points or customize other visual properties, I had to figure out the code that controls them, which was much more complicated and time-consuming.
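
The descending order comes down to counting each category and sorting by the count before plotting. Here is a minimal sketch in Python (not the R code used for this project); `counts_descending` is a hypothetical helper and the sample labels are made-up rows, not values from the dataset. In R, the base function `reorder()` is commonly used inside the plot's axis mapping for the same purpose.

```python
from collections import Counter


def counts_descending(labels):
    """Return (label, count) pairs with the largest count first.

    Ties are broken alphabetically so the order is stable between runs.
    """
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sample: one industry label per billionaire row.
    sample = ["Technology", "Finance", "Technology", "Retail", "Finance", "Technology"]
    for label, count in counts_descending(sample):
        print(f"{label:<12}{'#' * count}")
```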

Results

I used two datasets to make four visualizations: a bar chart, a treemap, and two scatter plots.

1. The Number of Billionaires by Industry

The dataset lists the industry each billionaire belongs to, so I was curious about which industry has the most billionaires. I chose a bar chart because I wanted to rank the industries by their number of billionaires. The result shows that the top three industries are Finance & Investments, Technology, and Manufacturing. I think this result also mirrors the popularity of college majors and the job market in China.

2. The Number of Billionaires in the Top 10 Regions

This visualization could be a bar chart as well, but I wanted to try something different, so I made a treemap. Since there are too many regions in the dataset, I kept only the top 10 to make the plot more readable. From the size of each block, it is obvious that the United States and China have the most billionaires, far more than any other region.
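
Picking the top 10 regions, and pulling the matching rows out of the GDP table instead of selecting them by hand, can be sketched as follows. This is an illustration in Python, not the R code from the project; the field names `country` and `gdp_per_capita` and both helper functions are assumptions made for the example.

```python
from collections import Counter


def top_regions(countries, n=10):
    """Return the n most frequent region names, most frequent first."""
    return [name for name, _ in Counter(countries).most_common(n)]


def filter_gdp(gdp_rows, regions):
    """Keep only the GDP rows whose country is in the given regions.

    gdp_rows is assumed to be a list of dicts with the hypothetical keys
    "country" and "gdp_per_capita". Returns a {country: value} mapping.
    """
    wanted = set(regions)
    return {
        row["country"]: row["gdp_per_capita"]
        for row in gdp_rows
        if row["country"] in wanted
    }


def missing_regions(gdp_rows, regions):
    """List regions with no GDP row, for example because of a naming mismatch."""
    found = filter_gdp(gdp_rows, regions)
    return [name for name in regions if name not in found]
```

Country names may be spelled differently in the two sources, so it is worth checking `missing_regions` before plotting.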

3. The Relationship between the Number of Billionaires and GDP per capita

The billionaires come from many different regions. Gross Domestic Product (GDP) is a core indicator of economic performance, and GDP per capita is commonly used as a broad measure of average living standards or economic wellbeing. Using the top 10 regions from the previous plot, I wanted to see whether there is any relationship between the number of billionaires and GDP per capita. The result shows no obvious relationship, but three regions deserve a mention. The United States ranks first in the number of billionaires and also has the highest GDP per capita, which is reasonable. However, China and India, which rank second and third in the number of billionaires, have very low GDP per capita. I think this result reflects the gap between the rich and the poor in these two regions.

4. The Relationship between Age and Net Worth

I assumed that most billionaires would be older, since accumulating wealth takes time, and I wanted to make a plot to test this assumption. A scatter plot is a good choice because there are nearly 2,000 data points. Surprisingly, the result did not match my assumption: the ages of the billionaires are spread across the whole range from 18 to 99. I was curious about the youngest billionaire, who is only 18 years old. His name is Kevin David Lehmann and he lives in Germany. He is worth $3.3 billion after inheriting his father’s shares of the German drugstore company dm-drogerie markt.
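
One way to put a number on "not obvious" is the Pearson correlation coefficient between the two columns. The sketch below is in Python rather than the R used for this project and was not part of the original analysis; `pearson` is a hypothetical helper, and no real values are included.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value between -1 and 1; values near 0 suggest no linear
    relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(billionaire_counts, gdp_values)` on the ten regions would give a single summary figure. With only ten points, one extreme region can dominate the coefficient, so it should be read together with the scatter plot rather than instead of it.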

Look at the code and plots in RPubs: https://rpubs.com/HaoNi316/825891

Reflection

Compared with Tableau, R is challenging for beginners. When I use R, I have to figure out what data structure the plot requires, what code produces it, and how to make it look better. When I searched for solutions to the problems I met, I often did not understand the code provided by others. I had to copy the code first and change values to see what happened, and most of the time the code I found did not work. Apart from the four plots I made, I also tried maps, but I failed in the end because of the wrong data structure. I also got some good suggestions from my reviewer. For example, she suggested that I change the order of the blocks in the treemap and replace its default colors with a gradient. However, I did not find a way to change the order, and the gradient palette I found in R supports at most 9 colors, while my treemap has 10 blocks. I guess that if I had used Tableau, I would not have needed to transform this dataset, and I could have made nice plots quickly.

Overall, R and RStudio are powerful tools for statistical computing and graphics, but they are not friendly to beginners. Although R is free for all, there is nothing more expensive than something free: users have to spend a lot of time learning its basic functions and the usage of different packages.
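
A gradient with any number of steps, including the 10 needed for the treemap, can be produced by interpolating between two end colors. The sketch below shows the idea in Python rather than R; `gradient` is a hypothetical helper and the end colors are arbitrary choices. In R, `colorRampPalette()` from the base grDevices package is the usual tool for the same job.

```python
def hex_to_rgb(color):
    """Convert "#rrggbb" into a tuple of three integers (0 to 255)."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError("expected a color in #rrggbb form")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def gradient(start_hex, end_hex, n):
    """Return n hex colors evenly spaced from start_hex to end_hex."""
    if n < 2:
        raise ValueError("a gradient needs at least 2 colors")
    start, end = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    colors = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the first color, 1.0 at the last
        rgb = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


if __name__ == "__main__":
    # Ten shades from a light blue to a dark blue (arbitrary end colors).
    print(gradient("#deebf7", "#08306b", 10))
```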

References

https://en.wikipedia.org/wiki/The_World%27s_Billionaires

https://www.forbes.com/billionaires/

https://www.dw.com/en/forbes-a-new-billionaire-every-17-hours/a-57135443