Introduction
In the past few decades, the internet has brought people closer together and made the world a smaller place. It has opened avenues that were previously unimaginable. Access to the internet has arguably become a basic necessity, as important as other utilities, if not more so. Yet the easy access that many of us take for granted is not available to many others around the world. The basic goal of this project is to visualize access to the internet and highlight insightful trends. The project also examines the disparity in internet access among people from different income groups around the world, and it compares the countries whose populations have the highest and the lowest access to the internet.
Datasets
The dataset collection used to create the visualizations for this project is called “Internet Usage”. It is a public dataset collection available on kaggle.com. Although the collection also includes data on broadband penetration and mobile cellular subscriptions, for this project I used only the data on the number of users and the percentage share of users (two separate CSV files).
Inspiration
While looking for similar work, I found the article “The Internet’s history has just begun”, published by “Our World in Data” on internet usage. The dataset I chose was also derived from the same data used by “Our World in Data”.
Among the several interactive and appealing visualizations that “Our World in Data” made for this topic, one titled “Share of the population using the internet” intrigued me.
I liked the way trend lines were used in Fig.#2 to show the growth in internet use in countries around the world. That visualization does not represent income categories, which gave me an appealing way to achieve the goal of this project: to highlight the disparity in access to the internet and make it apparent. I therefore decided to use a similar design style for my visualization. Along with it, I also decided to create a similar visualization for “Total number of people using the internet” with my dataset, as shown in Fig.#2.
Software
The dataset was clean, with almost no errors in its structure, so I did not need to clean it with any additional tools. I used Tableau Public, the free version of Tableau, to create the visualizations for this project.
Method, Design Process & Results
I started by importing the CSV files for “total number of users” and “percentage share of users” from my chosen dataset collection. Once both datasets were imported, I connected them in Tableau so that I could use the joined dataset for the visualizations. Tableau makes this easy: the connection is made by dragging one dataset beside the other on the “Data Source” tab.
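Outside Tableau, the same kind of connection can be sketched in a few lines of Python. This is only an illustrative analogue of what Tableau does through its interface, not the method used in this project. The column names (“Entity”, “Year”, “Users”, “Share”), the country names, and the values are all assumptions made up for the example and are not taken from the real CSV files.

```python
import csv
import io

# Hypothetical miniature versions of the two CSV files. The column names
# and every value below are assumptions, not the real dataset contents.
USERS_CSV = """Entity,Year,Users
Aland,2016,1000
Aland,2017,1200
Borduria,2017,500
"""

SHARE_CSV = """Entity,Year,Share
Aland,2016,40.0
Aland,2017,48.0
Borduria,2017,10.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_entity_year(users_rows, share_rows):
    """Inner-join the two tables on the (Entity, Year) pair."""
    share_by_key = {(r["Entity"], r["Year"]): r for r in share_rows}
    joined = []
    for row in users_rows:
        match = share_by_key.get((row["Entity"], row["Year"]))
        if match is not None:  # keep only keys present in both tables
            joined.append({
                "Entity": row["Entity"],
                "Year": int(row["Year"]),
                "Users": int(row["Users"]),
                "Share": float(match["Share"]),
            })
    return joined


joined = join_on_entity_year(read_rows(USERS_CSV), read_rows(SHARE_CSV))
for record in joined:
    print(record)
```

Keying the join on both the country and the year keeps each row of one table paired with the matching observation in the other, which is what allows both measures to be used in the same visualization.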
Once the datasets were connected, I started with the visualization for the share of the population using the internet. I modeled this visualization on Fig.#2, shown earlier in this report. The “Our World in Data” visualization shows the percentage share of each country as a trend line and highlights the trend lines for major regions around the world. I decided to adapt this design to highlight the disparity in access to the internet across different income groups.
First, I plotted the lines for each income group and the “overall world” from 1990 (where the dataset starts) to 2017 (since not all countries had recorded values after that year). I highlighted these lines with solid colors to make them distinctive. Next, I plotted the lines for major regions of the world (each holding cumulative data for the countries belonging to that region) and gave them a dark shade of gray, so that the viewer can easily distinguish these lines and interpret them as belonging to one group. Finally, I plotted lines for a few countries, in a lighter shade of gray, as reference points across the spectrum. I then labeled each line at its end point (2017) with its name and percentage value, and matched the label colors to the line colors to make them easy to interpret. Fig.#4 below is the resulting visualization.
For the next visualization, which shows the total number of people using the internet, I created a visualization similar to the one from “Our World in Data” shown in Fig.#2. Unlike “Our World in Data”, I arranged the data groups in ascending order of the number of people in each group. I kept this area graph very basic, with labels showing only the necessary information. Fig.#5 below is the resulting visualization.
For the third visualization, I took the trend lines of the income groups and the world from the first visualization (Fig.#4) and took a closer look at the years 2007 to 2017. I plotted the percentage change in individuals using the internet with respect to the previous year. I highlighted the most erratic lines with solid colors (red and orange). For the “world” trend line I used dark gray to establish a reference point. For the remaining income groups, which followed a relatively smoother trend, I used a lighter gray. This design emphasizes the most erratic trend lines, since these are the lines where the change varied the most. Fig.#6 below is the resulting visualization.
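The percentage change with respect to the previous year can be written as (current - previous) / previous × 100. The short Python sketch below illustrates that arithmetic. The function name and the sample values are hypothetical and are not taken from the dataset; in the project itself this calculation was done in Tableau.

```python
def year_over_year_change(values):
    """Return the percentage change of each value relative to the previous one.

    The first entry has no predecessor, so the result is one item shorter
    than the input.
    """
    changes = []
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous * 100.0)
    return changes


# Hypothetical user counts (in millions) for one group over four
# consecutive years; these numbers are made up for illustration.
users = [100.0, 120.0, 150.0, 165.0]
growth = year_over_year_change(users)
print([round(g, 1) for g in growth])
```

Note that a series can keep rising every year while its growth rate falls, which is the pattern of a group moving towards saturation.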
For the next four visualizations, I created basic bar graphs of the top 10 and bottom 10 countries by number of users and by percentage share of the population using the internet. I chose cooler colors for the top countries and warmer colors for the bottom countries to intuitively signify their position on the world scale. Figs.#7, #8, #9, #10 below are the resulting visualizations.
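Selecting the top and bottom countries amounts to ranking one measure and keeping both ends of the ranking. A minimal Python sketch of that selection follows, assuming the data is available as a mapping from country name to value. The country names and values are invented for illustration, and `top_and_bottom` is a hypothetical helper, not part of any tool used in this project.

```python
import heapq

# Hypothetical share-of-population values (percent) for made-up countries.
share_by_country = {
    "Aland": 95.0, "Borduria": 12.5, "Cascadia": 60.0,
    "Dorne": 3.2, "Elbonia": 78.4, "Freedonia": 41.0,
}


def top_and_bottom(values, n):
    """Return the n highest and the n lowest (name, value) pairs."""
    items = values.items()
    top = heapq.nlargest(n, items, key=lambda kv: kv[1])
    bottom = heapq.nsmallest(n, items, key=lambda kv: kv[1])
    return top, bottom


top, bottom = top_and_bottom(share_by_country, 2)
print("top:", top)
print("bottom:", bottom)
```

Calling the helper with n set to 10 on each of the two measures would give the four lists that the four bar graphs display.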
For the last visualization, I created a choropleth world map of the number of internet users around the world in 2016. Fig.#11 below is the resulting visualization.
After all the visualizations were created, I tried to arrange them in one dashboard, which ended up looking very cramped. I therefore decided to create four different dashboards to showcase all the work. Additionally, I added a dashboard for a 4K-resolution screen that contains all the visualizations. These dashboards are on my Tableau Project page.
Interpretation
The first visualization, shown in Fig.#4, makes it clear that there is a disparity in access to the internet among people around the world. The income groups below the upper middle income level have a lower percentage share of internet users than the world population as a whole. It can also be seen that the countries of Africa have the smallest share of population using the internet. This observation is also supported by visualizations 5 and 7, shown in Fig.#8 & Fig.#10. The countries of Asia, the Middle East and Latin America occupy the middle of the spectrum in the percentage share of internet users. The countries of North America and Europe have the highest percentage share of population using the internet.
In contrast, Fig.#5 shows that most people using the internet are from East Asia & Pacific, followed by Central Asia & Europe, and South Asia, with the number in South Asia rising rapidly in recent years. Even though the percentage of people using the internet in the countries of Asia and South Asia is lower, the number of people using the internet in these countries is considerably higher. Because these regions include countries with very large populations, a large number of users still amounts to a low percentage share. This observation is supported by Fig.#7 & Fig.#9, which show the top 10 countries by number of users and by percentage of the population using the internet.
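The relationship behind this observation is simple arithmetic: the number of users equals the population multiplied by the percentage share. The Python sketch below uses purely hypothetical populations and shares, chosen only to illustrate the effect and not drawn from the dataset.

```python
def users_from_share(population, share_percent):
    """Number of users implied by a population and a percentage share."""
    return population * share_percent / 100.0


# Hypothetical figures, chosen only to illustrate the effect.
large_low_share = users_from_share(1_000_000_000, 30.0)  # big population, low share
small_high_share = users_from_share(50_000_000, 90.0)    # small population, high share
print(large_low_share, small_high_share)
```

In this made-up case the large population with the lower share still has several times more users than the small population with the higher share.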
Further, the third visualization, shown in Fig.#6, shows that the middle income, upper middle income and high income categories have a steadily declining growth rate. This can be explained by the fact that these groups are moving towards saturation. The trend lines of the low income and lower middle income groups stand out because they are erratic, with sharp increases and decreases in the growth rate. It is worth noting that all income groups below the middle income group have stayed above the world growth rate for most, if not all, of the years from 2007 to 2017.
Finally, the choropleth gives the user an at-a-glance overview of the number of internet users around the world. It clearly shows which countries have the most and the fewest users and where they are located geographically, and it gives an intuitive understanding of the difference in user numbers between countries.
Future Direction
If I were to continue working on this project, I would look at the datasets in the chosen collection that I did not use, which cover cellular subscriptions and broadband penetration. These datasets would help visualize the quality of internet usage alongside the quantity of internet usage already covered in this project. A classification of the different modes of internet connection and use could also be added. Beyond this dataset collection, the project could be augmented with other datasets that classify internet traffic among the countries of the world. It may be interesting to visualize how people consume data on the internet, and what they like to do on it, by mapping the internet traffic of countries. These are just a few examples of how this project could be taken forward to create richer and more appealing visualizations.