In recent years, the digital realm has been evolving rapidly. With over 2 billion monthly active users and over 1 billion hours of video watched daily, YouTube is one of the most popular social media platforms on the internet. On YouTube, subscriber count is one way to measure a channel’s success. The most popular YouTube channels have millions of subscribers and billions of views, making them some of the most influential content creators on the internet.

Data Preparation

Analyzing the most popular YouTube channels can offer valuable insights into online audiences’ trends and preferences. For this overview, I selected a CSV dataset from Kaggle named ‘Most Subscribed YouTube Channels’ (Kaggle, 2023). It contains 7 columns: rank, youtuber, subscribers, video views, video count, category, and started. These include both quantitative and categorical dimensions. One limitation is that the started column only covers years from 1970 to 2021.


OpenRefine: Before visualizing, I first used OpenRefine to clean the dataset. Compared with tools such as R and Excel, OpenRefine simplifies cleaning and standardizing data through functions like filtering, clustering, and text manipulation, which makes data preparation more efficient. In this dataset, I found several zero values in the video views and video count variables, as well as several null values in the category variable. I removed these records, leaving a cleaned ‘Most Subscribed YouTube Channels’ dataset of 971 rows.
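The cleaning itself was done in OpenRefine. For readers who prefer a script, the same filter could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sample rows are made up, the column names follow the list given earlier, and the numeric columns are assumed to hold plain integers without thousands separators.

```python
import csv
import io

# Made-up sample rows in the shape described above. The column names are
# assumed to match the dataset's headers exactly.
SAMPLE_CSV = """rank,youtuber,subscribers,video views,video count,category,started
1,Channel A,200000000,150000000000,15000,Music,2006
2,Channel B,150000000,0,0,Entertainment,2005
3,Channel C,120000000,90000000000,700,,2012
4,Channel D,100000000,60000000000,4000,Education,2015
"""


def clean_rows(rows):
    """Keep rows with nonzero views, nonzero video count, and a category."""
    cleaned = []
    for row in rows:
        views = int(row["video views"])
        count = int(row["video count"])
        category = row["category"].strip()
        # Drop records with zero views, zero videos, or a missing category.
        if views == 0 or count == 0 or not category:
            continue
        cleaned.append(row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_rows(rows)
print(len(rows), "rows before cleaning,", len(cleaned), "rows after")
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and the integer parsing might need adjusting if the export formats numbers differently.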

Tableau Public: In this project, I used Tableau Public to create charts to visualize the data. Tableau Public is an interactive data visualization tool. With drag-and-drop controls across most of its interface, it provides a user-friendly and efficient way to create engaging and informative graphs. By using charts, diagrams, and other visual representations of data, we can identify patterns, make connections, spot outliers, and gain new perspectives on the raw data.

Analyzing the Relationship between YouTube Subscribers and Views by Category

The charts below display the top 5 YouTube categories by views, features, and subscriptions, and they offer valuable insights into the preferences and behaviors of YouTube users. The graphs show that music and entertainment are the two dominant categories among all the channels.

Linear regression analysis provides a further understanding of the relationship between YouTube subscribers and views. The left graph shows a moderately strong fit with an R-squared value of 0.68, which means that the number of subscribers explains about 68% of the variance in video views and that the two are positively related. In other words, as the number of subscribers increases, total video views tend to increase as well. The right chart examines this relationship within each category. Each category has its own prediction equation that models the relationship between the number of subscribers and video views, and plotting them together makes the categories easy to compare. Among all categories, Shows has the highest slope coefficient and a high R-squared value. The steep slope means that each additional subscriber is associated with more views than in other categories, and the high R-squared value indicates a strong linear relationship between subscribers and video views.
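The trend lines here were produced in Tableau. As a rough illustration of what a per-category fit computes, the sketch below applies ordinary least squares to each category and reports the slope, intercept, and R-squared value. The records are made-up numbers chosen for easy arithmetic, not values from the dataset, and the function names are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared.

    Assumes at least two points and that neither xs nor ys is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor regression, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def fit_by_category(records):
    """Fit one line per category from (category, subscribers, views) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for category, subscribers, views in records:
        grouped[category][0].append(subscribers)
        grouped[category][1].append(views)
    return {cat: fit_line(xs, ys) for cat, (xs, ys) in grouped.items()}


# Made-up records: (category, subscribers in millions, views in billions).
records = [
    ("Music", 10, 5), ("Music", 20, 9), ("Music", 30, 16),
    ("Shows", 10, 8), ("Shows", 20, 18), ("Shows", 30, 28),
]

fits = fit_by_category(records)
for category, (slope, intercept, r_squared) in fits.items():
    print(f"{category}: slope={slope:.2f}, intercept={intercept:.2f}, "
          f"R^2={r_squared:.2f}")
```

Comparing slopes across categories shows where an extra subscriber is associated with the most additional views, while R-squared shows how closely each category follows its own line.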

Analyzing the Relationship between YouTube Subscribers and Views: A Category-Based Linear Regression Analysis

In addition, to gain more detailed insight into video views, I compared views by category and start year using a pie chart and a scatter plot. An outlier in News and Politics with a start year of 1970 was removed for consistency. These two graphs illustrate the relationship between views and start years: channels created in 2006 are the most popular, and music channels created in 2009 have the most views.
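The aggregation behind this comparison can be sketched as a simple group-and-sum over category and start year. The example below is a minimal illustration with made-up records. The function name is hypothetical, and excluding the outlier by its start year of 1970 is an assumption about how the removal was done.

```python
from collections import defaultdict


def total_views(records, excluded_years=(1970,)):
    """Sum views per (category, start year), skipping excluded years."""
    totals = defaultdict(int)
    for category, started, views in records:
        # Assumption: the outlier is identified by its start year alone.
        if started in excluded_years:
            continue
        totals[(category, started)] += views
    return dict(totals)


# Made-up records: (category, start year, views in billions).
records = [
    ("Music", 2009, 50),
    ("Music", 2009, 30),
    ("Entertainment", 2006, 60),
    ("News & Politics", 1970, 5),
]

totals = total_views(records)
top_group = max(totals, key=totals.get)
print("Group with the most views:", top_group, totals[top_group])
```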

Total Video Views by Category and Start Year


The analysis of the most subscribed YouTube channels has revealed some interesting insights, and data visualization helps explain the topic in an easy-to-understand format. The data shows that the top channels are dominated by the music and entertainment categories. Popularity also varies across categories, and some channels with fewer subscribers have more views and more engagement with their audience. Further research can explore how specific factors, such as average view duration and audience retention, affect video views within individual categories. In conclusion, this analysis offers a starting point for further research in this dynamic and rapidly changing industry, helping self-media practitioners optimize their strategies and increase their reach.


