Relationships between credit card limit and other variables


Visualization

Background

People regularly use credit cards to make quick purchases and build credit. Cards have become essential in many developed nations, where cash is gradually giving way to a single card that is safer to carry, keeps spending in one place, and offers rewards and fraud protection. One important aspect of a credit card is its credit limit, which is determined by a variety of factors such as one’s credit history, income, and creditworthiness.

I used the Credit Card Customers dataset from Kaggle, which contains data on over 10,000 customers from banks in the United States. I want to determine which of three factors (age, education level, or income) has the strongest correlation with credit limit.

Methods

I used RStudio and the ggplot2 package to create my visualizations. I wanted to create three of them: credit limit vs. education level, credit limit vs. income, and credit limit vs. age, with credit limit always the dependent variable on the y-axis. Simple, separate graphs make it easier to see which variable has the strongest correlation with credit limit.

As the dataset contains over 10,000 entries, I started by parsing the CSV file into separate fields, which allowed me to filter out null and unknown values, such as “Unknown” for education level. Afterwards, I grouped rows that share the same value of a variable. For instance, all users with the age “60” are combined into one entry, and the credit limit for that entry is the average credit limit of the group.
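
The filtering and grouping steps described above could be sketched in base R roughly as follows. This is a minimal illustration rather than the original script: the column names Customer_Age, Education_Level, and Credit_Limit are assumed to match the headers in the Kaggle CSV, the helper average_limit_by() is hypothetical, and the small data frame is invented toy data standing in for the real file.

```r
# Hypothetical helper: drop rows whose grouping value is missing or "Unknown",
# then average Credit_Limit within each remaining group.
average_limit_by <- function(df, group_col) {
  keep <- !is.na(df[[group_col]]) & df[[group_col]] != "Unknown"
  cleaned <- df[keep, , drop = FALSE]
  # aggregate() returns one row per distinct value of group_col
  aggregate(cleaned["Credit_Limit"], by = cleaned[group_col], FUN = mean)
}

# Toy data (invented for illustration only, not taken from the dataset).
# With the real file, this would be something like read.csv() on the
# downloaded CSV instead.
toy <- data.frame(
  Customer_Age    = c(60, 60, 45, 45, 30),
  Education_Level = c("Graduate", "Unknown", "Graduate", "High School", "Unknown"),
  Credit_Limit    = c(4000, 6000, 10000, 12000, 3000)
)

by_age       <- average_limit_by(toy, "Customer_Age")
by_education <- average_limit_by(toy, "Education_Level")

print(by_age)        # the two rows with age 60 collapse into one averaged row
print(by_education)  # the "Unknown" rows are gone
```

Passing the column name as a string keeps one helper reusable for all three groupings (age, education level, income) instead of repeating the filter for each plot.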

After filtering, grouping, and averaging the credit limits, I created a series of three plots (two bar plots and one scatter plot) using the ggplot2 package, which I gradually edited to improve aesthetics and formatting.
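
As an example of the kind of formatting edit this involved, the sketch below gives crowded axis labels more room using ggplot2's theme() settings. It is only an illustration under assumptions: the data frame holds invented toy averages (not results from the dataset), and the column names Education_Level and Avg_Limit are hypothetical.

```r
library(ggplot2)

# Toy data with long category labels (invented for illustration only)
toy <- data.frame(
  Education_Level = c("High School", "Graduate", "Post-Graduate", "Doctorate"),
  Avg_Limit       = c(8000, 8500, 9000, 7500)
)

spaced_plot <- ggplot(toy, aes(x = Education_Level, y = Avg_Limit)) +
  geom_col() +
  labs(x = "Education level", y = "Average credit limit (USD)") +
  theme(
    # Tilt the tick labels so long category names do not overlap
    axis.text.x  = element_text(angle = 45, hjust = 1),
    # Add space between each axis title and its tick labels
    axis.title.x = element_text(margin = margin(t = 10)),
    axis.title.y = element_text(margin = margin(r = 10))
  )

print(spaced_plot)
```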

Visualizations and Interpretations

The first visualization, a bar plot, displays average credit limit (in USD) on the y-axis against income on the x-axis, with incomes grouped into intervals of 20K USD. The plot clearly shows a positive correlation between income and credit limit, as the bars rise consistently from one income interval to the next. This is consistent with lenders treating a higher income as a sign of greater ability to repay, and therefore granting a higher limit.
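
A bar plot of this kind could be built along the following lines. This is a sketch under assumptions, not the original code: the averages are invented toy numbers, the column names Income_Category and Avg_Limit are hypothetical, and the bracket labels are assumed for illustration. The main point is that the income brackets need an explicit order, since ggplot2 would otherwise sort text labels alphabetically.

```r
library(ggplot2)

# Toy averages (invented numbers, for illustration only)
income_avg <- data.frame(
  Income_Category = c("$40K - $60K", "Less than $40K", "$60K - $80K"),
  Avg_Limit       = c(5000, 3500, 9000)
)

# Assumed bracket labels, listed from lowest to highest so the bars
# appear in income order rather than alphabetical order.
income_levels <- c("Less than $40K", "$40K - $60K", "$60K - $80K")
income_avg$Income_Category <- factor(income_avg$Income_Category,
                                     levels = income_levels)

income_plot <- ggplot(income_avg, aes(x = Income_Category, y = Avg_Limit)) +
  geom_col() +
  labs(x = "Income bracket", y = "Average credit limit (USD)")

print(income_plot)
```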

The second visualization, a scatter plot, shows the relationship between age (x-axis) and average credit limit (y-axis), where the youngest age in the dataset was 22 and the oldest was 77. The points follow a clear quadratic (inverted-U) pattern: average credit limit increases up to an age of about 50 and then decreases again. This is plausible, as younger adults are still building their incomes, and older people tend to spend less once they retire.
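
One way to check a quadratic pattern like this numerically, rather than only by eye, is to fit a second-degree polynomial and look at the sign of the squared term and the location of the peak. The sketch below does this in base R. It is an assumption-laden illustration, not part of the original analysis: the toy points are generated from an exact parabola peaking at age 50, and the column names are hypothetical.

```r
# Toy data: points lying exactly on a parabola that peaks at age 50
# (invented for illustration only, not taken from the dataset)
age_avg <- data.frame(Customer_Age = c(30, 40, 50, 60, 70))
age_avg$Avg_Limit <- 10000 - 5 * (age_avg$Customer_Age - 50)^2

# Fit Avg_Limit = c0 + c1 * age + c2 * age^2
fit <- lm(Avg_Limit ~ Customer_Age + I(Customer_Age^2), data = age_avg)
coefs <- unname(coef(fit))
linear_term    <- coefs[2]
quadratic_term <- coefs[3]

# A negative squared term means the curve opens downward (inverted U);
# the peak of c0 + c1 * x + c2 * x^2 is at x = -c1 / (2 * c2).
peak_age <- -linear_term / (2 * quadratic_term)

print(quadratic_term < 0)
print(peak_age)
```

On a ggplot2 scatter plot, the same kind of curve can be overlaid with geom_smooth(method = "lm", formula = y ~ poly(x, 2)).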

The third visualization, a bar plot, shows average credit limit for each education level. Surprisingly, there seems to be little difference in credit limit between education levels. We could assume that the lowest average credit limit belongs to Doctorate because of the lack of income that comes with staying in school longer. A lack of education also does not necessarily correlate with a lower credit limit, as Uneducated appears to show about the same average credit limit as Post-Graduate.

Conclusions

I’m new to R when it comes to visualizations, so the most challenging part of this assignment was using the coding guides to properly format and edit my plots whenever they looked off in the preview (e.g. tight spacing between axis labels). It took a bit of internet searching to find the right functions for modifying a plot's appearance. Another challenge was installing the packages correctly, as I found R on Mac trickier to set up and run without errors.

I would have preferred to use a more varied dataset, as this one from Kaggle lacks identifying information such as the names of the banks and the physical location of users. As the dataset is free and cleanly separated into columns, however, it is usable enough.

References

https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers

https://www.pexels.com/photo/bank-blur-business-buy-259200/