Visualizing Cohort Default Rates


Visualization

Background & Data

For my first lab using Tableau Public, I was interested in looking at student loan default data to see whether significant trends could be found surrounding the needs of students paying higher-education tuition. I downloaded an appropriate dataset from the U.S. Department of Education (DoED) office known as Federal Student Aid (FAFSA), which annually publishes student default statistics using a measurement called a cohort default rate (CDR).

The CDR is a percentage of a school’s borrowers who enter repayment during a particular fiscal year (FY) and default prior to the end of the second following fiscal year. Every eligible school’s CDR and all accumulated school data is downloadable as an Excel document tracking three fiscal years worth of rates. While it would only represent a small span of time, specifically FYs 2010, 2011, and 2012, I concluded that the dataset contained a strong mix of qualitative and nominal values with which to work.

NoteAs of September 28, 2016, the CDR data that is available to download has changed to FYs 2011, 2012, and 2013. I was unaware that my timespan would cease to be accessible, and I have since archived the original dataset that I used for future reference.

Research Questions

Each row in the dataset as downloaded contains a school, supplementary information such as place (including DoED region codes, states, and address), program length, school type, and DoED “ethnic code”, and defaults/borrowers/CDR for all three years. Faced with remarkably comprehensive statistics, I decided I would present the issue in a series of progressively localized microcosms which might reveal correlations between a national default rate and a regional rate, or a state/city rate. Ultimately, I settled on three key questions with a less-related fourth if time allowed:

  1. What was the national trend of cohort default rates from 2010 to 2012?
  2. How do regional cohort default rates compare to each other and the national trend from 2010 to 2012?
  3. Are these national/regional trends reflected on a state or city-level, and how do these results reframe (or not) the cultural conversation over higher-education loans?
  4. How does cohort default rate vary across types/lengths of higher education programs?

Sample Visualizations as Inspiration

Before even accessing Tableau, I researched other visualizations tracking rates over time for inspiration. Though very simple, this visualization of US Student Loan Debt over ten years from the Storytelling With Data blog (Figure 1) influenced me to try using bar charts to illustrate these discrepancies.

Figure 1

Figure 1

Thinking about consistency of displays across my questions, I sought out examples of bar charts that could cleanly display a rate/percentage over time across many subcategories. In preparation for my second question, I found this bar chart from DevInfo of enrollment percentages over two years within African schools (Figure 2).

Figure 2

Figure 2

Finally, I turned to a previous Pratt student’s Tableau Public research to envision how I might evaluate CDR trends on a localized level in a way that would still be correlative to questions 1 and 2. To analyze drop out rates in NYC public schools, this student determined the highest dropout and graduation percentages by school at either end of their date range (2001 and 2006) and created bar charts for those schools so as to visualize how schools beginning or ending with a particular outcome might have changed throughout time (Figure 3).

Figure 3

Figure 3

Methods & Results

To import my dataset into Tableau Public, I needed to scale and normalize the crosstab output that the DoED sheet had output. In Excel, I removed columns for categories not directly addressed by my research questions.

Next, I used openRefine to separate out each school row into three rows per school, each of which now corresponded to data for one of the three fiscal years.

After importing my data, I discovered that Tableau Public could perform calculations which would be necessary for charting defaults out of loans by year instead of the dataset’s “CDR” category which only offered a yearly percentage for a particular school. I created a new measure titled “Cohort Default Rate %” with the calculation: [Number of Defaults]/[Number of Loans]*100. This new measure would ultimately create the rows of every one of my visualizations.

Still interested in the national average CDR reported by the DoED, I graphed the average by year. Between each year, the national average dropped by over 1%, which I changed from a line graph to a bar chart per my sample visualization (Figure 4).

Figure 4

Figure 4

From here, I looked at regional averages. The region column on the original dataset contains a number 1 through 10 denoting DoED US region codes. In order to graph region dimensionally, I manually created ten groups out of the states (e.g. “Region 1”, “Region 2”…) which became a new dimension. To answer question 2, I made bar charts of the average CDR by region. Each region exhibited a similar downward trend, and I chose to create yet another bar chart for the two regions which seemed to exhibit the most and least extreme drop from 2010 to 2012. My original conception for question 3 was stymied, however, by the discovery that Tableau Public would not be able to separate out nested terms (in this case, the school names within city within state) by “top 5s” as the viz about drop-out rates had done.

At this halfway point in the lab, I became concerned about the misrepresentation of the actual CDR data stemming from my decision to use averages. Dissatisfied with my approach, I set out to reanalyze questions 1, 2, and now question 4 using the actual data dimensions.

With the actual data, I began by visualizing each school’s CDR by year with circles. Surprisingly, there was still a visible downward trend from 2010 to 2012. In order to properly represent the median CDR, however, I built a box-and-whisker plot by changing the measure from discrete to continuous and using the Show Me function (Figure 5). The national annual decrease in loan default was finally displayed accurately.

Figure 5

Figure 5

Once again considering question 2, I began by applying the box-and-whisker method from figure 5 to all ten regions (Figure 6).

Figure 6

Figure 6

I expanded upon the regional visualization by adding year alongside regional code in the column bar. For clarity, I incorporated color for the year and activated trend lines within the Analysis menu (Figure 7).

Figure 7

Figure 7

My final methods concern my work on question 4, which I addressed towards the end of the lab after failing to answer question 3. Similar to the regional codes, my original contained codes for “program length” which corresponded to terms such as “Associate’s Degree”. Returning to openRefine, I used the text facet feature to determine the three most represented program length codes: “Associate’s Degree”, “Bachelor’s Degree”, and “Master’s/Doctorate Degree”. From these results, I created another box-and-whisker plot (Figure 8).

Figure 8

Figure 8

Noteall above sheets are available at https://public.tableau.com/profile/alexander.whelan#!/

Discussion

Looking at the box-and-whisker plot in Figure 5, it is possible to extrapolate an overall trend towards fewer occasions of default from 2010 – 2012. On the one hand, the national conversation surrounding student loans typically focuses on gross analysis of expenses (such as in Figure 1) that somewhat precludes the data on how students are prepared for repayment. At the same time, FAFSA arguably publishes the CDR data to inform or even warn borrowers about schools on an individual basis. It is imperative to point out, then, that simply looking at Figure 5 and seeing the downward national trend is liable to misrepresent the thousands of schools whose CDRs rise far above the quartiles.

Figure 6 suggests that the intensity of CDR does fluctuate by region, and Figure 7 even reveals that the national trend towards a lower CDR is seemingly consistent within each region (though the slope of the trend lines vary drastically). This particular graph conveys that certain parts of the country might be more liable to higher rates. In this study, Region 4 (incl. NY and NJ) and Region 6 (incl. TX, LA) seem especially beset by defaults. It is tempting to then analyze particular states to explain these results, inviting studies of how expensive particular states’ schools might be and relevant financial data. Of course while graphing CDR by percentage allows for some reasonable comparison, it is important to consider that the CDR calculation (defaults out of total number of borrowers) means that disparate numbers of borrows could result in a skewed visualization.

Though I wish I had afforded more time to question 4, I think that Figure 8 emerges as the most promising viz from my lab. The same questions about percentage populations should apply to this graph, of course. Still, the disproportionately high CDR among Associate’s programs is striking enough to invite research into the education and infrastructure surrounding repayment in these shorter, less-qualified degree programs.

Future Directions

Considering that the DoED is now providing data for FY 2013, it could be illuminating to repeat these methods using the most recent three-year cohort default rate data. As they did for FY 2012, the DoED announced the national average CDR for FY 2013 to be 11.3%, which means that the average did not change from 2012-2013. Consequently, more granular analysis of regional trends and the states within particular regions could be considered in conversation with the consistently downward 2010-2012 trends.

For more work with this 2010-2012 dataset, I would be interested in more graphs in the vein of “CDR by Program Length” that could look at variables from the original dataset like “type of school” and “DoED ethnic code”.

References

DevInfo. (2013, February 21). Africa: Solving  the “data blank” problem in education. Retrieved from http://www.devinfo.org/articles/Africa-Solving-the-quotdata-blankquot-problem-in-education-13

Federal Student Aid. (2015). Official cohort default rates for school. Retrieved from http://www2.ed.gov/offices/OSFAP/defaultmanagement/cdr.html

Knaflic, C. N. (2016). Storyteling with data [web blog]. Retrieved from http://www.storytellingwithdata.com/blog/2011/10/visualizing-student-loan-debt

Nolan, C. (2015, October 5). Analyzing graduation and drop out rates in NYC public schools (2001-2006 cohorts). Research.prattsils.org. Retrieved from http://tinyurl.com/hv8xssb