Analyzing New York State Test Scores


Visualization

Introduction

Assessment. This is a word that teachers in New York are more than familiar with.

Teachers are constantly being challenged by principals and superintendents to measure and quantify their students' performance. Whether or not one agrees with the current amount of testing or believes that it is the best way to assess students, it does offer us a significant amount of data from schools and students around the United States. It is important that we understand how to analyze this data and identify the trends in it. This is a great way to get the conversation started about the effectiveness of the current educational system.

Tableau Public is a free and useful tool for visualizing data and communicating trends to viewers. Using this software, I was able to analyze data on New York State Math Test scores and look into several variables that might explain why those scores rose and fell in certain years. The data came from NYC Open Data and included the test scores for every district in NYC, grades 3-8, from 2006 to 2011. The variable of ethnicity was also included and separated into four fields: Asian, Black, Hispanic, and White.

Findings and Analysis

When analyzing this data, one of my first discoveries was a significant drop in the number of students who passed the test in 2010 compared to previous years. This is shown in the heat map below:

The darker the green, the higher the percentage of students who passed the test. The steep drop-off in 2010 across all grades is what catches the eye and leads to the following questions:

  • What is the cause of the drop in passing students in 2010?
  • Is this drop the same for all boroughs/districts?
  • Is this drop the same for all ethnicities?

I analyzed this data across the other variables (district and ethnicity), which led to the main visualization:

The lines show the percentage of passing (green) and failing (red) students by ethnicity from 2006 through 2011. Within each ethnicity, the percentage of students who passed rises from 2006 until 2009, in line with the previous visualization. The downward trend from 2009 to 2010 may be the result of a more difficult test or the introduction of the Common Core, which New York State signed onto in 2010.

Although the reason(s) for this drop-off are unknown, ethnicity appears to be the most important variable provided. Each ethnicity follows the same basic trend described above, but Black and Hispanic students saw a much greater increase in failures in 2010. In fact, while 75 percent of White and Asian students still passed that year, the percentage of Black and Hispanic students who passed fell below 50 percent.

After realizing this disparity, I created a final visualization to show the number of students tested per year by ethnicity:

As you can see, significantly more Black and Hispanic students took the test each year. This understanding brings the data full circle. Although the drop-off in passing students is not as large among the Asian and White populations, the drop-off by grade appears to be very steep. This is because there is a steep drop-off in scores among Black and Hispanic students, and significantly more of them took the test, so their results carry more weight in the overall percentage.

Process

I downloaded the data into Excel to get started. Before uploading it into Tableau, there were important steps I needed to take to ensure that it was formatted correctly and that I could use the software effectively. The first (and simplest) step was to change the date format to DD-MM-YYYY.

Next, I noticed that the spreadsheet contained redundant data that could be eliminated. For example, after each list of grades 3-8, there was an ‘All Grades’ row, which averaged the scores and summed the number of students who passed and failed. I realized that these rows would not only distort the sums calculated in Tableau, but could also be recalculated in the software if needed.
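The same cleanup could also be scripted. The sketch below is a hypothetical Python equivalent, not the steps I took in Excel; the column names (`grade`, `tested`) and the sample rows are assumptions made up for illustration, not the real headers or values from the NYC Open Data file.

```python
# Hypothetical sample rows; column names and numbers are illustrative only.
rows = [
    {"district": "01", "grade": "3", "year": 2006, "tested": 120},
    {"district": "01", "grade": "4", "year": 2006, "tested": 130},
    {"district": "01", "grade": "All Grades", "year": 2006, "tested": 250},
]


def drop_summary_rows(rows, grade_key="grade", summary_label="All Grades"):
    """Return only the per-grade rows, leaving out pre-aggregated summary rows."""
    label = summary_label.strip().lower()
    return [r for r in rows if str(r[grade_key]).strip().lower() != label]


detail = drop_summary_rows(rows)

# With the summary row included, every student is counted twice.
total_with_summary = sum(r["tested"] for r in rows)
total_detail = sum(r["tested"] for r in detail)
print(total_with_summary, total_detail)
```

Comparing the two totals is a quick way to confirm that the summary rows were inflating the sums.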

Finally, I used Google Refine to transpose the data. The number of students who scored at each level was originally stored in cross-tab format, with one column per level, and I needed to convert it to a normalized format, with one row per level, in order to analyze it more easily.
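To make the cross-tab versus normalized distinction concrete, here is a minimal Python sketch of the reshaping. It is not what Google Refine does internally; the function name `unpivot`, the column names (`level_1` through `level_4`), and the sample counts are all assumptions for illustration.

```python
def unpivot(rows, id_keys, value_keys, var_name="level", value_name="students"):
    """Turn one wide row per group into one long row per (group, level)."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            record = {k: row[k] for k in id_keys}  # copy the identifying columns
            record[var_name] = key                 # former column header becomes a value
            record[value_name] = row[key]          # the count that sat under that header
            long_rows.append(record)
    return long_rows


# Hypothetical cross-tab row: one column per score level.
wide = [
    {"year": 2009, "grade": "3",
     "level_1": 5, "level_2": 20, "level_3": 60, "level_4": 15},
]

long_rows = unpivot(
    wide,
    id_keys=["year", "grade"],
    value_keys=["level_1", "level_2", "level_3", "level_4"],
)
for r in long_rows:
    print(r)
```

In the long form, the level becomes a single field that can be filtered, grouped, or colored in a chart, which is much harder when each level lives in its own column.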

Within Tableau, I split the data into dimensions and measures. Because the data covers a five-year span, I often put the ‘Date’ field in the column section and went from there. I used different sheets to test the different variables. However, it was tricky to work out accurate percentages of those who passed, because the scores were split across grade, ethnicity and district. To solve this problem, I created calculated fields for the number and percentage of passing students.
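One plausible reason the percentages were hard to get right is that averaging row-level percentages gives a different answer from dividing total passes by total students tested. The Python sketch below illustrates that difference with made-up numbers; it is not the Tableau calculated field itself, and the column names (`tested`, `passed`) are assumptions.

```python
# Hypothetical rows: two districts of very different sizes.
rows = [
    {"district": "A", "tested": 100, "passed": 90},
    {"district": "B", "tested": 900, "passed": 450},
]


def pct_passing(rows):
    """Percentage passing computed from summed counts, not averaged percentages."""
    tested = sum(r["tested"] for r in rows)
    passed = sum(r["passed"] for r in rows)
    return 100.0 * passed / tested if tested else 0.0


# Naive approach: average the two district percentages (90% and 50%).
naive = sum(100.0 * r["passed"] / r["tested"] for r in rows) / len(rows)

# Aggregated approach: 540 passes out of 1000 students tested.
aggregated = pct_passing(rows)

print(naive, aggregated)
```

The naive average treats the small district and the large district as equally important, while the aggregated figure weights each by the number of students tested.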

For the first visualization, I decided to focus on passing students and felt a heat map was the proper way to show this. The exact numbers weren’t as important as the trend I wanted to show. For the comparison of the number of students tested by ethnicity, I used a bar graph to make the sizes easy to compare. To keep the graph compact and readable, I used the horizontal bar format.

The main visualization was a bit trickier. I had to convert the number of students tested into percentages in order to graph the percentage of students who passed and failed. I moved the levels to the dimensions section and grouped them under ‘Passing’ and ‘Failing’. I then dragged this group onto the graph. To stay consistent, I used green for passing and chose red for failing. Because the graph shows a time series, I used a line graph, and because there was a significant difference in scores between the four ethnicities, I used small multiples.
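The grouping step can be sketched outside Tableau as well. The Python below is a rough illustration under stated assumptions: it treats Levels 3 and 4 as ‘Passing’ (this post does not say which levels went into each group), and the field names and sample counts are hypothetical.

```python
from collections import defaultdict

# Assumption: Levels 3 and 4 count as passing. Adjust if the grouping differs.
PASSING_LEVELS = {3, 4}


def outcome_shares(long_rows):
    """Percentage of Passing and Failing students per (year, ethnicity)."""
    counts = defaultdict(lambda: {"Passing": 0, "Failing": 0})
    for r in long_rows:
        group = "Passing" if r["level"] in PASSING_LEVELS else "Failing"
        counts[(r["year"], r["ethnicity"])][group] += r["students"]

    shares = {}
    for key, c in counts.items():
        total = c["Passing"] + c["Failing"]
        # Guard against empty groups so we never divide by zero.
        shares[key] = {g: (100.0 * n / total if total else 0.0) for g, n in c.items()}
    return shares


# Hypothetical normalized rows for one year and one placeholder group.
sample = [
    {"year": 2010, "ethnicity": "Group X", "level": 1, "students": 10},
    {"year": 2010, "ethnicity": "Group X", "level": 2, "students": 30},
    {"year": 2010, "ethnicity": "Group X", "level": 3, "students": 40},
    {"year": 2010, "ethnicity": "Group X", "level": 4, "students": 20},
]

shares = outcome_shares(sample)
print(shares[(2010, "Group X")])
```

Each (year, ethnicity) pair yields one passing and one failing percentage, which corresponds to one green and one red point per year in each small-multiple panel.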