Protests in the U.S. (2017-2020)


“Demonstrators march in support of the Black Lives Matter movement in the Brooklyn borough of New York City on June 6, 2020, following the death of George Floyd in Minneapolis police custody. (Stephanie Keith/Getty Images)” Image taken from the Pew Research Center article “Amid Protests, Majorities Across Racial and Ethnic Groups Express Support for the Black Lives Matter Movement,” published on June 12, 2020, and written by Kim P., Juliana M., and Monica A.

Introduction

The year 2020 has been full of news not only about the global health crisis but also about global activism. With increasing access to social network sites and communication technology, activism no longer needs to take place offline and in person. People can participate across states and countries, with participants ranging from teenagers and young adolescents to the elderly. Particularly in the U.S., activism is potentially one of the keywords for 2020. With protests and activism around Black Lives Matter, Covid-19 related issues, and the police still occurring across different states, it is interesting to think about how frequent protests actually are, where they are occurring, and why. In what states are protests occurring most often? Is activism actually centered in more “progressive” states, as rumored? Why are people protesting, and are they all protesting for liberal causes? This project intends to visualize protests in the U.S. documented during the last four years to provide insight into the aforementioned questions.*

*Data used for this study may not accurately capture ALL protests in the U.S.; a link to the dataset is provided in the Sources section for more detail.

Visualization References

Three different visualizations were referenced in building the dashboard of U.S. protest data.

The first reference was the Carnegie Endowment for International Peace’s visualization of protests around the world (Global Protest Tracker, 2020). The visualization highlighted the total number of protests and where significant numbers of protests occurred, allowing the user to immediately pick out important information about protests around the world. The visualization’s interaction, which allows users to drill down into a country with a tooltip, was effective, and the size of the pop-up box and font was easy to read without hindering the map view. However, I found the combination of colored countries and bubbles confusing, as the two did not each have their own unique interaction.

The second reference was Tableau’s Covid-19 Data Hub visualization (Tableau, 2020). The visualization’s use of a line chart to show the number of Covid cases in each country over time was effective in showing the trend, rate of change, and variability. The dashboard had a separate line chart for each country selected (10 countries), which allowed users to immediately see which line referred to which country, although it also made it difficult to quickly compare trends across countries.

The final reference was 91-DIVOC’s visualization of state-level Covid cases in the U.S. (91-DIVOC, 2020), which I chose in order to understand how best to represent data across 50+ states in a dashboard. The visualization’s wide range of colors, fonts, and markers made it difficult for users to grasp the information presented. With over 50 series, the markers proved distracting to the eye. The chart also had two sets of x-axis tick labels with similar information, which appeared redundant without providing much additional insight. The visualization also did not maximize data-ink: I found that with large numbers of series, it is best to keep the gridlines and axes simple to allow users to focus on the data at hand.

Materials

As a subscriber to Data is Plural, a newsletter by Jeremy Singer-Vine, who compiles a spreadsheet of interesting publicly available datasets, I came across the protest dataset from CountLove.org (Count Love protest dataset), which has quantified protests in the U.S. since 2017. The dataset contains more than 30,000 events and includes information on the time of each protest, its location, the number of participants, and the reason for the protest.

To clean the dataset prior to visualization, I used OpenRefine, an open-source data cleaning tool that can ingest various data formats, including comma-separated values (CSV) files, the format of the CountLove dataset.

Tableau Public was used to visualize the dataset and ultimately create a dashboard. As a free version of the leading data visualization software, Tableau Public allows for effective storytelling through easy data pivoting, a wide selection of chart types and formats, and an online interface for publishing and sharing dashboards.

Research and Visualization Methodology

I downloaded the dataset from CountLove.org. I first opened the CSV file to check the format of the data and to make sure that there were no outstanding issues with the dataset. I found that the dataset has a column, ‘location’, that is not standardized, which called for data cleaning prior to visualizing in Tableau.

In order to quickly clean the data, I used OpenRefine and did the following:

  1. Changed the ‘Date’ column to an actual date data type so that it can be read correctly
  2. Split out the states from the ‘Location’ column so that each event has a standardized geographic location
  3. Split out the ‘Event…’ column into three columns: the general reason for protest, the specific reason, and additional details. This was also to ensure that each event has a column with a standardized level of information about the protest
  4. Trimmed all extra spaces left over from splitting out the columns
  5. Capitalized all the states (to make sure that no states are double counted due to capitalization errors)
  6. Clustered the general reason for protest column using the cluster feature in OpenRefine and accepted all clusters, to ensure that all events are categorized into buckets using the same language and at the highest level
  7. Saved the cleaned data as a new file so that it could be ingested into Tableau
OpenRefine was used to clean the dataset downloaded from CountLove.org
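
For readers who prefer code to a point-and-click tool, the cleaning steps above could be roughly sketched in Python. This is only an illustration under stated assumptions, not what OpenRefine does internally: the column names, the sample rows, the ISO date format, the "General (Specific)" layout of the event text, and the simplified fingerprint function are all hypothetical, and the third "additional details" column is left out.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime


def fingerprint(value):
    """Simplified clustering key: lowercase, drop punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


def clean_rows(rows):
    cleaned = []
    for row in rows:
        # Step 1: parse the date text into a real date (ISO format is an assumption).
        day = datetime.strptime(row["Date"].strip(), "%Y-%m-%d").date()
        # Steps 2, 4, 5: take the last comma-separated token as the state,
        # trim it, and upper-case it.
        state = row["Location"].split(",")[-1].strip().upper()
        # Step 3: split "General (Specific)" into two trimmed parts (assumed layout).
        general, _, rest = row["Event"].partition("(")
        specific = rest.strip().rstrip(")").strip()
        cleaned.append({"date": day, "state": state,
                        "general": general.strip(), "specific": specific})

    # Step 6: group spellings that share a fingerprint and keep the most common one.
    variants = defaultdict(Counter)
    for rec in cleaned:
        variants[fingerprint(rec["general"])][rec["general"]] += 1
    for rec in cleaned:
        rec["general"] = variants[fingerprint(rec["general"])].most_common(1)[0][0]
    return cleaned


# Hypothetical sample rows, not taken from the dataset.
raw = [
    {"Date": "2017-01-21", "Location": "Washington, DC", "Event": "Healthcare (Pro-ACA)"},
    {"Date": "2017-01-21", "Location": "Albany, ny ", "Event": "Healthcare (Pro-ACA)"},
    {"Date": "2018-03-24", "Location": "Boston, MA", "Event": "healthcare. (Pro-ACA)"},
]
cleaned = clean_rows(raw)
```

Here the stray "healthcare." spelling is folded into "Healthcare" because both share a fingerprint, which mirrors the intent of accepting every suggested cluster; a real run would still warrant a manual check of the merged values.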

Once the data was cleaned, I opened the CSV file in Tableau Public. In Tableau, I created different visualizations that best answer the project questions around where, how many, and why protests are occurring in the U.S. I created 9 different visualizations (9 sheets),* including:

  1. Point map of total protests across the U.S. **
  2. Box-and-whisker plot of the distribution of protests by state by year
  3. Stacked column chart of number of protests by causes
  4. Comparison side-by-side bar chart of protest causes
  5. Bubble chart of reasons for protests
  6. Multiple-line chart of monthly protests by state over time
  7. Side-by-side column chart of protests by U.S. regions
  8. Multiple-line chart of new monthly protests by state over time
  9. Bar chart of % of total protest attendees by cause

* Tool tips were added to all visualizations with minimal detail that highlight numbers and details represented in the chart, as the dataset’s other variables are all represented in the dashboard in separate graphs

**To represent the total number of protests across the U.S., I used a point map instead of a choropleth map to better emphasize the degree of difference between states and to keep the geographic size of each state from interfering with the perceived size of its protest total.

In order to standardize the visualization, I applied the same color scheme across all charts. Refraining from choices that hold political connotations (as heat maps are widely used for electoral votes) while choosing a color that is easy on the human eye, I chose the predefined ‘Teal’ color. Gradients were used in individual charts in which the degree of difference was an important factor. Outlines were used on the map to delineate the circles’ sizes in areas where states were clustered.

I decided to create a dashboard telling a story around the frequency and location of protests in the U.S., starting with a general overview of all events and drilling down into yearly distribution, monthly protests, and ultimately the categories of protests and their attendees. To minimize repetition between charts while still retaining a funnel-like flow, I selected charts 1, 2, 6, 4, and 9 for a new dashboard and arranged the visualizations to roughly mirror a rectangular funnel, with the map of the U.S. the largest and the rest below in equal sizes.

Although I ended up not showing the region-level analysis (chart 7), I added a region filter on the map so that users can easily zoom and filter into a regional view if desired. I also added a filter box for states so that those who are not familiar with U.S. geography can type to filter without hovering over every state to find the one of interest. A time toggle was added so that users can see both the aggregated four-year total and drag from left to right to see protests over the years. The manual toggle was selected over an animated one because only people with access to certain Tableau products would be able to see the animation.

A caption was added to chart 9 to account for my manipulation of the view by filtering the list of causes for protest: I limited the list to show only those that had over 25,000 attendees. The 321 events that fell below the threshold accounted for less than 1% of the total protest attendees and had insignificant variance between one another. Annotations (in lieu of MS Excel or PowerPoint’s callout boxes) were added to chart 6 to call viewers’ attention to interesting trends in the dataset.
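
The arithmetic behind that filter can be sketched in a few lines of Python. This is a hedged illustration rather than the Tableau calculation itself: it assumes the threshold applies to each cause's total attendance, the function name attendee_share_by_cause is made up for this example, and the numbers are hypothetical.

```python
from collections import defaultdict


def attendee_share_by_cause(events, min_attendees=25_000):
    """Return (% of attendees per visible cause, % hidden by the threshold)."""
    totals = defaultdict(int)
    for cause, attendees in events:
        totals[cause] += attendees
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}, 0.0
    # Shares are computed against the grand total, so hiding small causes
    # does not inflate the percentages of the causes that remain visible.
    shown = {cause: 100 * total / grand_total
             for cause, total in totals.items() if total > min_attendees}
    hidden_total = sum(total for total in totals.values() if total <= min_attendees)
    return shown, 100 * hidden_total / grand_total


# Hypothetical (cause, attendees) pairs, not figures from the dataset.
events = [("Cause A", 400_000), ("Cause A", 100_000),
          ("Cause B", 480_000), ("Cause C", 20_000)]
shares, hidden = attendee_share_by_cause(events)
```

Reporting the hidden share alongside the visible ones is what makes a caption such as "less than 1% of total attendees" checkable.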

Finally, to help the user understand that all visualizations in the dashboard use the same underlying data, I added an interaction between the charts, using the point map as a filter for all the charts below. This further accentuates the idea that the dashboard is a funnel-like story while allowing the user to drill down into the region or state they are most interested in.

After adding a title to the dashboard that captures my project questions, I published the Tableau dashboard to my Tableau Public profile.

Results

The resulting Tableau dashboard can be found via this link. The dashboard answers, to an extent, the initial project questions around the frequency, location, and reason for protests in the U.S. It begins with a general overview of all events and drills down into yearly distribution, monthly protests, and ultimately the categories of protests and their attendees. The intention is to allow audiences both inside and outside the U.S. to easily compare, at the state level and protest level, the type and size of protests over the last four years.

Reflection

Additional data slicing and visualization could have been done for a deeper analysis of the CountLove dataset. Groupings could have been added by date to create visualizations showing new weekly protests in each state, as in the reference visualization on Tableau’s Covid-19 Data Hub. That could have provided an interesting comparison to the overall number of protests. I also could have categorized the protests’ causes into ‘For’ and ‘Against’ buckets and created a visualization to see whether there are any meaningful differences not just between causes but also within one. Additional features of Tableau may have enhanced the dashboard’s usability: perhaps adding a drill-down table would have been helpful in highlighting important numbers.

From usability testing, I found that some users found the Monthly Protests by State chart difficult to read without a filter, as only some lines have mark labels and the lines are clustered. I could have used a stacked area chart or initially shown an average of protests across all states instead.
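
As a rough sketch of the averaged view mentioned above, the monthly counts per state and their cross-state average could be prototyped in Python before being rebuilt in Tableau. The function names and sample events are hypothetical, and the average here is taken over states that appear in the data at all, which is an assumption (states with no recorded protests are not counted).

```python
from collections import Counter
from datetime import date


def monthly_counts(events):
    """Count events per (state, 'YYYY-MM') pair."""
    return Counter((state, f"{day.year}-{day.month:02d}") for state, day in events)


def average_per_state(counts):
    """Average monthly protests across every state present in the counts."""
    states = {state for state, _ in counts}
    per_month = Counter()
    for (_, month), n in counts.items():
        per_month[month] += n
    return {month: total / len(states) for month, total in sorted(per_month.items())}


# Hypothetical (state, date) pairs, not records from the dataset.
events = [
    ("NY", date(2020, 6, 1)), ("NY", date(2020, 6, 6)), ("NY", date(2020, 6, 20)),
    ("CA", date(2020, 6, 6)), ("CA", date(2020, 7, 4)), ("CA", date(2020, 7, 18)),
]
counts = monthly_counts(events)
averages = average_per_state(counts)
```

A single average line like this could serve as the default view, with the per-state lines revealed only once a filter is applied.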

I expect the visualizations will prove useful for anyone interested in the trends of activism in the United States and in understanding the geographic dispersion of different demonstrations and their popularity. Perhaps this project can inspire others to delve further into the reasons for demonstrations, for example by researching whether progress has been made on the causes of protests that have been going on for the last four years. Another interesting study that could stem from this research would be to look into whether there are places within each state in which protests are more common and whether there are outlier cities or suburbs within outlier states like NY and CA.

Sources

COVID-19 (Coronavirus) Data Resource Hub. (n.d.). Retrieved October 04, 2020, from https://www.tableau.com/covid-19-coronavirus-data-resources

Global Protest Tracker. (n.d.). Retrieved October 04, 2020, from https://carnegieendowment.org/publications/interactive/protest-tracker

Leung, T., & Perkins, N. (n.d.). Count Love Demonstration Statistics. Retrieved October 04, 2020, from https://countlove.org/statistics.html

Parker, K., Horowitz, J., & Anderson, M. (2020, June 12). Amid Protests, Majorities Across Racial and Ethnic Groups Express Support for the Black Lives Matter Movement. Retrieved October 06, 2020, from https://www.pewsocialtrends.org/2020/06/12/amid-protests-majorities-across-racial-and-ethnic-groups-express-support-for-the-black-lives-matter-movement/

91-DIVOC. (n.d.). Retrieved October 04, 2020, from https://91-divoc.com/pages/covid-visualization/