Introduction
Sharks play an important role in marine ecosystems. As apex predators, they keep the populations of species below them in the food chain in check, which helps keep the ocean balanced. Shark attacks are rare but highly publicized, especially in media such as the Jaws movies. Even though attacks are rare, it is still useful to know the factors that increase the risk of one, such as the type of shark, the location, and the activity.
I decided to create visualizations to show the countries with the most shark attacks, the activities with the highest risk, and the types of sharks that have attacked the most.
Research Questions
1. What countries had the highest number of shark attacks?
2. Did the number of shark attacks increase or decrease over the years?
3. What is the most common activity linked to shark attacks?
4. Which types of shark have been recorded attacking humans the most?
5. Do all sharks attack humans?
Dataset and Tools
I got the Shark Attacks dataset from Kaggle. The dataset contains more than 100 years of shark attack records, from 1845 to 2017. I cleaned the dataset in OpenRefine by eliminating empty rows, empty columns, and null values, and decided to focus on the years 1990 to 2017 since I wanted to showcase recent years. I then used RStudio to create visualizations that showcase the information needed to answer my research questions. I used 2 R packages, ggplot2 and dplyr, to create the visualizations in RStudio. ggplot2 handled the design of the visualizations; its syntax allows creators to quickly make complex visualizations with code. dplyr is another R package that makes data manipulation and transformation tasks much easier.
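The cleaning itself was done in OpenRefine, but the same narrowing to 1990 through 2017 could also be expressed in dplyr. The following is only a minimal sketch on a small made-up data frame; the column names (Year, Country, Activity) and all values are assumptions for illustration and are not taken from the real file.

```r
library(dplyr)

# Toy stand-in for the dataset. Column names and values are hypothetical.
attacks <- data.frame(
  Year     = c(1985, 1990, 2003, 2017, NA, 2010),
  Country  = c("USA", "USA", "AUSTRALIA", "SOUTH AFRICA", "USA", NA),
  Activity = c("Swimming", "Surfing", "Surfing", "Fishing", "Surfing", "Swimming"),
  stringsAsFactors = FALSE
)

recent <- attacks %>%
  filter(!is.na(Year), !is.na(Country)) %>%  # drop rows missing key fields
  filter(Year >= 1990, Year <= 2017)         # keep only the study period

print(recent)
```

Keeping the two filter() calls separate makes it easy to see how many rows each step removes.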
After looking through the dataset, I came up with 4 visualizations to support my findings.
Visualizations and Findings
After uploading the dataset to RStudio, the first visualization I created was a bar chart of the 10 countries where the most shark attacks happened from 1990 to 2017.
This bar chart shows the audience right away that the USA has the highest number of shark attacks, followed by Australia. It is a useful visualization because it compares countries directly with each other.
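A chart like this one could be built roughly as follows. This is a minimal sketch, not the exact code used for the project: the data frame is made up, and the column name Country and the helper column n_attacks are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data: one row per recorded attack. Values are hypothetical.
attacks <- data.frame(
  Country = c("USA", "USA", "USA", "AUSTRALIA", "AUSTRALIA", "SOUTH AFRICA"),
  stringsAsFactors = FALSE
)

# Count attacks per country, sort descending, keep at most the top 10.
top_countries <- attacks %>%
  count(Country, name = "n_attacks", sort = TRUE) %>%
  slice_head(n = 10)

# Horizontal bars ordered by count so the largest country is on top.
p <- ggplot(top_countries, aes(x = reorder(Country, n_attacks), y = n_attacks)) +
  geom_col() +
  coord_flip() +
  labs(x = "Country", y = "Number of attacks",
       title = "Countries with the most shark attacks")

print(p)
```

Flipping the axes keeps long country names readable, which is why coord_flip() is used here.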
The second visualization I created is a line graph of the number of shark attacks in each year from 1990 to 2017.
This type of graph allows the audience to see the trend in shark attacks over the years. Based on the graph, the number of shark attacks slowly increases over the years and then drops suddenly from 2016 to 2017.
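The yearly counts behind a line graph like this can be produced with a single count() call. The sketch below uses made-up data; the column name Year and the helper column n_attacks are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data: one row per recorded attack. Values are hypothetical.
attacks <- data.frame(Year = c(1990, 1990, 1991, 1992, 1992, 1992))

# One row per year with the number of attacks in that year.
per_year <- attacks %>%
  count(Year, name = "n_attacks")

p <- ggplot(per_year, aes(x = Year, y = n_attacks)) +
  geom_line() +
  geom_point() +  # mark each year so single-year changes stand out
  labs(x = "Year", y = "Number of attacks",
       title = "Shark attacks per year")

print(p)
```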
The third visualization I created is a pie chart showing the activities linked to shark attacks. The most obvious takeaway from this pie chart is that surfing is the activity linked to the most shark attacks, followed by swimming.
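ggplot2 has no dedicated pie chart geom, so one common approach is to draw a single stacked bar and switch to polar coordinates. The sketch below shows that approach on made-up data; the column name Activity and the helper columns n_attacks and share are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data: one row per recorded attack. Values are hypothetical.
attacks <- data.frame(
  Activity = c("Surfing", "Surfing", "Surfing", "Swimming", "Swimming", "Fishing"),
  stringsAsFactors = FALSE
)

# Share of all attacks for each activity.
shares <- attacks %>%
  count(Activity, name = "n_attacks") %>%
  mutate(share = n_attacks / sum(n_attacks))

# A single stacked bar wrapped around a circle becomes a pie chart.
p <- ggplot(shares, aes(x = "", y = share, fill = Activity)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = "Share of shark attacks by activity")

print(p)
```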
The fourth visualization is a bar chart of shark attacks by shark type from 1990 to 2017. The information that can be taken from it within a few seconds is that the white shark has the highest number of attacks on humans, followed by cases where shark involvement was not confirmed. Third place belongs to tiger sharks, and so on. The type of shark with the fewest attacks is the blacktip shark.
Peer Critique and Reflection
At first I was thinking about creating visualizations of something entirely different, and I discussed it with my peer. My peer suggested that a bar chart would be an interesting way of showcasing the amounts of different types or groups of something. Building on that idea, I decided to create a pie chart showing the percentage of shark attacks linked to each type of activity. It is useful because the audience can tell right away which activity accounts for the largest share, which is surfing.
What I found most challenging was the data cleaning and preparation. This included removing missing values and empty cells, rows, and columns, and converting data types. Also, it was my first time using RStudio, so I had to run the code multiple times before the plot started making sense or showed up at all. There were times when I struggled to fix one tiny problem in the code or the visualization and could not quite figure out how. I had to try and retry multiple times and work around many issues before I was finally able to create 4 different useful visualizations on shark attacks.
Conclusion
I really enjoyed analyzing the dataset and creating useful visualizations to answer my research questions. Although sharks are often portrayed as dangerous and aggressive, not all sharks attack humans and not every beach has sharks. Even when sharks do attack humans, it happens rarely. Based on the dataset, I also found out that whale sharks have never been recorded attacking humans, which means that not all sharks attack. I am interested in learning more about what caused the number of shark attacks to increase over time, about shark behavior, and about why they attack. Sharks are my favorite animal, and I feel like they are misunderstood and all portrayed as man-eaters. As a result, I will look further into the truths and myths about sharks to understand their behavior better. This is the start of my research journey to learn more about sharks.
Citations
https://www.kaggle.com/datasets/mysarahmadbhat/shark-attacks?select=attacks.csv
https://www.kaggle.com/code/akhabash/cheatsheet-70-ggplot-charts
https://www.kaggle.com/code/rtatman/visualizing-data-with-ggplot2
https://www.kaggle.com/code/jessemostipak/dive-into-dplyr-tutorial-1