Introduction
For this lab, I analyzed a dataset from UNHCR (the United Nations Refugee Agency), which I found through the TidyTuesday website. I chose it in order to work with a larger dataset that is clean, consistent, and compliant with the assignment requirements. The dataset was installed in RStudio through its own R package.
The social sciences have always been a personal passion of mine. I was born and raised in Venezuela, a country that many people are unfortunately fleeing to seek refugee and asylum status in other countries in the Americas and Europe. It is important for me to understand (and visualize) how people from different countries were displaced in the past and how displacement is happening in the present.
I looked at the visualization options in the R Graph Gallery to get a sense of which graphs would suit the data. I decided to use basic bar plots, which combine one categorical dimension and one quantitative dimension in a single visualization.
Materials and Methodology
I approached this assignment with mixed methods. First, I focused on the data frame by importing the R package into RStudio and reviewing the data dictionary provided by UNHCR.
The dataset has a total of 120,338 entries and 16 columns. I ran glimpse() in R to understand the data collected in this data frame. The columns are:
Variable | Description |
year | Calendar year in which the data was registered |
coo_name | Country of origin name |
coo | Country of origin UNHCR code |
coo_iso | Country of origin ISO code |
coa_name | Country of asylum name |
coa | Country of asylum UNHCR code |
coa_iso | Country of asylum ISO code |
refugees | The number of refugees |
asylum_seekers | The number of asylum-seekers |
returned_refugees | Refugees who have returned home within the previous year |
idps | The number of internally displaced persons |
returned_idps | The number of returned internally displaced persons |
stateless | The number of stateless persons |
ooc | The number of others of concern to UNHCR |
oip | The number of other people in need of international protection |
hst | The number of host community members |
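The structure check described above can be sketched as follows. This is a minimal example, not the exact code used for the lab: it assumes the data comes from the refugees package on CRAN and that the data frame is exported under the name population, and the variable name pop is only illustrative.

```r
# One-time setup, if the packages are not installed yet:
# install.packages(c("refugees", "dplyr"))
library(dplyr)

# Assumption: the refugees package exports the data frame as `population`
pop <- refugees::population

glimpse(pop)          # column names, types, and the first few values
dim(pop)              # number of rows and columns
colSums(is.na(pop))   # count of missing values in each column
```

Checking missing values early is useful because several of the count columns may be empty for older years, which affects any totals computed later.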
Second, I expanded my knowledge and understanding by reading articles about refugees around the world, including:
Refugee crisis in NYC by The New York Times
International Rescue Committee (NYC)
United Nations Migrants and Refugees News
After reviewing all these materials, I was able to come up with three research questions:
Question #1: Which countries have the highest refugee populations at present?
Question #2: Which year records the greatest displacement of refugees?
Question #3: What are the social implications of having the highest population of refugees within a certain year?
Visualizations
For the first visualization, I filtered the refugee data frame to 2022 (the most recent year in the dataset) and kept the first 20 countries. This helped me visualize the highest numbers of refugees per country around the world. The chart shows countries such as Venezuela, Ukraine, Syria, Sudan, and Afghanistan, which helped me answer Question #1.
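The filtering step above can be sketched as follows. This is a minimal example under stated assumptions, not the exact code used for the lab: it assumes countries are grouped by country of origin (coo_name) and ranked by the refugees column, the helper name top_origins is hypothetical, and the toy rows are invented for illustration only (they are not UNHCR figures).

```r
library(dplyr)
library(ggplot2)

# Toy rows with the same column names as the data dictionary.
# The values are invented for illustration; they are NOT UNHCR figures.
toy <- data.frame(
  year     = c(2022, 2022, 2022, 2022, 1990, 1990),
  coo_name = c("A", "A", "B", "C", "A", "B"),
  coa_name = c("X", "Y", "X", "Y", "X", "X"),
  refugees = c(100, 50, 120, 30, 10, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total refugees per country of origin in one year,
# keeping the n countries with the largest totals.
top_origins <- function(data, yr, n = 20) {
  data %>%
    filter(year == yr) %>%
    group_by(coo_name) %>%
    summarise(refugees = sum(refugees, na.rm = TRUE), .groups = "drop") %>%
    slice_max(refugees, n = n, with_ties = FALSE) %>%
    arrange(desc(refugees))
}

top_2022 <- top_origins(toy, 2022, n = 2)

# Horizontal bar chart, largest total at the top
p <- ggplot(top_2022, aes(x = reorder(coo_name, refugees), y = refugees)) +
  geom_col() +
  coord_flip() +
  labs(x = "Country of origin", y = "Refugees", title = "Top origin countries")
print(p)
```

With the real data, the call would be top_origins(refugees::population, 2022), assuming the package exports the data frame under that name. Summing within each country first matters because the dataset holds one row per origin and asylum pair, so a single origin country appears in many rows.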
I then created a second visualization with data collected only in 1990. I selected this year because the New York Times article I read stated that the refugee crisis started spiking that year. This helped me address Question #2: the chart shows that Afghanistan had the highest value, with approximately 650,000 refugees that year.
Afghanistan is visibly the country with the highest number of refugees. To understand more about the socio-political situation of Afghan refugees, I created a third bar chart, which helped me answer my third question.
In this bar chart, the highest bar for refugees from Afghanistan falls around 1988, with more than 600,000 people. This visualization also helps me answer Question #2.
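A per-year view for a single origin country, like the one described above, can be sketched as follows. This is a minimal example rather than the code used for the lab: the helper name yearly_totals is hypothetical, the toy rows are invented for illustration only, and summing the refugees column across all asylum countries is an assumption about how the yearly value should be computed.

```r
library(dplyr)
library(ggplot2)

# Toy rows; the values are invented for illustration, NOT UNHCR figures.
toy <- data.frame(
  year     = c(2001, 2001, 2002, 2002, 2003, 2002),
  coo_name = c("A", "A", "A", "A", "A", "B"),
  refugees = c(10, 20, 30, 25, 15, 99),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total refugees per year for one country of origin
yearly_totals <- function(data, origin) {
  data %>%
    filter(coo_name == origin) %>%
    group_by(year) %>%
    summarise(refugees = sum(refugees, na.rm = TRUE), .groups = "drop")
}

by_year <- yearly_totals(toy, "A")

# The year with the largest total for this origin country
peak <- by_year %>% slice_max(refugees, n = 1, with_ties = FALSE)

p <- ggplot(by_year, aes(x = factor(year), y = refugees)) +
  geom_col() +
  labs(x = "Year", y = "Refugees", title = "Refugees by year for one origin country")
print(p)
```

With the real data, the call would be yearly_totals(refugees::population, "Afghanistan"), assuming that package and data frame name. Computing the peak year directly gives a figure that can be checked against what the bar chart appears to show.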
To support my visualization, I also looked for readings that explain the Afghan refugee crisis and displacement history, to get more context on what has been happening in the past two years.
On UNHCR’s website, I found an article stating that the events leading up to the “Taliban’s takeover of Kabul in August 2021 intensified instability and violence in Afghanistan – causing even more human suffering and displacement”. This supports the high bars seen around 2022 in the dataset as a social impact.
Reflection and Improvements
Creating the visualizations in RStudio proved complex. Next time, when framing the methodology, I will start by considering the “bigger picture” so that I can use more, and more varied, types of visualizations. I will also spend more time on the ggplot2 code and add more detail, such as spacing, color palette, labels, and text size.
It was very helpful to use a dataset that came from a data science-oriented non-profit and included an R package for installation, a data dictionary, and visualization guidelines. I will consider using a similarly documented dataset as a starting point for future assignments.
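As a sketch of the kind of polish mentioned above, the example below adjusts bar spacing, color, labels, and text size on a basic ggplot2 bar chart. The toy values and the specific styling choices (bar width, fill color, base text size) are illustrative assumptions, not settings used in this lab.

```r
library(ggplot2)

# Toy values, invented for illustration; NOT UNHCR figures.
toy <- data.frame(
  country  = c("A", "B", "C"),
  refugees = c(150, 120, 30),
  stringsAsFactors = FALSE
)

p <- ggplot(toy, aes(x = reorder(country, refugees), y = refugees)) +
  geom_col(width = 0.7, fill = "steelblue") +   # narrower bars add spacing
  coord_flip() +                                 # horizontal bars keep names readable
  labs(
    title = "Refugees by country of origin",
    x = "Country of origin",
    y = "Number of refugees"
  ) +
  theme_minimal(base_size = 14)                  # larger, consistent text size
print(p)
```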
Resources
GitHub, Bar Charts ggplot2, (n.d.) ggplot2.tidyverse.org/reference/geom_bar.html.
UNHCR, The UN Refugee Agency. Refugee Data Finder – 110 Million Forcibly Displaced People Worldwide, (n.d.) www.unhcr.org/refugee-statistics/.
Galal, Hisham, et al. UNHCR Refugee Population Statistics Database, (26 Oct. 2023) https://cran.r-project.org/web/packages/refugees/refugees.pdf
UNHCR, The UN Refugee Agency. About Afghanistan, (n.d.) www.unhcr.org/refugee-statistics/.