Statistics, more often than not, play a major role in generating useful insights from data. Regardless of the level of complexity, this mathematical discipline will come in handy. In the case of working with temporal data, i.e. data which is collected over certain periods of time, we can combine statistics with various temporal visualizations in order to generate interesting stories.
This lab report presents four statistical and temporal visualizations intended to provide insight into economic events that occurred during the first two quarters of 2019.
My first visualization was inspired by the treemaps that I have seen on the Reddit subforum r/dataisbeautiful, which is a social space for users to share data visualizations. I believe that the treemap is an efficient way of displaying size and share proportions, provided that the differences are large enough to distinguish visually.
Stephen Few’s Now You See It (2009) is the main source of inspiration for my other visualizations, such as my bar chart for displaying prediction accuracy. As Few points out (2009, p. 38), bar charts are often superior when it comes to presenting comparisons. Few (2009) also inspired my time-series graph and the choice of using groups to make relevant patterns in the data easier to detect.
My dataset was retrieved from Kaggle.com, where it had been uploaded by a user who had extracted information from the financial website Investing.com. The data consists of economic events that took place during Q1 and Q2 of 2019. Each row represents an event, with a brief description, along with information on how the event was expected to impact particular variables, such as price indexes or currencies, as well as the actual and previous values of those variables.
Tableau Public was the main software tool used for this project. Additionally, I used Microsoft Excel and Google Refine for minor dataset manipulation tasks.
Prior to importing my dataset into Tableau Public, I used Microsoft Excel and Google Refine to improve the quality of the data. Excel was used to add column headers, which were missing from the original dataset. I also changed the format of relevant columns, such as converting one variable from “General” to “Time”. Google Refine was used for minor data cleaning, such as removing faulty values and transforming additional values.
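The cleaning steps described above were done in Excel and Google Refine, but they could also be sketched in a few lines of Python. The column names below are assumptions made for illustration (the original file had no headers, and the names I added in Excel are not listed in this report), and the sample rows are invented.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the original file shipped without headers.
COLUMNS = ["date", "time", "country", "volatility",
           "event", "actual", "forecast", "previous"]


def load_events(text):
    """Attach headers to headerless CSV text and parse the time column."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    rows = []
    for row in reader:
        try:
            # Equivalent of switching the column from "General" to "Time".
            row["time"] = datetime.strptime(row["time"].strip(), "%H:%M").time()
        except (ValueError, AttributeError):
            continue  # drop rows whose time value is faulty or missing
        rows.append(row)
    return rows


# Invented sample rows, only to show the behaviour.
sample = (
    "2019-01-02,08:30,United States,High,Jobless Claims,231,220,221\n"
    "2019-01-02,n/a,China,Low,Holiday,,,\n"
)
events = load_events(sample)
```

Here the second sample row is dropped because its time value cannot be parsed, which mirrors the removal of faulty values.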
The work process in Tableau Public started out with testing several visualizations in order to determine which would be the most interesting to present. I decided to keep the visualizations rather simple, drawing inspiration from Edward Tufte’s reasoning about overplotting.
RESULTS AND INTERPRETATION
Share of events per country
This treemap presents the share of total events per country, grouped by continent. The share of events is dominated by the United States (23%), which is not much of a surprise considering that it is the largest economy in the world. What is more surprising is that the second-largest economy, China, is found in 6th place with 5% of the total events. Before reading too much into this result, it is important to note that the underlying data does not represent all economic events of the time period. Also, since the data was cleaned prior to the analysis, we run the risk of having disregarded some events.
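The shares shown in the treemap are simply each country's event count divided by the total number of events. A minimal sketch of that calculation, using invented sample data rather than the real dataset, could look like this:

```python
from collections import Counter


def event_shares(countries):
    """Return each country's share of all events, largest share first."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.most_common()}


# Invented sample: one list entry per event row.
sample = ["United States"] * 5 + ["Germany"] * 3 + ["China"] * 2
shares = event_shares(sample)
```

With this sample, the United States accounts for half of the events, Germany for 30% and China for 20%.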
This faceted bar chart displays the accuracy of analyst predictions, grouped by the level of volatility that was expected for each event. I color-mapped the bars so that each color would reflect the “feeling” of that bar, with red being negative and green positive. For all levels of expected volatility, the actual result turned out worse than the prediction, implying that the impact of the events was underestimated.
If we wanted to further investigate the accuracy of the analyst predictions, we would have to increase the use of statistics.
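As one possible first step in that direction, we could compute, for each level of expected volatility, the share of events whose actual value fell below the forecast. The sketch below is only an illustration: the tuple layout and the sample numbers are invented, and treating "below forecast" as "worse" is an assumption that does not hold for every indicator (for unemployment, for example, a lower value is better).

```python
from collections import defaultdict


def below_forecast_rate(events):
    """Share of events per volatility level whose actual value was below the forecast.

    Each event is a (volatility_level, actual, forecast) tuple.
    """
    below = defaultdict(int)
    total = defaultdict(int)
    for level, actual, forecast in events:
        total[level] += 1
        if actual < forecast:
            below[level] += 1
    return {level: below[level] / total[level] for level in total}


# Invented sample values.
sample = [
    ("High", 1.0, 1.5),
    ("High", 2.0, 1.8),
    ("Low", 0.2, 0.4),
    ("Low", 0.1, 0.3),
]
rates = below_forecast_rate(sample)
```

A rate well above 0.5 for a given level would suggest that forecasts at that level were systematically too optimistic, which could then be examined with a formal test.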
Regional monthly volatility
This line graph shows the average volatility over a 6-month period for three grouped regions: “US” (United States), “China” (China and Hong Kong), and “RoW” (the remaining countries). I colored the US and China lines to match each region’s flag and made the RoW line grey. Volatility spiked in April for all regions, led by China. In July, the US suffered a relatively strong increase in volatility compared to China and RoW.
While this graph shows detailed snapshots of volatility per month, if we instead wanted to present the volatility trend, we could change the measure to a moving average (Few, 2009).
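To illustrate what such a change would do, here is a minimal sketch of a trailing moving average. The window of three months and the sample values are assumptions chosen for illustration, and the first points use a shorter window because fewer earlier months are available.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over the months seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented monthly averages for one region.
monthly = [1.0, 2.0, 3.0, 2.0, 1.0, 4.0]
smoothed = moving_average(monthly, window=3)
```

The smoothed series dampens single-month spikes, so it is better suited to showing the overall direction than the exact level in any one month.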
Event per hour
The fourth graph in this lab report is another line graph, which presents the average frequency of events over the course of one day, measured in the GMT-4 time zone. There are three prominent spikes in the data at 4 AM, 8 AM and 7 PM. I decided to shade the graph to make the high-frequency areas stand out. I also added percentages of the total for the three top peaks, in order to make the diagram more explanatory.
When analyzing the peaks of the graph, I found that they are likely to be connected to the beginning of the day in various time zones, as seen by the labels. One possible takeaway from this is that economic events are often registered during the early hours when markets open. However, this does not mean that the actual events took place during those hours.
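The link between the peaks and local mornings can be checked by converting the peak hours from GMT-4 to other time zones. The sketch below assumes fixed UTC offsets, which ignores daylight saving time, and the offsets used in the usage note are illustrative rather than taken from the labels in the graph. The sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

SOURCE_TZ = timezone(timedelta(hours=-4))  # the GMT-4 zone used in the graph


def hourly_share(timestamps):
    """Share of events registered in each hour of the day."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    return {hour: counts[hour] / total for hour in sorted(counts)}


def local_hour(hour, offset_hours):
    """Convert an hour of the day in GMT-4 to the hour at a fixed UTC offset."""
    moment = datetime(2019, 1, 2, hour, tzinfo=SOURCE_TZ)
    target = timezone(timedelta(hours=offset_hours))
    return moment.astimezone(target).hour


# Invented timestamps, two of them in the 4 AM hour.
stamps = [datetime(2019, 1, 2, h, 30, tzinfo=SOURCE_TZ) for h in (4, 4, 8, 19)]
shares = hourly_share(stamps)
```

For example, `local_hour(4, 0)` gives 8, so the 4 AM peak corresponds to 8 AM at UTC+0, and `local_hour(19, 9)` also gives 8, so the 7 PM peak corresponds to 8 AM the next day at UTC+9.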
REFLECTION & FURTHER DIRECTIONS
I found this lab very educational, especially since the project involved all steps, from finding, cleaning and organizing data to the main objective of this lab: visualizing it.
For further directions, I would suggest performing similar analysis and visualizations for larger economic event datasets, to study e.g. how the proportions of events would differ compared to this rather limited dataset.
Regarding the aesthetics of the visualizations, I would also work further on the dashboard to improve the project overview.