Long-Term Winter Precipitation Trends: An Analysis of Snowfall Amounts from 1995-2020



This analysis asks whether, between 1995 and 2020, December-to-March snowfall totals in the Western US decreased more or less than those in the Northeast.  The hypothesis is that the Western US has experienced less change in snowfall than the Northeast.  Global warming has gone from a theoretical idea to reality in the recent past, with more and more superstorms each year, whether in winter or summer.  Little attention, however, has been paid to whether the effects of global warming are evenly distributed.  Those effects are most starkly seen in winter, with snow levels falling and glaciers shrinking rapidly, in some cases before ceasing to exist.

The US has two principal skiing destination regions: the Northeast, comprising New York and New England (defined here as Vermont, New Hampshire, Maine, and Massachusetts), and the West, which encompasses a broad region but for this analysis is defined as Utah, Colorado, Wyoming, Idaho, and Montana.  Both regions have encountered difficulties in recent years with below-average snowfall.  While this does not have a significant impact in the Northeast, all of the Western states rely on snowmelt to fill reservoirs and water crops in the ensuing growing season.  Chen et al. analyzed changes in snowfall in the Northeast over the last 25 years, but that research focused only on the Northeast and did not look at other regions of the country.  Additionally, Knowles et al. analyzed changes in snowfall and rainfall in the Western US, finding that snowfall decreased and rainfall increased in January and March.[2]  Knowles et al. worked on a longer time horizon, 1949-2004, and offer a solid analysis of changes, yet this research needs to be updated: since it was published in 2006, the climate has continued to change and temperatures have continued to rise.  This analysis brings together this existing research and asks whether, during the period from 1995 to 2020, snowfall totals in the Western US decreased more or less than those in the Northeast.


The data used for this analysis comes from NOAA’s National Centers for Environmental Information.  The data is organized by month and by state and contains daily precipitation totals for every weather station in the state.  The goal of this investigation is to use winter precipitation amounts to determine whether different regions of the US are being disproportionately affected.  This approach was chosen over average temperature because average temperature is often not representative of actual conditions in winter alpine climates.  That is to say, thermometers will often read at or above 32℉, yet due to windchill or cloud cover snow will still be falling and accumulating on the ground.  Thus, precipitation totals offer a more accurate representation of winter weather.

Existing datasets from the National Centers for Environmental Information provide the basis for this study.  Taken individually, the month-by-month precipitation data reveals little beyond how much snow fell, when it fell, and where.  However, when the months are added together to form a winter season (for example, 1995-1996) and the winter seasons are then compared with one another, longer-term trends become apparent and year-to-year variation is smoothed out.
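The seasonal grouping described above can be sketched in a few lines.  This is a minimal Python illustration rather than the R code used for the study; the function names and the sample totals are made up for the example, and it assumes that a December is grouped with the January through March that follow it.

```python
from collections import defaultdict

WINTER_MONTHS = {12, 1, 2, 3}

def season_label(year, month):
    """Return a label such as '1995-1996' for a winter month, else None."""
    if month not in WINTER_MONTHS:
        return None
    # Assumption: December opens the season; January-March close it.
    start = year if month == 12 else year - 1
    return f"{start}-{start + 1}"

def seasonal_totals(monthly_records):
    """Sum (year, month, total) records into per-season totals."""
    totals = defaultdict(float)
    for year, month, total in monthly_records:
        label = season_label(year, month)
        if label is not None:  # non-winter months are skipped
            totals[label] += total
    return dict(totals)

# Made-up monthly totals, for illustration only.
sample = [
    (1995, 12, 10.0), (1996, 1, 12.5), (1996, 2, 8.0),
    (1996, 3, 4.5), (1996, 7, 1.0), (1996, 12, 9.0),
]
print(seasonal_totals(sample))
```

Here the July record is dropped and the December 1996 record opens the 1996-1997 season.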

While these datasets are for the most part without problems, there is an issue with how precipitation totals are reported: characters are sometimes used rather than numbers, with M indicating that that day’s report was missed and T indicating a trace amount of precipitation, meaning a day on which something fell but not enough to register.  Given the long time horizon and broad geographic area this paper examines, the decision was made to remove all reports with missing data as well as all reports of trace precipitation.  Trace amounts were removed because it was impossible to accurately quantify the different small amounts that are subsumed within the trace designation.  This data will be analyzed using a Markov Chain.
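The cleaning rule for M and T reports can be illustrated as follows.  This is a hedged Python sketch, not the study's R code: the function names are hypothetical, and treating an empty field as missing is an added assumption.

```python
def parse_precip(raw):
    """Convert one reported value to a float; return None for M or T flags."""
    text = raw.strip().upper()
    # 'M' = missed report, 'T' = trace amount.
    # Assumption: an empty field is also treated as missing.
    if text in {"M", "T", ""}:
        return None
    return float(text)

def clean_reports(raw_values):
    """Drop missing and trace reports, keeping only numeric totals."""
    parsed = [parse_precip(value) for value in raw_values]
    return [value for value in parsed if value is not None]

# Made-up reports, for illustration only.
print(clean_reports(["0.25", "M", "T", "0.00", " 1.1 "]))
```

Note that a true zero ("0.00") is kept, since it is a valid measurement rather than a missing or trace report.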


The approach of this paper is purely quantitative, focusing on an analysis of spatiotemporal data over a 25-year time horizon.  Weather patterns and trends change very slowly, and thus the long time horizon is necessary to determine whether any change is taking place and, if so, how significant that change is.  While this is not a longitudinal study, it shares many aspects of one, making it well suited to testing for changes over time.  A number of tools were used in this analysis, mainly RStudio, which was used to pull the data from the National Centers for Environmental Information’s website, to reformat the data from “short form” to “long form,” and then to perform analysis on the reformatted data.  As the data was reported daily and then collated into a monthly, by-state breakdown, and there were 25 years of four months each to pull and transform, R offered the best solution to this problem.

The first step was identifying the relevant weather stations.  This was done by downloading the global list of weather stations and then filtering down to weather stations in the 10 U.S. states selected for the study.  From here, the station list was cleaned and formatted in preparation for pulling the data for each station.  As the data came in a fixed-width file format, it was beneficial to handle both the download and the conversion to a comma-separated value format together.  Existing code was found and adapted to the unique aspects of this investigation.[1]  From there, all the monthly files were consolidated into a single file, and this file was filtered down to just the winter months (December, January, February, and March).  Dates with missing or trace values were replaced with null values, and the list was then joined back to a file containing spatial data (latitude and longitude coordinate pairs).  At this stage the values were entered into the Markov Chain function, and the output was exported and converted into a shapefile that was merged with a shapefile of the U.S. at the county level.  The analysis was then styled using Carto.com’s online tools; a link to the visualization can be found in appendix B.
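The fixed-width-to-CSV step can be sketched generically.  The Python example below is illustrative only and is not the adapted R code cited above: the column layout (station, date, precip and their character offsets) is hypothetical, since the actual file layout is not given here, and the sample lines are made up.

```python
import csv
import io

# Hypothetical layout: (field name, start, end), 0-based and end-exclusive.
LAYOUT = [("station", 0, 11), ("date", 11, 19), ("precip", 19, 25)]

def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}

def fixed_width_to_csv(lines, layout=LAYOUT):
    """Convert fixed-width lines to CSV text with a header row."""
    buffer = io.StringIO()
    fieldnames = [name for name, _, _ in layout]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width(line, layout))
    return buffer.getvalue()

# Made-up records that follow the hypothetical layout above.
sample_lines = [
    "STATION000119951201  0.25",
    "STATION000219951201     M",
]
print(fixed_width_to_csv(sample_lines))
```

Keeping the layout in one table means only the offsets need to change to match the real files.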

User testing was done on two separate occasions, with feedback from the first being used to improve the visualization.  I posted my initial visualization on the r/skiing forum on Reddit.com and asked for general feedback.  The first version of my visualization was a state-level map of the Western states with four different maps, one for each of the results from the Markov Chain.  These were static images with no interactivity.  Users liked the breakout of the different trends, but felt that the state-level analysis didn’t provide enough detail on which parts of the state were most affected, and they had trouble interpreting the trends.  Taking this feedback into account, two main changes were made.  First, a second version of the visualization was made at the county level in an online interactive format: the geographic area was quite large, but users were often interested in small areas, and this gave them the ability to zoom to their personal area of curiosity.  Second, more context was added to the map to help users understand that the trends were a measure of probability.


Most counties were trending slightly negative to marginally positive (-0.234 to 0.658), implying that precipitation levels are largely staying the same or decreasing.  Counties trending up generally showed only a slight upward trend, with 91.7% of entries falling between 0.07 and 0.193.  This slight upward trend helps to explain the overall trend direction.  The downward trend was also concentrated in a small range, from -0.121 to 0.109.  Volatility at the county level was quite high, with 81.5% of counties having at least a 31% chance of changing from trending up to trending down.
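To make the idea of a "chance of changing from trending up to down" concrete, the sketch below estimates transition probabilities for a simple two-state Markov chain.  It is an assumption-laden Python illustration, not the Markov Chain function used in this analysis: it assumes the states are "up" and "down" steps between consecutive values, counts ties as "down," and uses made-up numbers.

```python
from collections import Counter

def to_states(values):
    """Label each step between consecutive values as 'up' or 'down'."""
    # Assumption: a tie (no change) is counted as 'down'.
    return ["up" if later > earlier else "down"
            for earlier, later in zip(values, values[1:])]

def transition_probabilities(states):
    """Estimate P(next state | current state) from observed state pairs."""
    pair_counts = Counter(zip(states, states[1:]))
    from_counts = Counter(states[:-1])  # the last state has no successor
    return {(current, nxt): count / from_counts[current]
            for (current, nxt), count in pair_counts.items()}

# Made-up monthly totals, for illustration only.
totals = [10, 12, 9, 11, 13, 8, 7]
states = to_states(totals)
probs = transition_probabilities(states)
print(states)
print(probs[("up", "down")])
```

In this toy series, two of the three "up" steps that have a successor are followed by a "down" step, so the estimated chance of switching from up to down is two in three.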

County-level snowfall trends, 1995-2020, analyzed at the monthly interval.


The analysis offered a rather muted picture of weather trends over the last 25 years.  Neither the Northeast nor the West was definitively snowier.  One area that could use further examination is fine-tuning the mathematical algorithm used for the analysis.  Precipitation values cannot go below zero, and indeed most days have zero precipitation recorded, which skews the result of the Markov Chain.  Another area for extended review would be to run the analysis at the seasonal or yearly level.  This analysis was run at the month level, generating values that indicate whether the month itself was likely to increase or decrease, which also presented challenges when trying to visualize the analysis.  Finally, a longer time horizon as well as a larger geographic area would provide a more accurate picture of winter weather trends across the U.S.

[1] Rao, M. & Battaile, B. (2017). R: Reading & Filtering weather data from the Global Historical Climatology Network (GHCN) [Source code]. http://spatialreasoning.com/wp/20170307_1244_r-reading-filtering-weather-data-from-the-global-historical-climatology-network-ghcnd