Winter is Coming: An Analysis of Sunshine vs Depression

By Justin Fenn

I regularly battle depression, so when winter arrives and most people are thinking of the holidays or curling up by a fire, I am mentally preparing for what I know will be the battle of the year. One theory is that my primitive ancestors from Northern Europe evolved this trait of seasonal "depression" as a way to reduce energy expenditure in the winter in order to survive. Yet in our modern world of year-round productivity, this once useful trait is now a major hindrance. As they say, "Winter is coming".

Data Wrangling

To carry out the analysis, I gathered sunshine data from NOAA and health data from the Global Health Data Exchange. I grouped both by state because the mental health data I could find was primarily reported at the state level.

NOAA Sunshine Data

I grouped the NOAA sunshine data by state by averaging the different reporting stations in a state. The column 'percent_sunshine' refers to the percentage of possible sunshine (i.e. how much of the day is sunny on average). Each value is an average over a number of years that depends on when the weather station was put online.
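The station-to-state averaging described above might look roughly like the following. This is a minimal sketch, not the original notebook code: the 'state' and 'station' column names, the placeholder state labels, and every number are assumptions made up for illustration. Only 'percent_sunshine' comes from the text.

```python
import pandas as pd

# Toy station-level rows. All values are invented for illustration;
# 'state' and 'station' are assumed column names.
stations = pd.DataFrame({
    "state": ["State A", "State A", "State B", "State B", "State B"],
    "station": ["a1", "a2", "b1", "b2", "b3"],
    "percent_sunshine": [47.0, 53.0, 85.0, 85.0, 91.0],
})

# Average every station in a state into a single state-level value.
sunshine_by_state = (
    stations.groupby("state", as_index=False)["percent_sunshine"].mean()
)
print(sunshine_by_state)
```

Note that a plain mean weights every station equally, regardless of how many years of observations it has behind it.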

GBD Health Data

I used data from the Global Health Data Exchange in order to get the depression rates by state. Here is a permalink to the data.

Merged Data

Now we merge the data to get a table of state, depression rate, and sunshine percentage.
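A merge like this could be sketched with pandas as follows. The 'depression_rate' column name, the placeholder state labels, and all values are assumptions for illustration, and the choice of an inner join is mine rather than something the original states.

```python
import pandas as pd

# Toy state-level tables. Values and state labels are invented.
sunshine = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "percent_sunshine": [50.0, 87.0, 60.0],
})
depression = pd.DataFrame({
    "state": ["State A", "State B", "State D"],
    "depression_rate": [4.1, 4.4, 3.9],  # assumed column name
})

# An inner join keeps only states present in both tables.
# validate="one_to_one" raises an error if a state appears twice in either table.
merged = sunshine.merge(depression, on="state", how="inner", validate="one_to_one")
print(merged)
```

With an inner join, any state missing from either source silently drops out, so it is worth comparing the row count of the merged table against the inputs.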

Data Analysis and Visualization

Now that we have the data we want, we should visualize it to see whether a pattern emerges. To do this, we can make a scatter plot showing the relationship between sunshine percentage and depression rate in each state.
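As a numeric companion to the scatter plot, a Pearson correlation coefficient gives a quick check for a linear relationship. The sketch below uses invented values with a deliberately U-shaped pattern, and 'depression_rate' is an assumed column name. It is not the real dataset and says nothing about the actual result.

```python
import pandas as pd

# Invented, deliberately U-shaped toy data: the rate is highest at both extremes.
toy = pd.DataFrame({
    "percent_sunshine": [40.0, 50.0, 60.0, 70.0, 80.0],
    "depression_rate": [5.0, 4.0, 3.0, 4.0, 5.0],
})

# Pearson's r measures only linear association.
r = toy["percent_sunshine"].corr(toy["depression_rate"], method="pearson")
print(f"Pearson r = {r:.3f}")
```

In a symmetric U-shaped case like this one, r comes out at essentially zero even though the two columns are clearly related, which is why looking at the plot matters as well.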

From our scatter plot, it seems that states with a higher depression rate also tend to have a more extreme sunshine percentage (either very high or very low). To show this more clearly, let's group the states by depression rate into 6 bins and draw a violin plot for each bin.
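The binning step could be sketched as below. The text does not say how the 6 bins were chosen, so equal-width bins via `pd.cut` are an assumption here, as are the 'depression_rate' column name and all of the toy values.

```python
import pandas as pd

# Invented toy values, for illustration only.
df = pd.DataFrame({
    "depression_rate": [3.0, 3.2, 3.5, 3.9, 4.1, 4.4, 4.6, 5.0, 5.3, 5.6, 5.8, 6.0],
    "percent_sunshine": [58, 60, 61, 57, 63, 55, 66, 50, 70, 45, 78, 42],
})

# Six equal-width bins over the observed range of depression rates (assumed scheme).
df["depression_bin"] = pd.cut(df["depression_rate"], bins=6)

# Spread of sunshine within each bin; a violin plot shows the same idea visually.
spread = (
    df.groupby("depression_bin", observed=True)["percent_sunshine"]
    .agg(["count", "mean", "std"])
)
print(spread)
```

A plotting library such as seaborn could then draw one violin per 'depression_bin'. Bins that contain a single state have an undefined standard deviation, so sparse bins deserve caution.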

As can be seen in this violin plot, the bins with higher depression rates show a wider spread of sunshine percentages (i.e. extreme highs and lows). In other words, a state at either sunshine extreme is more likely to have a higher depression rate.

Another way to visualize this data is to take the difference between each state's percent_sunshine and the mean of percent_sunshine. This makes the effect of living with a more extreme sunshine percentage a bit easier to see.
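A minimal sketch of that transformation, using invented values and placeholder state labels. The 'sunshine_deviation' and 'abs_sunshine_deviation' column names are hypothetical, and the absolute-value column is an addition of mine for treating very high and very low sunshine alike.

```python
import pandas as pd

# Invented toy values, for illustration only.
df = pd.DataFrame({
    "state": ["State A", "State B", "State C", "State D", "State E"],
    "percent_sunshine": [40.0, 55.0, 60.0, 65.0, 80.0],
})

# Signed difference from the mean across all states.
mean_sunshine = df["percent_sunshine"].mean()
df["sunshine_deviation"] = df["percent_sunshine"] - mean_sunshine

# Assumed extra step: the absolute deviation puts both extremes on the same scale.
df["abs_sunshine_deviation"] = df["sunshine_deviation"].abs()
print(df)
```

Plotting depression rate against the absolute deviation would turn a U-shaped pattern into a roughly increasing one, if such a pattern exists in the data.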

Conclusion

My hypothesis that sunshine and depression rates are linearly correlated did not hold, but the analysis did yield an insight. The old adage "All good things in moderation" seems to apply to sunshine as well. An average amount of sunshine does not correlate with a high or low rate of depression, but a very low or very high amount of sunshine does seem to correlate with a high incidence of depression in a state.

In a future analysis, I would like to use more granular sunshine and mental health data, since a state can be large and have many different sunshine percentages and depression rates across it. Unfortunately, the datasets that would require needed far more tidying than the scope of this analysis allowed.

In conclusion, if you struggle with depression, you may have better luck in a more moderately sunny state or in the one outlier: California, where the rate of depression is relatively low even though it is much sunnier on average than most states.