A sequel to my rebuttal to an anti-log manifesto:
Reader Ed Hanson, commenting on my use of correlation coefficients in analyzing comovements in economic policy uncertainty, writes:
My lessen 1 is the danger to statisticians who try to create more precision from the dat than actually exist. Just because you can calculate a number does not mean that number has precision. In other words, Menzie, you are attempting to get more precision out of the Index than actually exist. My visual and general note of spikes of uncertainty has more meaning and correlation than your over mathematical treatment which feigns accuracy.
It is an easy refuge for people to criticize statistical analysis as “lies, damn lies, and statistics”. However, in doing statistical analysis on economic policy uncertainty indices, I think it is useful to distinguish between the data series themselves and the moments of the data sample (text edited 6:33PM).
First, the EPU is compiled (roughly speaking) using occurrences of the words “economic”, “policy” and “uncertainty” in newspapers, normalized by total words. One can take issue with the creation of these indices, in terms of their coverage, their normalization, or whether the numerical tabulation is representative of actual economic policy uncertainty (I leave it to Baker, Bloom and Davis (2016) to make the case). But it’s the use of correlation coefficients to summarize comovement that exercises Mr. Hanson.
So, second, let’s look at the idea of measuring comovement. The Pearson correlation coefficient I cited is estimated as:
$$\rho_{xy} = \frac{\mathrm{Cov}(x,y)}{\left(\mathrm{Var}(x)\times\mathrm{Var}(y)\right)^{0.5}}$$
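As a minimal sketch of that calculation, the snippet below computes the sample correlation directly from the covariance and the two variances. The function name `pearson_corr` and the index values are made up for illustration; they are not actual EPU data.

```python
def pearson_corr(x, y):
    """Sample Pearson correlation: Cov(x, y) / (Var(x) * Var(y)) ** 0.5."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample moments with the usual n - 1 divisor (it cancels in the ratio).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / (var_x * var_y) ** 0.5


# Hypothetical index values, for illustration only.
epu_a = [100.0, 120.0, 90.0, 150.0, 130.0]
epu_b = [80.0, 95.0, 85.0, 140.0, 110.0]

rho = pearson_corr(epu_a, epu_b)
print(f"correlation: {rho:.3f}")  # reported to three decimal places
```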
Now, a correlation coefficient can always be calculated. There is a deep question of whether there is a population correlation that the estimated correlation coefficient converges to. Of course, that question also applies to other parameters one might want to estimate. If, for instance, x and y are I(1) series and not cointegrated, then a regression coefficient will not converge to a population parameter, because no such parameter exists.
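The I(1) caveat can be illustrated with a small simulation. This is a sketch under assumed settings (Gaussian shocks, 100 observations, 200 replications, a fixed seed), not an analysis of the EPU data: it compares the average absolute sample correlation between pairs of independent random walks with that between pairs of independent white-noise series.

```python
import random


def pearson_corr(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(rng, length):
    """I(1) series: cumulative sum of independent Gaussian shocks."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(rng, length):
    """I(0) series: independent Gaussian draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def mean_abs_corr(make_series, rng, length=100, reps=200):
    """Average |correlation| between pairs of independently generated series."""
    total = 0.0
    for _ in range(reps):
        x = make_series(rng, length)
        y = make_series(rng, length)
        total += abs(pearson_corr(x, y))
    return total / reps


rng = random.Random(0)  # fixed seed so the run is repeatable
mean_abs_i1 = mean_abs_corr(random_walk, rng)
mean_abs_i0 = mean_abs_corr(white_noise, rng)
print(f"independent random walks: {mean_abs_i1:.2f}")
print(f"independent white noise:  {mean_abs_i0:.2f}")
```

Even though every pair is independent by construction, the random-walk pairs typically show much larger sample correlations, which is why the stationarity of the series matters for interpreting the estimate.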
Still, as a summary of how two observed series comove, the correlation coefficient is useful.
Is my reporting of a correlation coefficient misleadingly precise? It would be if the underlying indices were reported to three significant digits and I had reported the correlation coefficient to 10 significant digits. As it turns out, the US baseline EPU is reported to 16 significant digits, while I listed correlations to at most three significant digits.
Now, I could be faulted for not reporting statistical significance with respect to some interesting null, like ρ=0. Below I show a correlation table with t-stats for EPUs over the 2016M01-2018M03 period.
Notice that in every case where ρ exceeds 0.707 (consistent with an R² of 0.5 in a bivariate context), the t-statistic exceeds 2 in absolute value, signifying statistical significance at the 5% level (95% confidence).
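For readers who want to check the orders of magnitude, here is a sketch using the textbook t-statistic for the null ρ=0, t = ρ·((n−2)/(1−ρ²))^0.5, with n = 27 monthly observations (2016M01 through 2018M03 inclusive). This assumes independent observations and classical standard errors; the t-stats in the table may have been computed differently, so treat this as a rough cross-check only. The function names are hypothetical.

```python
import math


def corr_t_stat(rho, n):
    """Classical t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2 or not -1.0 < rho < 1.0:
        raise ValueError("need n > 2 and -1 < rho < 1")
    return rho * math.sqrt((n - 2) / (1.0 - rho ** 2))


def min_abs_corr(t_crit, n):
    """Smallest |rho| whose classical t-statistic reaches t_crit."""
    return math.sqrt(t_crit ** 2 / (t_crit ** 2 + n - 2))


n_months = 27  # 2016M01 through 2018M03, inclusive
t_at_0707 = corr_t_stat(0.707, n_months)
rho_threshold = min_abs_corr(2.0, n_months)

print(f"t-stat at rho = 0.707: {t_at_0707:.2f}")
print(f"|rho| needed for |t| = 2: {rho_threshold:.3f}")
```

Under these assumptions, a correlation of 0.707 implies a t-statistic of roughly 5, and any correlation above about 0.37 in absolute value would clear the |t| > 2 threshold in a sample of this length.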
The data are plotted below.
Are there problems with correlation coefficients as summary measures of comovement? Sure. If there are multiple regimes, then a simple correlation coefficient (or a single regression coefficient) might lead to misleading results when a full-sample estimate is applied to a subsample where one regime dominates.
For instance, if there is a high correlation regime and a low correlation regime, then one should use a regime switching procedure (e.g., a Markov-switching procedure). Of course, once again, one would be estimating coefficients which might yield the feeling of precision … to those who are unaware of the uses of statistical analysis. But this is more likely the right way to proceed if one has the feeling that looking at “spikes” is the way to go.
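The regime problem can be sketched with simulated data. Note the simplification: the regime labels here are known by construction, so this is a split-sample comparison, not a Markov-switching estimator (which would have to infer the regimes). All parameter values and names are assumptions chosen for illustration.

```python
import random


def pearson_corr(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_per_regime = 500

# Regime A (assumed): y tracks x closely, so the correlation is high.
x_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_regime)]
y_a = [xa + 0.3 * rng.gauss(0.0, 1.0) for xa in x_a]

# Regime B (assumed): y is unrelated to x, so the correlation is near zero.
x_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_regime)]
y_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_regime)]

corr_a = pearson_corr(x_a, y_a)
corr_b = pearson_corr(x_b, y_b)
corr_full = pearson_corr(x_a + x_b, y_a + y_b)  # pooled sample

print(f"regime A: {corr_a:.2f}")
print(f"regime B: {corr_b:.2f}")
print(f"full sample: {corr_full:.2f}")
```

The full-sample figure lands between the two regime-specific correlations and describes neither regime well, which is the sense in which a single coefficient can mislead when applied to a subsample.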
Another, perhaps more relevant, case is when the variances vary over time (i.e., there is heteroskedasticity), but the underlying regression coefficient relating y to x is constant. As Forbes and Rigobon (2002) point out, in such instances, increases in variances will show up as increases in correlations. So maybe regression coefficients were the way to go. Unfortunately, regression coefficients are not invariant to which variable is placed on the left or right hand side, so one has to sort through a large number of results (and make a different judgment on what is a large comovement).
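A worked example makes both points concrete. The sketch below assumes the simple model y = βx + e, with e independent of x, and computes the implied population correlation and the two regression slopes for a low and a high variance of x. The parameter values and the function name are illustrative assumptions, not estimates from the EPU data.

```python
import math


def implied_moments(beta, sigma_x, sigma_e):
    """Population moments implied by y = beta * x + e, with e independent of x."""
    var_x = sigma_x ** 2
    var_y = beta ** 2 * var_x + sigma_e ** 2
    cov_xy = beta * var_x
    return {
        "corr": cov_xy / math.sqrt(var_x * var_y),
        "slope_y_on_x": cov_xy / var_x,  # equals beta regardless of sigma_x
        "slope_x_on_y": cov_xy / var_y,  # the reverse regression
    }


# Same beta and error variance; only the variance of x differs.
calm = implied_moments(beta=0.5, sigma_x=1.0, sigma_e=1.0)
turbulent = implied_moments(beta=0.5, sigma_x=3.0, sigma_e=1.0)

for label, moments in (("calm", calm), ("turbulent", turbulent)):
    print(label, {key: round(value, 3) for key, value in moments.items()})
```

With these values, the correlation rises from about 0.45 to about 0.83 even though the slope of y on x stays at 0.5. The reverse slope (x on y) moves from 0.4 to about 1.38, which illustrates why the choice of left-hand-side variable matters.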
In any case, the use of correlation coefficients, in the hands of an informed analyst, seems to me a helpful way to summarize the comovement of variables, and does not necessarily lead to an over-estimate of precision in and of itself.
Now, I’ll go back to defending the use of logn(.), aka ln(.).