David Romer: “In Praise of Confidence Intervals”

That’s the name of a new NBER working paper, which was also presented at the ASSA meetings (slides; I didn’t get to see the presentation myself). As I’m teaching applied econometrics for public policy this semester, I thought this was an interesting paper. From the abstract:

Most empirical papers in economics focus on two aspects of their results: whether the estimates are statistically significantly different from zero and the interpretation of the point estimates. This focus obscures important information about the implications of the results for economically interesting hypotheses about values of the parameters other than zero, and in some cases, about the strength of the evidence against values of zero. This limitation can be overcome by reporting confidence intervals for papers’ main estimates and discussing their economic interpretation.

This short paper is particularly relevant given the debate over “p-hacking” (which was ongoing when I last taught this class; I think I referred to this paper then). Specifically:

…a common approach to discussing empirical results often leaves out important information about the results’ implications. At a general level, addressing this omission is straightforward. What is needed is information about the implications of the results for hypotheses of interest. And although there are various ways of providing this information, a natural one is to report and discuss a confidence interval. In contrast to reporting a point estimate and whether it is statistically significantly different from zero, reporting a confidence interval provides information about the full range of possible values of the parameter.

One example in the paper, macroeconomically oriented:

…the fiscal multiplier—the short-run output effect of a one-unit increase in government purchases with the economy in a specific set of circumstances (for example, with output below normal and with a particular assumption about the behavior of monetary policy). There are some models where the multiplier is zero (notably, flexible price models with inelastic supply), so here the null hypothesis of zero is an interesting one. But other values are also potentially important. One focal value is a multiplier of one, which is both the value predicted by some models under certain conditions (Woodford 2011) and the boundary between stimulus increasing or decreasing private economic activity. A policymaker designing a stimulus package might be interested in comparisons with results from prior work about multipliers for various types of tax cuts. And, as with the return to education, values obtained in previous work, such as the figure of 1.8 suggested in a recent survey of cross-sectional research by Chodorow-Reich (2019), are also of interest. Again, a focus on the point estimate and whether it is statistically significantly different from zero is misplaced if a reader’s interest is in knowing what the evidence tells us about any of these various other possible values.
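To make the point concrete, here is a minimal Python sketch (my own illustration, not code from Romer’s paper) that builds a normal-approximation 95% confidence interval from a point estimate and a standard error, then checks which of the focal multiplier values mentioned above (0, 1, and 1.8) fall inside it. The estimate of 0.9 and standard error of 0.4 are hypothetical numbers chosen for illustration, not results from any study. For a symmetric interval of this kind, a value inside the 95% interval is one that a two-sided test at the 5% level would not reject.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return estimate - z * std_error, estimate + z * std_error

def check_focal_values(estimate, std_error, focal_values, level=0.95):
    """Map each focal value to True if it lies inside the interval."""
    lo, hi = confidence_interval(estimate, std_error, level)
    return {value: lo <= value <= hi for value in focal_values}

if __name__ == "__main__":
    # Hypothetical estimate and standard error, not taken from any study
    est, se = 0.9, 0.4
    lo, hi = confidence_interval(est, se)
    print(f"95% CI for the multiplier: [{lo:.2f}, {hi:.2f}]")
    for value, inside in check_focal_values(est, se, [0.0, 1.0, 1.8]).items():
        verdict = "not rejected" if inside else "rejected"
        print(f"multiplier = {value}: {verdict} at the 5% level")
```

With these made-up inputs the interval runs from about 0.12 to 1.68: zero is rejected, but so is 1.8, while a multiplier of one is not. A report that stopped at “significantly different from zero” would hide the last two conclusions.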

Still, it’s important to remember what a confidence interval is, as there is much confusion on this issue. It is an interval with the following property:

“Were this procedure to be repeated on numerous samples, the fraction of calculated confidence intervals (which would differ for each sample) that encompass the true population parameter would tend toward 95%.”
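The repeated-sampling definition can be checked by simulation. The sketch below is a hypothetical illustration under simple assumptions (independent normal draws with a known true mean, and a normal-approximation interval rather than a t interval): it draws many samples, builds a 95% interval for the mean from each, and counts how often the interval covers the true mean. The true mean never moves; it is the intervals that change from sample to sample.

```python
import random
from statistics import NormalDist, fmean, stdev

def ci_for_mean(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    center = fmean(sample)
    return center - half_width, center + half_width

def coverage(true_mean=5.0, sd=2.0, n=100, reps=4000, seed=1):
    """Share of simulated intervals that contain the fixed true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # A fresh sample each time, so each interval is different
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = ci_for_mean(sample)
        hits += lo <= true_mean <= hi
    return hits / reps

if __name__ == "__main__":
    print(f"Coverage: {coverage():.3f}")  # should be close to 0.95
```

Any single computed interval either contains the true mean or it does not; the 95% describes how the procedure performs across repetitions.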

One recent discussion of confidence intervals on this blog is here. One example of misinterpretation by a commenter on this blog (which I will be using in class) is here.



20 thoughts on “David Romer: ‘In Praise of Confidence Intervals’”

  1. Moses Herzog

    I’m pretty careful on this blog to make sure that my comments are factual when stating objectively measured things. However, I must confess I held my breath for a couple of seconds when clicking that very last link.

    : )

  2. Not Trampis

    Chinny (an endearing Aussie nickname),

    Your link for what a confidence interval is and your last link are the same.

    1. Moses Herzog

      Hoping you and your family are safe from the ravages of fire and Mother Nature in whatever region of “the great southern land” you reside.

      Remember, no matter how masochistic the American electorate is in electing our very rude President, we still care about our friends in the Lucky Country.

  3. Julian Silk

    Dear Folks,

    This post will probably get a lot of comments, because people want to believe a lot of things about confidence intervals for government spending multipliers. Liberals usually want multiplier estimates to be very high, except for military spending. Conservatives usually want them to be low or negative, so as to argue against big government. These positions often get completely reversed when the government stimulus takes the form of tax cuts, and it is often very hard to obtain the model elasticities needed to see what backs up a given estimate. Years ago at the ASSA meetings, someone argued that the multiplier for Germany was 0.3; that figure turned out to be based on West Germany absorbing East Germany, including the government performance, so that the West Germans (who were considered “Germany”) did indeed pay and got nothing tangible in return. This is not always the case.

    Confidence intervals don’t guarantee that you always wind up inside them. Bob Trost emphasized to me the right way to look at them. If your residuals (the remaining errors in matching the actual data) are normal (Gaussian), as they should be, then with a 95% confidence interval, 19 times out of 20 the interval will contain the true value and 1 time out of 20 it will not. But confidence intervals don’t rescue bad models by themselves. You want to look at how the residuals behave, what the outliers are, and how much they affect the results. Autocorrelation (systematic correlation among the residuals, which often goes undiagnosed) shows up very frequently in macroeconomic series, and it matters a great deal.

    1. 2slugbaits

      Julian Silk: I don’t see anything particularly controversial in your post. I would hope that most folks reading this blog would understand that fiscal multipliers are always conditional on certain assumptions about how the central bank will react, whether or not the macroeconomy is in a deep recession, the propensity to import, etc. Fiscal multipliers are not black magic; they simply capture what happens when people choose to spend more of their income instead of putting it in a mattress. It should also be pretty obvious why spending multipliers will always be larger than tax cut multipliers, ceteris paribus. The econometric difficulty that never gets much academic attention is nailing down “government spending.” Is it when Congress authorizes the Departments to spend? Is it when the Departments put authorized funds on contract? Is it when the Departments disburse funds after the contractor has completed production? Is it when the prospective contractor begins hiring, arranging for bank loans and ordering raw materials in anticipation of a contract award?

      There is a common misunderstanding of confidence intervals. Many people get things backwards: they treat the sample mean as the true population mean and misread a confidence interval as meaning that the true population mean will jump around the sample mean with the next experiment or draw. As you say, the true population mean is unknown, as is the true population variance; that’s why we have to estimate them! A confidence interval simply means that if we repeated the sampling 100 times, we should expect the raw sample data we pull each time to vary somewhat with each draw. As a result, the sample mean and sample variance will be different each time, and so the confidence limits we calculate will be slightly different each time. A 95% confidence interval simply means that the true population mean should fall within about 95 of those 100 individual intervals. In this day and age there probably isn’t any excuse for researchers not testing for autocorrelation, heteroskedasticity, outliers, normality, etc.; however, in the real world you rarely concoct a model that doesn’t have some problem. As a result, confidence intervals are likely to be overly optimistic.

      1. Moses Herzog

        I think Julian’s comment is one of the better comments. If I were to be overly nitpicky, though, I think it’s slightly unfair to assume people always interpret data and statistics through their own political leanings. It’s a general condition of society, not an absolute among all individuals. I wish I could think of a good example, but I can’t at the moment. I do know I have accepted certain facts (and may even have discussed them on this blog before) that I find rather aggravating or contrary to my own feelings on a political topic, but that I nonetheless consider facts. If I think of them later I will put them in this thread, but in all honesty I’m blank at the moment.

        OK, offhand I just thought of one: the Phillips curve. I think it most likely holds true, but from a subjective/political standpoint I don’t like the fact that it is probably true. I will try to think of some others later and put them in the thread.

  4. Rick Stryker


    Your bias is showing yet again. You attack conservative-leaning commenter Steven Kopits and make him an example in your course because Steven used the confidence interval as a guide to post-sample parameter variability, which is pretty much universal practice. David Romer is proposing to use the confidence interval as a guide to post-sample parameter variability. Will Romer also be an example in your course? I bet not. Always the double standard.

    1. Barkley Rosser


      But he does not. You are wrong. Read his comment below. He does not like confidence intervals at all.

      1. Rick Stryker


        I was referring to the point Steven made that Menzie said he’d use in his class as an example of how not to interpret the confidence interval.

        Steven also made clear that he doesn’t like confidence intervals.

  5. Barkley Rosser

    I agree with the main thrust of the post here, which I take to be that looking at confidence intervals may be more useful than simply looking at significance levels.

    I want to mention something that has come up here before and is mentioned above: the matter of outliers. I think researchers have gotten better on this, but there used to be a practice of throwing out outliers in empirical testing. The only valid reason to do so is if one thinks the observation is an outlier because it is erroneous. Otherwise outliers should not be eliminated. Indeed, some of the most important economic events are outliers, e.g. the 22% decline of the Dow on Oct. 19, 1987 (or thereabouts).

  6. Leopold

    This lengthy article is worth reading:


    The first part explains why most credentialed macroeconomists are clueless, and why they aren’t penalized for being wrong again and again (such as believing that QE will cause inflation, or that low unemployment will cause inflation….any day now…).

    The second part explains the negative economic effects of misandry (aka ‘feminism’).

    I have never seen an article that more thoroughly refutes all of Menzie Chinn’s beliefs.

  7. Steven Kopits

    As confidence intervals are commonly used, they mean that the population mean will fall between the upper and lower boundaries 95% of the time. This is, of course, a little different than the strict statistical interpretation, because we don’t actually know what the population mean is. The concept was used in the commonly understood manner in the Harvard Study, to wit, from your own post:

    University of Puerto Rico statistician Roberto Rivera, who along with colleague Wolfgang Rolke used death certificates to estimate a much lower death count, said that indirect estimates should be interpreted with care.

    “Note that according to the study the true number of deaths due to Maria can be any number between 793 and 8,498: 4,645 is not more likely than any other value in the range,” Rivera said.

    I don’t like confidence intervals for two reasons:

    First, the word ‘confidence’ can be misleading. If the underlying data is bad, if the survey method is weak, if respondents lied, if the sample is not random (but is not known to be non-random), or if the analyst cherry-picks data, then the confidence interval can be wildly misleading. If the public reads ‘a 95% confidence interval’, they think that the true mean must in all likelihood be within that interval. But that’s not what the CI means if any of the above conditions (bad data, a bad survey, a dishonest statistician) pertains. The only thing it says is that the calculation, applied to the data as it exists and as it was selected for inclusion, yields that particular CI.

    The second problem is that a 95% confidence interval is often not actionable. In the example above, the range is from 800 to 8,000. So does one send out the dogs and excavators or not? Can’t tell from the CI. In the real world, CIs of this size are all but useless most of the time.

    And of course, in a normal distribution, the mean is not only the central value, but also the most likely value.

    1. Menzie Chinn Post author

      Steven Kopits: Wow. You write:

      I don’t like confidence intervals for two reasons:

      First, the word ‘confidence’ can be misleading. …

      Well, “confidence interval” is a technical term entrenched in the literature. I don’t like the term “moral hazard”, pretty misleading in my opinion, but I’m not going to dispense with it in my classes because of my opinion.

      “Not actionable”? Then better to not provide any technically defined measure of measurement error?

      Thanks again for providing a case study for PubAffr 819!!!

      1. Steven Kopits

        So, you can add this.

        Were I heading the Harvard study, I would not have used confidence intervals at all. I would have said to my staff:

        “So, we’re using an experimental approach that may or may not work. We don’t know if our sample is truly random, and we don’t know if our status as outsiders has affected the frankness of answers given. We think that respondents may have a financial stake in the answers they give. We know that three people in 100 are pretty much nuts, and that we’re trying to find events (deaths) that are comparatively rare, even under the current circumstances. If even 10 people in 4,000 embellished their answers, lied to us because we’re outsiders who came from Harvard and didn’t like us, or believed that they would receive death benefits based on their answers, then we’re going to be way off. So let me help you here. I want this thing caveated to hell and back. This is a good try, an experiment to see if our findings ultimately line up with the recorded mortality data. But remember, our credibility is at risk here. Either people died, or they didn’t. If they did, they are going to show up, every one of them, in the data. If we’re wrong, we’re going to be wrong in a very public way. So, let’s put it out there as a new technique, but let’s not oversell it. Truth is, we don’t know if a survey approach works, and won’t for several months. Let’s put the findings out there, but I don’t want to see anything with the word ‘confidence’ in it.”

        1. pgl

          “We know that three people in 100 are pretty much nuts”.

          What evidence do you have that 3% of the residents of Puerto Rico are “nuts”? Can you put a confidence interval around this estimate of yours? Oh wait – you just pulled that number out of your racist rear end. Sort of like how John Lott does statistics. Yea – your data is “bad” data as you would put it. But of course you would look really bad if people held your intellectual garbage parading as Princeton Policy Analysis to the “standards” you hold for other people’s research.

    1. 2slugbaits

      Steven Kopits: I’m not an oil guy, but my understanding of the consensus view is that oil prices follow a random walk, and that’s certainly what a simple ADF unit root test of the data tells you. So you shouldn’t be surprised that the forecast or prediction intervals widen as the forecast horizon moves out. And that gets to my second point. The EIA chart actually showed a prediction interval, not a confidence interval. They aren’t the same thing, although it’s common practice to call both of them “confidence intervals.” A prediction interval refers to a random variable that has not yet been observed. A confidence interval refers to an interval constructed to encompass the true, fixed population parameter. You might find this interesting:

      1. pgl

        “The EIA chart actually showed a prediction interval, not a confidence interval.”

        The EIA graph did clearly indicate it was forecasting into 2021. How much do you want to bet that Stevie does not realize that it is currently January 2020?

    2. pgl

      Gee, oil prices are volatile and hard to predict. Gee, Stevie, tell us something else we do not know, such as 2 + 2 = 4. We’ll wait until you take your shoes off so you can actually count that high.

    3. Barkley Rosser


      Yes, that is “completely uninvestable.” So what? I realize that is only a problem for someone running a consulting firm who wants to convince sucker clients that he knows what is going on and deserves to be paid for that wonderful knowledge. But the hard fact is that indeed there is a very wide range of not completely unlikely possible outcomes.
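Two technical points from the thread, the difference between a prediction interval and a confidence interval and the widening of forecast intervals for a random walk, can be illustrated with a short Python sketch. It rests on simplifying assumptions that are mine, not the EIA’s or any commenter’s: independent normal data, normal rather than t critical values, and a driftless random walk whose shock standard deviation is known. The sample values (mean 60, standard deviation 10) are made up for illustration. The confidence interval describes uncertainty about the population mean and shrinks as the sample grows; the prediction interval for one new draw also carries that draw’s own noise. For a random walk, the h-step-ahead forecast error is the sum of h shocks, so its standard deviation grows with the square root of the horizon.

```python
import random
from statistics import NormalDist, fmean, stdev

Z95 = NormalDist().inv_cdf(0.975)

def mean_ci_and_prediction_interval(sample):
    """95% CI for the population mean and 95% PI for one new observation."""
    n = len(sample)
    m, s = fmean(sample), stdev(sample)
    ci_half = Z95 * s / n ** 0.5             # uncertainty about the mean only
    pi_half = Z95 * s * (1 + 1 / n) ** 0.5   # mean uncertainty plus a new draw's noise
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

def random_walk_pi_half_widths(sigma, horizons):
    """Half-widths of h-step 95% forecast intervals for y_t = y_{t-1} + e_t."""
    # The h-step forecast error is the sum of h independent shocks,
    # so its standard deviation is sigma * sqrt(h)
    return [Z95 * sigma * h ** 0.5 for h in horizons]

if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.gauss(60.0, 10.0) for _ in range(100)]  # hypothetical data
    ci, pi = mean_ci_and_prediction_interval(data)
    print(f"CI for the mean:   [{ci[0]:.1f}, {ci[1]:.1f}]")
    print(f"PI for a new draw: [{pi[0]:.1f}, {pi[1]:.1f}]")
    print([round(w, 2) for w in random_walk_pi_half_widths(1.0, [1, 4, 16])])  # -> [1.96, 3.92, 7.84]
```

Under these assumptions the prediction interval is about ten times wider than the confidence interval for the mean (the ratio is exactly the square root of n + 1), and the random-walk interval doubles each time the horizon quadruples.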
