17.10 Jeffreys-Lindley paradox

Often, Bayesian and frequentist methods yield qualitatively similar results, but sometimes they diverge. A prominent case of divergence is known as the Jeffreys-Lindley paradox. It is not really a “paradox” in the strict sense; rather, it is a case where the two approaches give clearly divergent answers, and it draws attention to the differences between frequentist and Bayesian testing of point-valued null hypotheses.

Let’s take the following data.

k = 49581  # number of observed 'successes'
N = 98451  # total number of observations

The point-valued null hypothesis is that the binomial rate is unbiased, i.e., \(\theta_c = 0.5\).

binom.test(k, N)$p.value  # two-sided p-value of the exact binomial test
## [1] 0.02364686

Based on the standard \(\alpha\)-level of \(0.05\), the frequentist test thus prescribes rejecting \(H_0\).

In contrast, using the Savage-Dickey method to compute the Bayes factor, we find strong support in favor of \(H_0\).

# Savage-Dickey Bayes factor in favor of H0
dbeta(0.5, k + 1, N - k + 1)
## [1] 19.21139
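
Why does a single density value equal a Bayes factor? With a Beta(1,1) prior, the posterior over \(\theta_c\) is Beta(k+1, N-k+1), and the Savage-Dickey method gives the Bayes factor in favor of \(H_0\) as the ratio of posterior to prior density at the null value. Since the density of a Beta(1,1) prior at 0.5 is exactly 1, the ratio reduces to the posterior density alone. A minimal sketch (variable names are ours):

posterior_at_null = dbeta(0.5, k + 1, N - k + 1)  # posterior is Beta(k+1, N-k+1)
prior_at_null = dbeta(0.5, 1, 1)                  # uniform prior: density is 1
posterior_at_null / prior_at_null                 # same value as above (~19.21)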

These methods give different results because they are conceptually entirely different. There is no genuine paradox.

Frequentist testing is a form of model checking. The question addressed by the frequentist hypothesis test is whether, if we assume that the model with \(\theta_c = 0.5\) is true, the observed data appear surprising.
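
To make this model-checking logic concrete, here is a minimal sketch that reconstructs the two-sided p-value by hand: we sum the null probabilities of all outcomes that are no more probable under \(\theta_c = 0.5\) than the observed count. This mirrors, up to numerical tolerance, how binom.test defines “at least as extreme”.

probs = dbinom(0:N, N, 0.5)             # probability of each outcome under H0
sum(probs[probs <= dbinom(k, N, 0.5)])  # agrees with binom.test(k, N)$p.value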

The Bayesian method used above hinges on a comparison of two models. The question addressed by the Bayesian comparison-based hypothesis test is which of two models better predicts the observed data from an ex ante point of view (i.e., before having seen the data): one model assumes that \(\theta_c = 0.5\); the other assumes that \(\theta_c \sim \text{Beta}(1,1)\).
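
To spell this comparison out, note that the marginal likelihood of the data under the point-valued null is just the binomial probability at \(\theta_c = 0.5\), while under the Beta(1,1) prior it is the Beta-binomial probability, which for a uniform prior reduces to \(1/(N+1)\). The ratio of the two recovers the Savage-Dickey result from above (a sketch; variable names are ours):

m0 = dbinom(k, N, 0.5)  # marginal likelihood under H0: theta fixed at 0.5
m1 = 1 / (N + 1)        # marginal likelihood under H1: dbinom averaged over Beta(1,1)
m0 / m1                 # Bayes factor in favor of H0; matches ~19.21 from above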

For large \(N\), as in the example at hand, it can happen that \(\theta_c = 0.5\) is a bad explanation of the data, so that a model-checking test rejects this null hypothesis. At the same time, the alternative model with \(\theta_c \sim \text{Beta}(1,1)\) can predict the data even worse than the model with \(\theta_c = 0.5\), because it puts credence on many values of \(\theta_c\) that are very bad predictors of the data.
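
The role of \(N\) can be illustrated with a small sketch: hold the standardized deviation from 0.5 (and hence, roughly, the p-value) fixed while the sample size grows. The z-score 2.266 below is approximately that of the data above and is our choice for illustration; the Bayes factor in favor of \(H_0\) then grows with the sample size.

for (n in c(1000, 10000, 100000, 1000000)) {
  k_n = round(n / 2 + 2.266 * sqrt(n / 4))  # same z-score as the data above
  p = binom.test(k_n, n)$p.value            # stays roughly constant
  bf = dbeta(0.5, k_n + 1, n - k_n + 1)     # Savage-Dickey BF in favor of H0
  cat(sprintf("n = %7.0f  p = %.4f  BF01 = %6.2f\n", n, p, bf))
}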

None of these considerations lend themselves to a principled argument for or against frequentism or Bayesianism. The lesson to be learned is that these different approaches ask different questions (about models and data). The agile data analyst will diligently check, for each concrete research context, which method is most conducive to gaining the insights relevant for the purpose at hand.