17.8 Three pillars of data analysis

There are three main uses for models in statistical data analysis:

  1. Parameter estimation: Based on model \(M\) and data \(D\), we try to infer which value of the parameter vector \(\theta\) we should believe in or work with (e.g., base our decision on). Parameter estimation can also serve knowledge gain, especially if (some component of) \(\theta\) is theoretically interesting.
  2. Model comparison: If we formulate at least two alternative models, we can ask which model better explains or better predicts some data. In some of its guises, model comparison helps with the question of whether a given data set provides evidence in favor of one model and against another, and if so, how much.
  3. Prediction: Models can also be used to make predictions about future or hypothetical data observations.

The frequentist and the Bayesian approach each have their specific methods and techniques to do estimation, comparison, and prediction. Even within each approach (frequentist or Bayesian) and a particular goal (estimation, comparison, or prediction) there is not necessarily unanimity about the best method or technique.

Table 17.2 lists the most common/salient methods used for each goal in the frequentist and Bayesian approach, as discussed in the previous chapters.

Table 17.2: Most common/salient methods of frequentist and Bayesian approaches for the three major goals of model-based data analysis. The abbreviations used are: MLE for ‘maximum likelihood estimate’, AIC for ‘Akaike information criterion’, LR-test for ‘likelihood-ratio test’ and \(D_{rep}\) for ‘repeat data’.
| inferential goal | target | frequentist | Bayesian |
|---|---|---|---|
| estimation | \(\theta\) | MLE: \(\hat{\theta} = \arg \max_{\theta} P_M(D \mid \theta)\) | posterior: \(P_M(\theta \mid D)\) |
| comparison | \(M\) | AIC, LR-test | Bayes factor |
| prediction | \(D\) | MLE-based: \(P_M(D_{rep} \mid \hat{\theta})\) | posterior-based: \(P_M(D_{rep} \mid D)\) |
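
To make the estimation and prediction rows of the table concrete, here is a minimal sketch for a Binomial model with a single parameter \(\theta\). The data (7 successes in 24 trials), the flat Beta(1, 1) prior, and all function and variable names are assumptions made for this illustration only; the code uses just the Python standard library.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k, n, theta):
    """Binomial likelihood P(k | n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def beta_binomial_pmf(k, n, a, b):
    """P(k | n) when theta ~ Beta(a, b) is integrated out."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical data D: 7 successes in 24 trials (assumption for illustration).
k_obs, n_obs = 7, 24

# Frequentist estimation: the MLE of a Binomial rate is the observed proportion.
theta_hat = k_obs / n_obs

# Bayesian estimation: a Beta(1, 1) prior (assumption) yields a Beta posterior.
post_a, post_b = 1 + k_obs, 1 + n_obs - k_obs
posterior_mean = post_a / (post_a + post_b)

# Prediction of replicated data D_rep with the same number of trials.
n_rep = n_obs
# MLE-based: plug the point estimate into the likelihood.
pred_mle = [binomial_pmf(k, n_rep, theta_hat) for k in range(n_rep + 1)]
# Posterior-based: average the likelihood over the whole posterior.
pred_post = [beta_binomial_pmf(k, n_rep, post_a, post_b) for k in range(n_rep + 1)]


def variance(pmf):
    """Variance of a distribution over the outcomes 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))


print("MLE:", theta_hat, "posterior mean:", posterior_mean)
print("predictive variance (MLE-based):", variance(pred_mle))
print("predictive variance (posterior-based):", variance(pred_post))
```

In this sketch, the posterior-based prediction is more spread out than the MLE-based one, because it also carries the remaining uncertainty about \(\theta\) instead of committing to a single value.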

The three pillars of data analysis mentioned above are tightly related, of course. For one, model comparison is often parasitic on prediction: whereas prediction asks which data are to be expected, given the model, model comparison looks at how well a given data set is or would have been predicted by different models. For another, parameter estimation and data prediction are something like each other’s inverse operations: estimation reasons from observed data to parameter values, whereas prediction reasons from parameter values to data.
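
The link between model comparison and prediction can be sketched in a few lines. The example compares two Binomial models by how well each predicts the same observed data: a model \(M_0\) that fixes \(\theta = 0.5\), and a model \(M_1\) that leaves \(\theta\) free. The data (7 successes in 24 trials), the two models, the flat Beta(1, 1) prior for \(M_1\), and all names are assumptions made for this illustration only.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k, n, theta):
    """Binomial likelihood P(k | n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def beta_binomial_pmf(k, n, a, b):
    """P(k | n) when theta ~ Beta(a, b) is integrated out."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical data D: 7 successes in 24 trials (assumption for illustration).
k_obs, n_obs = 7, 24

# Bayesian comparison: the Bayes factor is the ratio of marginal likelihoods,
# i.e., of the probabilities each model assigned to D before seeing it.
marg_lik_m0 = binomial_pmf(k_obs, n_obs, 0.5)           # M0: theta fixed at 0.5
marg_lik_m1 = beta_binomial_pmf(k_obs, n_obs, 1, 1)     # M1: theta ~ Beta(1, 1)
bayes_factor_10 = marg_lik_m1 / marg_lik_m0

# Frequentist comparison: AIC = 2 * (number of free parameters) - 2 * log-likelihood
# at the MLE. M0 has no free parameter; M1 has one, with MLE k_obs / n_obs.
theta_hat = k_obs / n_obs
aic_m0 = 2 * 0 - 2 * log(binomial_pmf(k_obs, n_obs, 0.5))
aic_m1 = 2 * 1 - 2 * log(binomial_pmf(k_obs, n_obs, theta_hat))

print("Bayes factor in favor of M1 over M0:", bayes_factor_10)
print("AIC of M0:", aic_m0, "AIC of M1:", aic_m1)
```

Both quantities score the models on the same observed data: the Bayes factor uses each model's prediction averaged over its prior, while AIC uses the best-fitting parameter value and adds a penalty for each free parameter.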