8 Statistical models



Uninterpreted data is uninformative. We cannot generalize, draw inferences or attempt to make predictions unless we make (however minimal) assumptions about the data at hand: what it represents, how it came into existence, which parts relate to which other parts, etc. One way of explicitly acknowledging these assumptions is to engage in model-based data analysis. A statistical model is a conventionally condensed formal representation of the assumptions we make about what the data is and how it might have been generated. In this way, model-based data analysis is more explicit about the analyst’s assumptions than other approaches, such as test-based approaches, which we will encounter in Chapter 16.
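To make this concrete, here is a minimal sketch, in Python, of how such assumptions can be read as a recipe for generating data: draw a latent parameter from a prior, then generate observations from a likelihood that depends on it. The Beta-Binomial structure and the specific numbers are illustrative assumptions of this sketch, not part of the chapter.

```python
import numpy as np

# A minimal sketch (assumed, not from the text): reading a statistical model
# as a recipe for how the data might have been generated.
# Illustrative assumptions: N coin flips, a uniform prior on the bias theta,
# and a Binomial likelihood for the number of successes k.
rng = np.random.default_rng(seed=42)

N = 24                          # number of observations (assumed)
theta = rng.beta(1, 1)          # prior: theta ~ Beta(1, 1), i.e. uniform on [0, 1]
k = rng.binomial(n=N, p=theta)  # likelihood: k ~ Binomial(N, theta)

print(f"latent parameter theta = {theta:.3f}")
print(f"simulated observation: k = {k} successes out of N = {N}")
```

Running the sketch repeatedly produces different values of \(\theta\) and \(k\), which is exactly what it means to treat the model as a hypothesis about how the data might have come about.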

There is room for divergence in how to think about a statistical model, the assumptions it encodes and the truth. Some will want to reason with models using language like “if we assume that model \(M\) is true, then …” or “this shows convincingly that \(M\) is likely to be the true model”. Others feel very uncomfortable with such language. In times of heavy discomfort they might repeat their soothing mantra:

All models are wrong, but some are useful. — Box (1979)

To help the reader become familiar with model-based data analysis, Section 8.1 introduces the concept of a probabilistic statistical model. Section 8.2 expands on the notation, both formulaic and graphical, which we will use in this book to communicate about models. Finally, Section 8.3 elaborates on the crucial notions of parameters and priors.
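As a preview of the formula notation that Section 8.2 introduces, the same kind of toy model can be written as a short list of sampling statements (again, the particular distributions are illustrative assumptions of this sketch):

\[
\begin{aligned}
\theta &\sim \text{Beta}(1, 1) && \text{(prior over the parameter } \theta\text{)} \\
k &\sim \text{Binomial}(N, \theta) && \text{(likelihood of the observed data } k\text{)}
\end{aligned}
\]

In graphical notation, the same dependencies would appear as a directed graph with nodes for \(\theta\), \(N\) and \(k\).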

The learning goals for this chapter are:

  • become familiar with the notion of a (Bayesian) statistical model
  • understand the key ingredients of a model:
    • likelihood function, parameters, prior, prior distribution
  • understand notation to communicate models
    • formulas & graphs

References

Box, George E. P. 1979. “Robustness in the Strategy of Scientific Model Building.” In Robustness in Statistics, edited by R. L. Launer and G. N. Wilkinson, 201–36. Cambridge, MA: Academic Press.