C.1 An Important Family: The Exponential Family

The most common distributions used in statistical modeling are members of the exponential family. Among others:

  • Poisson distribution,
  • Bernoulli distribution,
  • Normal distribution,
  • Chi-Square distribution, and of course the
  • Exponential distribution.

In the upcoming section, some of these distributions will be described in more detail. But what makes the exponential family so special? On the one hand, distributions of this family have convenient mathematical properties that make them attractive for statistical modeling, e.g., the availability of a conjugate prior for Bayesian analyses. Furthermore, the distributions listed above are only examples: the exponential family encompasses a wide class of distributions, which makes it possible to model a large number of cases.
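To make membership in the family concrete, the following minimal Python sketch assumes the usual canonical form p(x) = h(x) exp(eta * T(x) - A(eta)), which is not stated explicitly in this section, and checks numerically that the Bernoulli and Poisson probability functions can be written this way. All function names are illustrative only.

```python
import math

def bernoulli_pmf(x, p):
    """Textbook form: p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def bernoulli_expfam(x, p):
    """Same pmf written as h(x) * exp(eta * T(x) - A(eta))."""
    eta = math.log(p / (1 - p))      # natural parameter (log-odds)
    T = x                            # sufficient statistic
    A = math.log(1 + math.exp(eta))  # log-partition function, equals -log(1 - p)
    h = 1.0                          # base measure
    return h * math.exp(eta * T - A)

def poisson_pmf(k, lam):
    """Textbook form: lam^k * exp(-lam) / k! for k = 0, 1, 2, ..."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_expfam(k, lam):
    """Same pmf written as h(k) * exp(eta * T(k) - A(eta))."""
    eta = math.log(lam)              # natural parameter
    T = k                            # sufficient statistic
    A = math.exp(eta)                # log-partition function, equals lam
    h = 1.0 / math.factorial(k)      # base measure
    return h * math.exp(eta * T - A)

# Compare both forms on a few parameter values.
for p in (0.1, 0.5, 0.9):
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, p), bernoulli_expfam(x, p))

for lam in (0.5, 3.0):
    for k in range(6):
        assert math.isclose(poisson_pmf(k, lam), poisson_expfam(k, lam))
```

In both cases only the natural parameter eta and the log-partition function A depend on the model parameter, while the data enter through the sufficient statistic T and the base measure h.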

On the other hand, the use of distributions from the exponential family is also attractive from a conceptual perspective. For example, suppose that we want to infer a probability distribution subject to certain constraints, such as a coin flip experiment which can have only a dichotomous outcome {0,1} and has a constant success probability. Which distribution should be used in order to model this scenario?

There are several possible distributions that can be used. According to which criteria should a distribution be selected? Often one attempts a conservative choice, that is, one that brings as little subjective information into the model as possible. In other words, one goal could be to select the distribution, among all possible distributions, that is maximally ignorant and least biased given the constraints.

Consequently, the question arises how “ignorance” can be measured and how distributions can be compared according to their “information content”. This chapter explores these questions based on information-theoretic notions such as “entropy” and the “Maximum Entropy Principle”. As we will see, distributions belonging to the exponential family arise as solutions to the maximum entropy problem subject to linear constraints, that is, constraints that fix expected values under the distribution.
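The last claim can be illustrated numerically. The sketch below is a hypothetical example, not a general solver: it assumes a finite support {0, ..., 5}, a single constraint fixing the mean, and the Shannon entropy with natural logarithm. Under these assumptions the maximum entropy solution has the exponential form p(x) proportional to exp(lam * x), and the multiplier lam is found here by simple bisection. All names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean(probs, support):
    return sum(p * x for p, x in zip(probs, support))

def exp_form(lam, support):
    """Distribution with p(x) proportional to exp(lam * x)."""
    weights = [math.exp(lam * x) for x in support]
    z = sum(weights)  # normalizing constant
    return [w / z for w in weights]

def maxent_with_mean(support, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Find lam by bisection; the mean of exp_form(lam) increases with lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(exp_form(mid, support), support) < target_mean:
            lo = mid
        else:
            hi = mid
    return exp_form((lo + hi) / 2, support)

support = list(range(6))
p_maxent = maxent_with_mean(support, target_mean=1.5)

# Another distribution with the same mean: uniform on {0, 1, 2, 3}.
p_alt = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]

print("mean (maxent):   ", mean(p_maxent, support))
print("entropy (maxent):", entropy(p_maxent))
print("entropy (alt):   ", entropy(p_alt))
```

Both distributions satisfy the mean constraint, but the exponential-form solution has the larger entropy. For the coin flip on {0, 1}, the same procedure with a target mean of, say, 0.3 simply returns the Bernoulli distribution with success probability 0.3, since the mean constraint already determines the distribution.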