Proof. Using the more explicit formulation of likelihood-based regression, we can rewrite the likelihood function in terms of the probability of “sampling” the error terms \(\epsilon_i\) for each \(y_i\), where \(\epsilon_i = y_i - \xi_i = y_i - (X \beta)_i\):
\[ \begin{align*} LH(\beta) & = \prod_{i = 1}^n \text{Normal}(\epsilon_i \mid \mu = 0, \sigma) \\ & = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi} \sigma} \exp\left[{-\frac{1}{2}\left(\frac{\epsilon_i^2}{\sigma^2}\right)}\right] & \text{[by def. of normal distr.]} \end{align*} \]
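To make this product concrete, here is a minimal numerical sketch that evaluates it directly. The data set is purely hypothetical: `X`, `y`, `beta_true`, and `sigma` are illustrative choices, not part of the text.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example data (illustrative assumptions, not from the text):
# an intercept column plus one predictor, with normally distributed noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + rng.normal(scale=sigma, size=50)

def likelihood(beta, X, y, sigma):
    """LH(beta): product of normal densities of the residuals eps_i = y_i - (X beta)_i."""
    eps = y - X @ beta
    return np.prod(norm.pdf(eps, loc=0.0, scale=sigma))

print(likelihood(beta_true, X, y, sigma))
```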
Since we are only interested in the maximum of this function, we can equivalently look for the maximum of \(\log LH(\beta)\), because the logarithm is a strictly monotone increasing function. This is useful because the logarithm turns the product into a sum.
\[ \begin{align} LLH(\beta)&=\log \left(LH(\beta)\right)\\ &=\sum_{i=1}^n \left[ \log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right) - \frac{\epsilon_i^2}{2\sigma^2} \right]\\ &=-\left( \frac{n}{2}\right) \log(2\pi)-\left( \frac{n}{2}\right) \log(\sigma^2)-\frac{1}{2\sigma^2} \sum_{i=1}^n(\epsilon_i)^2 \tag{2.7} \end{align} \]
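As a sanity check on equation (2.7), the closed form can be compared against the logarithm of the product computed in the previous sketch, reusing the hypothetical `X`, `y`, `beta_true`, and `sigma` from above.

```python
def log_likelihood(beta, X, y, sigma):
    """LLH(beta) as in equation (2.7)."""
    n = len(y)
    eps = y - X @ beta
    return (-(n / 2) * np.log(2 * np.pi)
            - (n / 2) * np.log(sigma**2)
            - np.sum(eps**2) / (2 * sigma**2))

# Should agree (up to floating-point error) with the log of the product form.
print(np.isclose(log_likelihood(beta_true, X, y, sigma),
                 np.log(likelihood(beta_true, X, y, sigma))))
```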
Since only the last term depends on \(\beta\), and since dropping the positive factor \(\frac{1}{2\sigma^2}\) does not change where the maximum is attained, we obtain:
\[ \arg \max_\beta LLH(\beta) = \arg \max_\beta \left(- \sum_{i=1}^n(\epsilon_i)^2\right) \]
If we substitute for \(\epsilon_i\) and multiply by \(-1\), turning the maximization into a minimization, we see that we are back at the original problem of finding the OLS solution:
\[ \arg \min_\beta \left(-LLH(\beta)\right) = \arg \min_\beta \sum_{i=1}^n(y_i - (X \beta)_i)^2 \]
Notice that this result holds independently of \(\sigma\): it enters the log-likelihood only through an additive constant and the positive factor \(\frac{1}{2\sigma^2}\), both of which drop out of the optimization.
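This equivalence, including the independence from \(\sigma\), can also be checked numerically: minimizing \(-LLH(\beta)\) for any fixed \(\sigma\) should recover the least-squares solution. Below is a sketch continuing the example above, using `scipy.optimize.minimize` as a generic optimizer (an illustrative choice).

```python
from scipy.optimize import minimize

# OLS solution via least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Maximum-likelihood solution: minimize -LLH(beta) for two different (arbitrary) sigmas.
for s in (0.5, 3.0):
    res = minimize(lambda b: -log_likelihood(b, X, y, s), x0=np.zeros(X.shape[1]))
    print(s, np.allclose(res.x, beta_ols, atol=1e-4))  # same beta regardless of sigma
```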