A Simple Derivation of the Bias-Variance Decomposition

We consider a statistical model parameterised by $\theta$, which, of course, cannot be observed directly and must be estimated from a set of data. We denote our estimated parameter by $\hat\theta$.

It is useful to have a sense of how close our estimated parameter $\hat\theta$ is to the true value, $\theta$. One straightforward measure is the so-called Mean Squared Error (MSE), which is the average squared difference between the two values:

$$\text{MSE}(\hat\theta) := \mathbb{E}\left[(\hat\theta - \theta)^2\right]$$

The expectation here is subtle: note that (under a frequentist view of statistics) the true parameter $\theta$ is fixed and non-random, while the estimated parameter varies with each new dataset. The MSE reports the squared difference between the two, averaged over all possible datasets.
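To make this averaging concrete, here is a minimal numerical sketch (assuming Python with NumPy, and a toy setup where $\theta$ is the mean of a normal distribution estimated by the sample mean, neither of which is specified in the text above): each simulated dataset produces its own $\hat\theta$, and the MSE is approximated by averaging the squared error over many such datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true parameter is the mean of a normal
# distribution, and we estimate it with the sample mean.
theta = 2.0           # true parameter (fixed, non-random)
n = 20                # size of each dataset
n_datasets = 100_000  # number of simulated datasets

# Each row is one dataset; each dataset yields one estimate theta_hat.
data = rng.normal(loc=theta, scale=1.0, size=(n_datasets, n))
theta_hat = data.mean(axis=1)

# Monte Carlo approximation of MSE(theta_hat) = E[(theta_hat - theta)^2].
mse = np.mean((theta_hat - theta) ** 2)
print(mse)  # roughly 1/n = 0.05 for the sample mean of N(theta, 1) data
```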

Our goal is to show, with as little effort as possible, the following decomposition

$$\text{MSE}(\hat\theta) = \text{Var}(\hat\theta) + \text{Bias}(\hat\theta)^2 \tag{1}$$

Proof. For any random variable $X$, recall that $\text{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$. Rearranging, we have

$$\mathbb{E}[X^2] = \text{Var}(X) + \mathbb{E}[X]^2.$$

Since this holds for any random variable, replace $X$ with $\hat\theta - \theta$, so that

$$\mathbb{E}[(\hat\theta - \theta)^2] = \text{Var}(\hat\theta - \theta) + \mathbb{E}[\hat\theta - \theta]^2.$$

The term on the left hand side is the definition of $\text{MSE}(\hat\theta)$. Similarly, the second term on the right hand side is the (squared) bias, since $\text{Bias}(\hat\theta) := \mathbb{E}[\hat\theta - \theta] = \mathbb{E}[\hat\theta] - \theta$. Finally, recall that $\theta$ is non-random, and shifting a random variable by a constant does not change its variance, so $\text{Var}(\hat\theta - \theta) = \text{Var}(\hat\theta)$. Combining each of these, we arrive at (1). $\square$
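As a sanity check on (1), here is a small simulation sketch (again assuming NumPy and the same toy normal-mean problem, this time with a deliberately biased shrinkage estimator so that both terms are non-zero): the Monte Carlo MSE should match the sum of the empirical variance and squared bias up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: estimate the mean of N(theta, 1), but shrink the
# sample mean towards zero so the estimator is biased.
theta = 2.0
n = 20
n_datasets = 100_000
shrink = 0.8  # theta_hat = 0.8 * sample mean, so E[theta_hat] != theta

data = rng.normal(loc=theta, scale=1.0, size=(n_datasets, n))
theta_hat = shrink * data.mean(axis=1)

mse = np.mean((theta_hat - theta) ** 2)      # E[(theta_hat - theta)^2]
var = np.var(theta_hat)                      # Var(theta_hat)
bias_sq = (np.mean(theta_hat) - theta) ** 2  # Bias(theta_hat)^2

print(mse, var + bias_sq)  # the two numbers should agree closely
```

With these assumed values the variance term is about $0.8^2 \cdot 1/20 = 0.032$ and the squared bias about $(0.8 \cdot 2 - 2)^2 = 0.16$, so both sides of (1) come out near $0.192$.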