A Simple Derivation of the Bias-Variance Decomposition
We consider a statistical model parameterised by $\theta$ which, of course, cannot be observed directly and must be estimated from a set of data. We denote our estimated parameter by $\hat{\theta}$.
It is useful to have a sense of how close our estimated parameter is to the true value, $\theta$. One straightforward measure is the so-called Mean Squared Error (MSE), which is the average squared difference between the two values:

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big].$$
The expectation here is subtle: note that (under a frequentist view of statistics) the true parameter $\theta$ is fixed and non-random, but the estimated parameter $\hat{\theta}$ will vary with each new dataset. The MSE reports the squared difference between the two, averaged over all possible datasets.
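To make the "averaged over all possible datasets" idea concrete, here is a minimal Monte Carlo sketch. The setup (estimating the mean of a normal distribution by the sample mean, with the particular values of `theta`, `n`, and `n_datasets`) is an illustrative assumption, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0           # true (fixed, non-random) parameter
n = 30                # observations per dataset
n_datasets = 100_000  # number of simulated datasets

# Draw many independent datasets at once: one row per dataset.
data = rng.normal(loc=theta, scale=1.0, size=(n_datasets, n))
theta_hat = data.mean(axis=1)  # one estimate per dataset; varies across rows

# Approximate the MSE by averaging the squared error over the datasets.
mse = np.mean((theta_hat - theta) ** 2)
print(f"Monte Carlo MSE: {mse:.5f}")  # close to 1/n ~ 0.0333 for the sample mean
```

Each row of `data` plays the role of one possible dataset, so `theta_hat` is itself a sample from the estimator's distribution, and the printed value approximates the expectation defining the MSE.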
Our goal is to show, with as little effort as possible, the following decomposition:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2. \tag{1}$$
Proof. For any random variable $X$, recall that

$$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2;$$

rearranging, we have

$$\mathbb{E}[X^2] = \mathrm{Var}(X) + \mathbb{E}[X]^2.$$

Since this holds for any random variable, replace $X$ with $\hat{\theta} - \theta$, so we have

$$\mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Var}(\hat{\theta} - \theta) + \mathbb{E}[\hat{\theta} - \theta]^2.$$

The term on the left-hand side is the definition of $\mathrm{MSE}(\hat{\theta})$. Similarly, the second term on the right-hand side is the definition of the (squared) bias, $\mathrm{Bias}(\hat{\theta})^2 = \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2$. Finally, recall that $\theta$ is non-random, so we have $\mathrm{Var}(\hat{\theta} - \theta) = \mathrm{Var}(\hat{\theta})$. Combining each of these, we arrive at (1). $\blacksquare$
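As a numerical sanity check of (1), the sketch below uses a deliberately biased estimator; the choice of a shrinkage estimator $\hat{\theta} = 0.8 \times$ (sample mean) is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

theta = 2.0
n = 30
n_datasets = 100_000

data = rng.normal(loc=theta, scale=1.0, size=(n_datasets, n))
theta_hat = 0.8 * data.mean(axis=1)  # shrinking toward zero introduces bias

# Left-hand side of (1): the MSE.
mse = np.mean((theta_hat - theta) ** 2)

# Right-hand side of (1): variance plus squared bias.
var = np.var(theta_hat)
bias_sq = (np.mean(theta_hat) - theta) ** 2

print(f"MSE          : {mse:.5f}")
print(f"Var + Bias^2 : {var + bias_sq:.5f}")  # agrees with the MSE above
```

Here the variance is roughly $0.8^2 / 30 \approx 0.021$ and the squared bias is roughly $(0.8 \times 2 - 2)^2 = 0.16$, and their sum matches the simulated MSE, as (1) predicts. (Indeed the two printed numbers agree to floating-point precision, since the identity in the proof holds for the empirical distribution of the simulated estimates as well.)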