11.3.1: Mean and Variance
Suppose you have a set of values \(a_j\). By saying that this is a set, we mean that we have several values \(a_1\), \(a_2\), \(a_3\), and so forth. The notation \(a_j\), in this context, means that \(j\) can be replaced by any integer between 1 and the total number of values that you have in order to refer to that specific value. Suppose that we have \(N\) total values. The average of all of our values can be written as:
\[\langle a \rangle = \frac{1}{N}\sum_j a_j\]
The letter \(\Sigma\) is the capital Greek letter “sigma”. This notation means that you sum together all of the values of \(a_j\) that you have. For instance, suppose you had just four values, \(a_1\), \(a_2\), \(a_3\), and \(a_4\); then:
\[\sum_j a_j = a_1 + a_2 + a_3 + a_4\]
Therefore, the mean (or average) value of \(a\) in this context is:
\[\langle a \rangle = \frac{1}{N}\sum_j a_j = \frac{1}{N}\left(a_1 + a_2 + a_3 + a_4\right)\]
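To make this concrete, here is a short Python sketch that computes the mean this way; the four values are hypothetical numbers chosen purely for illustration:

```python
# A minimal sketch: the mean of four hypothetical values a_1..a_4.
a = [2.0, 4.0, 4.0, 6.0]  # made-up measurements; N = 4

N = len(a)
mean = sum(a) / N  # <a> = (1/N) * sum_j a_j

print(mean)  # 4.0
```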
To quantify the uncertainty on a set of values, we want to say something about how far, on average, a given value is from the mean of all the values. Thus, it’s tempting to try to define the uncertainty as follows:
\[\frac{1}{N}\sum_j \left(a_j - \langle a \rangle\right)\]
Remember that addition is commutative and associative. Realizing that the \(\sum\) symbol just indicates a sum, i.e. a whole lot of addition, we can rewrite this as:
\[\frac{1}{N}\left(\sum_j a_j - \sum_j \langle a \rangle\right)\]
The second term in the subtraction is a sum over \(j\) of the average value. The average value doesn’t depend on which \(a_j\) we’re talking about; it is a constant, the same for all of them. Therefore, summing that number \(N\) times just gives \(N\langle a \rangle\). Making this substitution and distributing the \(1/N\) into the parentheses:
\[\frac{1}{N}\sum_j a_j - \frac{1}{N}N\langle a \rangle\]
But we recognize the first term in this subtraction as just \(\langle a \rangle\). So, the total result is zero. Clearly, this is not a good expression for the uncertainty in \(a\). If you think about it, the average deviation of \(a_j\) from \(\langle a \rangle\) ought to be zero: if \(\langle a \rangle\) is the average value of \(a\), then \(a_j\) should fall below \(\langle a \rangle\) about as often as it falls above, so the sum will have a mix of positive and negative terms. The very definition of the average ensures that this sum will be zero.
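A quick numerical check makes this cancellation visible; the values below are the same hypothetical ones used earlier:

```python
# The average deviation from the mean is always zero.
a = [2.0, 4.0, 4.0, 6.0]  # hypothetical values
mean = sum(a) / len(a)

avg_deviation = sum(aj - mean for aj in a) / len(a)
print(avg_deviation)  # 0.0 (up to floating-point roundoff)
```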
Instead, we shall define the variance as:
\[\Delta a^2 = \frac{1}{N}\sum_j \left(a_j - \langle a \rangle\right)^2\]
Here, we’re using \(\Delta a\) to indicate the uncertainty in \(a\). The variance is defined as the uncertainty squared.1 The advantage of this expression is that because we’re squaring the difference between each value \(a_j\) and the average value, we’re always going to be summing together positive terms; there will be no negative terms to cancel out the positive terms. Therefore, this should be a reasonable estimate of how far, typically, the measurements \(a_j\) are from their average.
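The definition translates directly into a few lines of Python; again, the data are hypothetical:

```python
# Variance as the mean squared deviation from <a>.
a = [2.0, 4.0, 4.0, 6.0]  # hypothetical values
N = len(a)
mean = sum(a) / N

variance = sum((aj - mean) ** 2 for aj in a) / N  # Delta a^2
uncertainty = variance ** 0.5                     # Delta a

print(variance, uncertainty)  # 2.0 1.4142...
```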
We can unpack this sum a bit by first expanding the square:
\[\Delta a^2 = \frac{1}{N}\sum_j \left(a_j^2 - 2\langle a \rangle a_j + \langle a \rangle^2\right)\]
To clean this expression up, we both add and subtract \(\langle a \rangle^2\) inside the parentheses:
\[\begin{aligned}
\Delta a^2 &= \frac{1}{N}\sum_j \left(a_j^2 - 2\langle a \rangle a_j + 2\langle a \rangle^2 - \langle a \rangle^2\right)\\
&= \frac{1}{N}\sum_j \left(a_j^2 - \langle a \rangle^2 + 2\langle a \rangle\left(\langle a \rangle - a_j\right)\right)\\
&= \frac{1}{N}\sum_j a_j^2 - \frac{1}{N}\sum_j \langle a \rangle^2 + \frac{1}{N}2\langle a \rangle\sum_j \left(\langle a \rangle - a_j\right)
\end{aligned}\]
Notice that the last term is going to be zero, as it includes the average difference between the mean and each observation, which we showed above vanishes. The second term is just going to be \(\langle a \rangle^2\), because once again \(\langle a \rangle\) is the same for all terms of the sum; the sum will yield \(N\langle a \rangle^2\), canceling the \(N\) in the denominator. The first term is, by definition, the average of \(a_j^2\), which we write as \(\langle a^2 \rangle\). So, we have:
\[\Delta a^2 = \langle a^2 \rangle - \langle a \rangle^2\]
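This identity is easy to verify numerically for the same hypothetical data; the result matches the variance computed directly from the definition above:

```python
# Check of the identity Delta a^2 = <a^2> - <a>^2.
a = [2.0, 4.0, 4.0, 6.0]  # hypothetical values
N = len(a)

mean_a = sum(a) / N                     # <a>
mean_a2 = sum(aj ** 2 for aj in a) / N  # <a^2>

print(mean_a2 - mean_a ** 2)  # 2.0, same as the direct computation
```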
1If you know statistics, you may recognize this as being very similar to how variance is defined there; the only difference is that in statistics we divide by \(N-1\) rather than by \(N\). The difference becomes unimportant as \(N\) gets large.
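Both conventions are available in Python’s standard library, which makes the footnote’s distinction easy to see: `statistics.pvariance` divides by \(N\) (as in this section), while `statistics.variance` divides by \(N-1\):

```python
import statistics

a = [2.0, 4.0, 4.0, 6.0]  # hypothetical values
print(statistics.pvariance(a))  # 2.0      (divide by N)
print(statistics.variance(a))   # 2.666... (divide by N - 1)
```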