# 1.3: Probability & Statistics

It is assumed that the reader has a basic understanding of probability theory. Here we seek to review only the parts that will be most needed for what will follow in quantum theory.

## Probability Distributions

Whenever we consider an outcome to a random event, the probability of that outcome is the ratio of its "measure" to the measure of all possible outcomes. Frequently this measure is computed purely by counting, and is particularly simple if each outcome is equally likely. Sometimes the counting is a bit more complicated, because the desired probability spans a group of distinguishable results. For example, if one asks for the probability of throwing a 7 on a standard pair of 6-sided dice, this "outcome" can occur in many ways: The first die can result in 1 and the second in 6, the first die 6 and the second 1, the first 5 and the second 2, and so on. There are 6 distinct rolls that result in this same outcome, giving it a measure of 6. The total number of possible rolls is the product of the number of possible results on the first die and the number of possible results on the second die, or 36. The ratio of the measure of the desired outcome to the measure of all outcomes is 6/36 = 1/6.

If the outcome of a roll of two dice is defined as the total on both dice, then not all outcomes are equally probable. For example, there are more ways to roll a total of 6 (five ways) than a total of 5 (four ways). The map of probabilities for the various possible outcomes is called the *probability distribution*. For two dice, there are eleven possible outcomes, and the probability distribution for these outcomes is shown below.

**Figure 1.3.1 – Probability Distribution for Sum of Two Six-Sided Dice**
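The distribution in the figure can be reproduced by brute-force counting. The following is a minimal sketch in Python (the variable names are illustrative choices, not part of the text) that enumerates all 36 rolls and tallies how many produce each total:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely rolls of two six-sided dice and count
# how many of them produce each total.
faces = range(1, 7)
ways = Counter(a + b for a, b in product(faces, repeat=2))

total_rolls = sum(ways.values())  # 6 * 6 possible rolls

# Probability of each total = (ways to roll it) / (all possible rolls).
distribution = {total: Fraction(count, total_rolls)
                for total, count in sorted(ways.items())}

for total, prob in distribution.items():
    print(f"{total:2d}: {ways[total]} ways, probability {prob}")
```

Exact fractions are used so that the probabilities can be compared directly with the hand-counted values, such as 6/36 = 1/6 for a total of 7.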

This probability distribution involves different probabilities for different outcomes. Distributions that provide the same probability for all outcomes (like that for the roll of a single die) are called *uniform*. Quite often when one knows of no mechanism that would cause results to clump into certain outcomes, the first guess at the probability distribution is uniform. This assessment can change either with increased knowledge of the randomizing mechanism (e.g. a single die is found to be weighted asymmetrically), or with data indicating that the original assumption was likely bad (e.g. a single die comes up one number much more often than random chance would indicate).

There is no reason why all probability distributions must be discrete, as the one for two dice is. Probability distributions on a continuum are also possible. The probability of a blindfolded dart thrower hitting various positions on a dart board could be an example of a two-dimensional continuous probability distribution.

## Normalization and Probability Density

A universal truth of probability theory is that when the result of a random event occurs, it must land within the universe of possible outcomes. Mathematically, this means that the sum of the probabilities of all possible outcomes must be 1. This can be confirmed for the case of the roll of two 6-sided dice by summing all of the probabilities in Figure 1.3.1.

What distinguishes the various probabilities from each other is their *relative* measures. In the example of the two dice, the probability of throwing a 7 is twice as great as that of throwing a 4 or a 10. We can determine these measures by comparing the number of ways the results can occur (six ways for the 7 versus three ways for the 4 and 10), but if we want to be able to properly use the probability distribution, we must divide all these measures by the sum of all measures so that the new sum is 1. This process is called *normalization*.
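As a small illustration of normalization, the sketch below starts from the relative measures alone. It assumes the shortcut that the number of ways to roll a given total with two six-sided dice is 6 − |total − 7| (which reproduces the counts quoted above), then divides each measure by the sum of all of them:

```python
from fractions import Fraction

# Relative measures for the totals 2..12: the number of ways to roll each
# total with two six-sided dice is 6 - |total - 7|.
measures = {total: 6 - abs(total - 7) for total in range(2, 13)}

# Normalization: divide every measure by the sum of all measures.
norm = sum(measures.values())
probabilities = {t: Fraction(m, norm) for t, m in measures.items()}

print(norm)                                  # sum of all measures
print(sum(probabilities.values()))           # total probability after normalizing
print(probabilities[7] / probabilities[4])   # relative likelihood of a 7 versus a 4
```

Normalizing rescales every measure by the same factor, so the ratios between outcomes are left unchanged while the total becomes 1.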

If we have a continuous probability distribution (of any dimension), then the measure for any individual result is actually zero, as there are infinitely-many possible outcomes. However, this doesn't make all the outcomes equally likely, because they may have different relative measures. For example, if the probability of one outcome is \(p\) and the probability of a second outcome is \(3p\), then the ratio of these outcomes shows that the latter outcome is three times more likely than the former, even in the limit as \(p\) goes to zero. Also, the sum of the infinite number of zero-probability outcomes still must equal one. We assure that this works properly by representing the continuous probability distribution with a *probability density function*.

As with any other density function we have encountered (such as mass density and charge density), the idea is to measure the relative weightings at various positions. For a line of mass along the \(x\)-axis with a mass density of \(\lambda\left(x\right)\), the infinitesimal amount of mass found in the tiny slice between positions \(x\) and \(x+dx\) is given by \(dm=\lambda\left(x\right)dx\).

**Figure 1.3.2 Amount of Mass In an Infinitesimal Section in Terms of Density**

Now imagine that instead of a line of matter with varying mass density, we were talking about a particle bouncing back-and-forth within an opaque tube. The particle could be anywhere within the tube, and its probability of being between \(x\) and \(x+dx\) is infinitesimally small. But we can describe the probability of it being in that region in terms of the probability density function \(P\left(x\right)\) in the same way as we did for mass:

\[\text{Prob}\left(x:x+dx\right) = P\left(x\right)dx \]

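The normalization condition requires that the probabilities of all of these infinitesimal slices add up to 1, which for a density means integrating over all positions:

\[ \int \limits_{-\infty}^{+\infty}P\left(x\right)dx = 1 \]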

## Expectation Value

In a practical sense, probability theory is all about prediction. Not only do we use it to predict the likelihood of various events, but perhaps more important is using it to compute an *average* outcome. This is best shown with an example. Casinos offer odds in the dice game of craps (played with two six-sided dice) where they will pay $15 for a $1 wager on the next roll equaling exactly 3. Suppose a wealthy-but-deranged philanthropist offers to make $100 bets for you on the total 3. How much is each roll worth to you, on average? Since each of the 36 possible rolls is equally likely, we can imagine rolling 36 times and getting each roll exactly once (this is extremely unlikely to happen, of course, but we are using it as a device to do this calculation). If we add up our winnings for those 36 rolls and divide by 36, we have the value of each roll. Well, the 3 will come up twice during those 36 rolls (once when the dice come up 2-1, and once when they come up 1-2), and each of those pays $1,500, so you win $3,000 for those rolls, which works out to be $83.33 per roll. [Naturally, this comes out to be less than the $100 cost of each roll. There's a reason casinos make so much money!]

If we look closely at how this calculation worked out, we see a shorter way to do it. If we multiply the probability of each outcome by the value of that outcome, and add all these numbers together, we get the same answer. For the roll of a 3, the value is $1,500, and the probability is (as shown in Figure 1.3.1) equal to \(\frac{1}{18}\). For every other roll the value is zero, so the remaining terms in our sum don't contribute anything, and we get our same result: \(\frac{1}{18}\left(\$1,500\right)=\$83.33\).

Averaging a quantity in this way results in what is known as an *expectation value* of that quantity. Written out mathematically, calling the quantity being averaged \(\omega\) (in the example above, this would be dollars won), it looks like:

\[ \left<\omega\right> = P_1\omega_1 + P_2\omega_2 + ... =\sum \limits_{i=1}^N P_i\omega_i ,\]

where \(P_i\) is the probability of the \(i^{th}\) outcome, and \(\omega_i\) is the value of the quantity for that outcome.
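The two routes to the value of the craps bet can be compared directly. The sketch below is illustrative only: the names `PAYOUT` and `winnings` are invented for this example, and the probabilities of the eleven totals are generated from the counting rule 6 − |total − 7| ways out of 36.

```python
from fractions import Fraction
from itertools import product

PAYOUT = 1500  # dollars received when a $100 bet on a total of 3 wins (15 to 1)

def winnings(total):
    """Value of one roll to the bettor, in dollars."""
    return PAYOUT if total == 3 else 0

# Method 1: imagine each of the 36 equally likely rolls happening once,
# add up the winnings, and divide by 36.
rolls = list(product(range(1, 7), repeat=2))
average_by_counting = Fraction(sum(winnings(a + b) for a, b in rolls), len(rolls))

# Method 2: sum P_i * omega_i over the eleven possible totals.
prob = {t: Fraction(6 - abs(t - 7), 36) for t in range(2, 13)}
expectation = sum(prob[t] * winnings(t) for t in prob)

print(float(average_by_counting), float(expectation))
```

Both methods give the same number because grouping the 36 rolls by total simply collects identical terms of the first sum into a single term weighted by \(P_i\).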

If there is an infinitude of possible outcomes because they are distributed on a continuum, then this sum turns into an integral, and the probability density is used in place of the \(P_i\)'s above:

\[ \left<\omega\right> =\int \limits_{-\infty}^{+\infty}P\left(x\right)\omega\left(x\right)dx \]

It is important to note that the expectation value is, in statistics terms, the *mean* of the distribution (as opposed to the mode and median, two other statistical measures of the "center" of a distribution), which means that this value is not necessarily one of the possible outcomes. Indeed, in the example above, the only possible outcomes were $0 and $1,500, a set of results which doesn't include the mean.

Example 1.3.1

*A block vibrates on a frictionless horizontal surface while attached to a spring with spring constant \(k\). The maximum distance that the block gets from the equilibrium point is \(x_o\). A radar gun measures the speed of the block at many random times, and these speeds are combined with the mass of the block to compute the block's kinetic energy. Find the average kinetic energy measured.*

**Solution**

*There are several ways to approach this. We will take the brute-force method here, to emphasize the mathematical details of the probability density integral. We start by determining the probability of the block being between \(x\) and \(x+dx\) at any random moment (with \(x\) measured from the equilibrium point of the spring). First, it should be clear that the probability density is not uniform – the block spends longer near the extreme ends of the oscillation than near the center, because it is moving slower near the endpoints. The probability of being in the tiny range \(dx\) will be the ratio of the time it spends there (which we'll call \(dt\)) to the time it spends going from one end of the oscillation to the other (half a period, \(\frac{1}{2}T\)):*

\[P\left(x\right)dx = \dfrac{dt}{\frac{1}{2}T} \;\;\; \Rightarrow \;\;\; P\left(x\right)=\left(\dfrac{2}{T}\right)\dfrac{dt}{dx} = \dfrac{2}{vT} \nonumber\]

*Plugging this into the expectation value equation for kinetic energy gives:*

\[ \left<KE\right> = \int \limits_{-x_o}^{+x_o} P\left(x\right)\left[\frac{1}{2}mv^2\right]dx = \dfrac{m}{T}\int \limits_{-x_o}^{+x_o} v\;dx \nonumber \]

*Clearly the velocity of the block changes with respect to \(x\), so \(v\) cannot be pulled out of the integral. The function of \(x\) that we plug in to \(v\) is found by noting that the total energy of the system remains constant, and equals the potential energy at the extreme points of the oscillation:*

\[E=\frac{1}{2}mv^2+\frac{1}{2}kx^2= \frac{1}{2}kx_o^2\;\;\; \Rightarrow \;\;\; v\left(x\right) = x_o\sqrt{\dfrac{k}{m}}\sqrt{1-\left(\frac{x}{x_o}\right)^2} \nonumber\]

*Plugging this into the integral and making the substitution \(u \equiv \frac{x}{x_o}\) gives:*

\[ \left<KE\right> = \dfrac{mx_o^2}{T}\sqrt{\dfrac{k}{m}}\int \limits_{-1}^{+1} \sqrt{1-u^2}du \nonumber \]

*The reader that wants to do every step of the math can perform the integral with a trig substitution, but looking it up is also fine – it comes out to equal \(\frac{\pi}{2}\). All that remains is to use the period of oscillation for this simple harmonic oscillator in terms of the mass and spring constant:*

\[ T=2\pi\sqrt{\dfrac{m}{k}} \;\;\; \Rightarrow \;\;\; \left<KE\right> = \frac{1}{4}kx_o^2 \nonumber \]

*Note that the average kinetic energy is half the total energy, which means the average potential energy is the same – on average the energy is split evenly between the two modes.*
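The result of this example can be checked numerically by imitating the radar gun. The following sketch makes several assumptions that are not in the example: the numerical values of `m`, `k`, and `x0` are arbitrary, the motion is taken to be \(x(t) = x_o\cos(\omega t)\) with \(\omega = \sqrt{k/m}\), and evenly spaced sample times over one period stand in for "many random times."

```python
import math

# Illustrative values only (not from the example): any positive numbers work.
m, k, x0 = 0.5, 8.0, 0.2          # mass (kg), spring constant (N/m), amplitude (m)

omega = math.sqrt(k / m)          # angular frequency of the oscillation
T = 2 * math.pi / omega           # period

# Sample the motion at many evenly spaced times over one full period.
# Like random times, these sample the block in proportion to how long it
# spends near each position.
N = 100_000
kinetic = []
for n in range(N):
    t = (n + 0.5) * T / N
    v = -x0 * omega * math.sin(omega * t)   # velocity for x(t) = x0 cos(omega t)
    kinetic.append(0.5 * m * v**2)

average_ke = sum(kinetic) / N
print(average_ke, 0.25 * k * x0**2)
```

Sampling in time sidesteps the probability density entirely, so agreement with \(\frac{1}{4}kx_o^2\) is an independent check on the integral above rather than a repetition of it.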

## Standard Deviation (Uncertainty)

The expectation value gives us an average result, but when several samples are taken, one is often also interested in how much the results spread themselves out. If there is only one result possible, then that result is the mean, and there is no spread at all. On the other hand, there could be two or more results that are very far apart, but their mean comes out the same as the one-result case. It is useful to have a well-defined measure of how spread out the results are. This measure is called *standard deviation*, and we will also refer to it in our quantum-mechanical context as *uncertainty*.

The computation of standard deviation goes as follows. [Note: We will start (as always) with the case where there are discrete results, and then generalize to the case where a probability density is needed.]

First, we need to know how far each possible individual result, \(\omega_i\), is from the mean of all the results, \(\left<\omega\right>\):

\[ \text{separation of } i^{th} \text{ result from mean} = \omega_i-\left<\omega\right> \]

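We would like to know the "average separation" over all the results, but how do we define such an average? If we just take an average of the differences given above, it would come out to equal zero. The proof is easy (here the \(\omega_i\) are treated as \(N\) equally weighted results, so that the mean is their sum divided by \(N\)):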

\[ \left< \omega_i-\left<\omega\right>\right> = \dfrac{1}{N}\sum \limits_{i=1}^N\left[\omega_i-\left<\omega\right>\right] = \dfrac{1}{N}\sum \limits_{i=1}^N\omega_i - \dfrac{\left<\omega\right>}{N}\sum \limits_{i=1}^N \left(1\right) = \left<\omega\right>- \left<\omega\right>=0 \]

The problem is that these separations are both positive and negative, but to measure the spread, we don't care in which direction the deviation from the mean is. We could define the average deviation of the results from the mean as the average of the absolute values of the separations, but for rather mathematically complex reasons, it turns out that this is not the best definition. We won't go into details here, except to say that it is more useful to give a higher weighting to deviations as they get farther from the mean (the absolute value method weights all deviations equally).

The "standard" deviation that we calculate removes the problem of negative deviations, and also weights separation from the mean more heavily as it becomes greater. It does this by *squaring* the separation of every result from the mean, averaging those squares, and then taking the square root of that average. In other contexts where the mean is clearly zero (such as the current in an AC circuit), this is a measurement of the average magnitude of the value, and is often referred to as the *root-mean-square*, or *rms* value, for reasons that are obvious now that we know how it is calculated.

Let's summarize the calculation of standard deviation before writing out the formula.

- Start with the full set of outcomes, \(\omega_i\), and their accompanying probabilities, \(P_i\).
- Calculate the mean (expectation value) of the outcomes, using Equation 1.3.3.
- Calculate the separation of every outcome from the mean, using Equation 1.3.5.
- Square all of these separations.
- Find the mean of all these squares (for \(N\) equally likely results, add them all together and divide by \(N\); more generally, weight each square by its probability \(P_i\)).
- Take the square-root of this mean.

\[ \Delta\omega =\sqrt{\dfrac{1}{N}\sum\limits_{i=1}^N \left[\omega_i - \left<\omega\right>\right]^2} \]
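The steps above can be sketched in Python for the sum of two dice. This is an illustration, not part of the original text: it treats each of the 36 equally likely rolls as one result \(\omega_i\) (so \(N=36\)), and then repeats the calculation with the rolls grouped by total and weighted by probability.

```python
import math
from fractions import Fraction
from itertools import product

# Treat each of the 36 equally likely rolls as one result omega_i (N = 36).
results = [a + b for a, b in product(range(1, 7), repeat=2)]
N = len(results)

# Mean, then the mean of the squared separations from it.
mean = Fraction(sum(results), N)
mean_square_separation = sum((w - mean) ** 2 for w in results) / N
std_dev = math.sqrt(mean_square_separation)

# Equivalent bookkeeping: group the rolls by total and weight each squared
# separation by that total's probability P_i instead of dividing by N.
prob = {t: Fraction(6 - abs(t - 7), 36) for t in range(2, 13)}
weighted = sum(p * (t - mean) ** 2 for t, p in prob.items())

print(mean, mean_square_separation, std_dev)
```

The two versions agree because dividing by \(N\) over equally likely rolls is the same as weighting each distinct total by the fraction of rolls that produce it.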

This formula can actually be cast into another extremely useful form – so useful that we will end up using this alternative form pretty much exclusively. To get to it requires only a little bit of algebra, resulting in:

\[ \Delta\omega =\sqrt{\left<\omega^2\right>-\left<\omega\right>^2} \]
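The algebra can be sketched in a few lines. Square the definition of \(\Delta\omega\), expand the square, and use the fact that \(\left<\omega\right>\) is a constant that can be pulled out of each sum (with the means written, as above, as sums over \(N\) equally weighted results):

\[ \begin{array}{rcl} \left(\Delta\omega\right)^2 &=& \dfrac{1}{N}\sum \limits_{i=1}^N \left[\omega_i^2 - 2\omega_i\left<\omega\right> + \left<\omega\right>^2\right] \\ &=& \dfrac{1}{N}\sum \limits_{i=1}^N \omega_i^2 - 2\left<\omega\right>\dfrac{1}{N}\sum \limits_{i=1}^N \omega_i + \dfrac{\left<\omega\right>^2}{N}\sum \limits_{i=1}^N \left(1\right) \\ &=& \left<\omega^2\right> - 2\left<\omega\right>^2 + \left<\omega\right>^2 \;=\; \left<\omega^2\right> - \left<\omega\right>^2 \end{array} \nonumber \]

Taking the square root of both sides gives the result.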

The compact nature of this new form is apparent. The description is also easy to put into words: "Compute the average of the squares of the outcomes, subtract the square of the average outcome, and take the square root." We will see that this form is especially useful when we go to the case of a continuum of possible outcomes, which we do now...

We already know how to compute a mean using a probability density (Equation 1.3.4). All we have to do to calculate the uncertainty is compute two mean integrals and then plug into Equation 1.3.8.

\[ \left. \begin{array}{l} \left<\omega\right>=\int \limits_{-\infty}^{+\infty}P\left(x\right)\omega\left(x\right)dx \\ \left<\omega^2\right>=\int \limits_{-\infty}^{+\infty}P\left(x\right)\left[\omega\left(x\right)\right]^2dx \end{array} \right\} \;\;\; \Rightarrow\;\;\; \Delta \omega = \sqrt{\left<\omega^2\right>-\left<\omega\right>^2}\]
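As a numerical illustration of these integrals, consider a hypothetical case that is not worked in the text: a particle equally likely to be found anywhere in a tube of length `L`, so that the density is uniform, \(P(x)=1/L\) for \(0 \le x \le L\) and zero elsewhere, with position itself as the quantity being averaged. The sketch below uses a simple midpoint rule (the helper `integrate` is written for this example, not taken from a library):

```python
import math

L = 2.0          # hypothetical tube length; the particle is confined to 0 <= x <= L

def density(x):
    """Uniform probability density on [0, L] (an assumed example distribution)."""
    return 1.0 / L

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

# The density vanishes outside [0, L], so the infinite limits reduce to 0..L.
total = integrate(density, 0.0, L)                         # normalization check
mean_x = integrate(lambda x: density(x) * x, 0.0, L)       # <x>
mean_x2 = integrate(lambda x: density(x) * x**2, 0.0, L)   # <x^2>
delta_x = math.sqrt(mean_x2 - mean_x**2)

print(total, mean_x, mean_x2, delta_x)
```

For this uniform density the integrals can also be done by hand, giving \(\left<x\right>=L/2\), \(\left<x^2\right>=L^2/3\), and \(\Delta x = L/\sqrt{12}\), which the numerical values should approach.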

Example 1.3.2

*In Example 1.3.1, symmetry demands that the average position of the block is the origin. Find the uncertainty in the block's position.*

**Solution**

*With an average position of \(\left<x\right>=0\), Equation 1.3.8 tells us that the uncertainty in the position of the block is:*

\[\Delta x = \sqrt{\left<x^2\right>} \nonumber \]

*Now we can plug into the integral using the density function we found in Example 1.3.1, but that is reinventing the wheel. It's simpler to use what we found in that example:*

\[ \left<PE\right> = \frac{1}{4}kx_o^2 \;\;\; \Rightarrow \;\;\; \left<\frac{1}{2}kx^2\right> = \frac{1}{4}kx_o^2 \;\;\; \Rightarrow \;\;\; \left<x^2\right>=\frac{1}{2}x_o^2 \nonumber \]

*This gives us the uncertainty of \(x\):*

\[\Delta x = \frac{1}{\sqrt{2}} x_o \nonumber \]
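For readers who would like to see the direct route as a cross-check, the density from Example 1.3.1 can be written explicitly by combining \(P\left(x\right)=\frac{2}{vT}\) with the expressions found there for \(v\left(x\right)\) and \(T\), and the integral then done with the same substitution \(u \equiv \frac{x}{x_o}\) (the remaining integral equals \(\frac{\pi}{2}\), as can be confirmed with \(u=\sin\theta\)):

\[ P\left(x\right) = \dfrac{1}{\pi\sqrt{x_o^2-x^2}} \;\;\; \Rightarrow \;\;\; \left<x^2\right> = \int \limits_{-x_o}^{+x_o} \dfrac{x^2}{\pi\sqrt{x_o^2-x^2}}dx = \dfrac{x_o^2}{\pi}\int \limits_{-1}^{+1} \dfrac{u^2}{\sqrt{1-u^2}}du = \dfrac{x_o^2}{\pi}\cdot\dfrac{\pi}{2} = \frac{1}{2}x_o^2 \nonumber \]

This agrees with the value obtained from the average potential energy.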