
# 1.11: Errors and Statistics

Scientists can reduce observational error by using more accurate equipment and by making multiple independent measurements. Let's use the example of measuring paper. Suppose we use a ruler marked off in millimeters to measure the width of a piece of paper. If we do this once, we might get the answer 217 millimeters. How accurate or reliable is this single measurement? We might think that it is accurate to the size of the smallest division on the ruler: a millimeter. But the truth is that we don’t really know.

When using a ruler such as this, we can be accurate to the millimeter.

Let us measure the width of a piece of paper ten times, with the following results: 217, 216, 216, 216, 215, 217, 216, 217, 218, and 216 millimeters. The best estimate is given by the average of the measurements:

Average = (217+216+216+216+215+217+216+217+218+216) / 10 = 216.4 millimeters

In the case of multiple measurements, we can calculate an error for the combined measurement. The error, which this section calls the standard error (the quantity calculated here is more precisely known as the sample standard deviation), is the scatter of the individual measurements about the average. It is calculated from the differences between each individual measurement and the average: we square the differences, add them, and divide by the number of measurements minus one. Finally, we take the square root of the result:

$$\text{Standard Error} = \sqrt{\frac{(217-216.4)^2 + (216-216.4)^2 + \cdots + (216-216.4)^2}{N-1}} = 0.8 \text{ millimeters}$$
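The arithmetic above can be checked with a short script. This is a minimal sketch using the ten measurements from the text; the variable names are illustrative, and `statistics.stdev` from the Python standard library is used only as a cross-check of the hand calculation.

```python
import math
import statistics

# Ten width measurements of the sheet of paper, in millimeters (from the text).
widths_mm = [217, 216, 216, 216, 215, 217, 216, 217, 218, 216]

n = len(widths_mm)
average = sum(widths_mm) / n

# Sum of the squared differences from the average, divided by N - 1,
# followed by a square root.
sum_sq_diff = sum((w - average) ** 2 for w in widths_mm)
scatter = math.sqrt(sum_sq_diff / (n - 1))

# Cross-check against the standard library's N - 1 form.
library_scatter = statistics.stdev(widths_mm)

print(f"average = {average:.1f} mm")   # average = 216.4 mm
print(f"scatter = {scatter:.1f} mm")   # scatter = 0.8 mm
```

The sum of the ten squared differences is 6.4, so the result before rounding is the square root of 6.4/9, or about 0.84 millimeters.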

In this calculation, the dots (...) are used as a space-saving notation; we must form the sum of the squares of all ten differences. Most calculators will do this manipulation for you automatically, but it is instructive to see it written out. Why do we divide by the number of measurements minus one rather than the number of measurements? The answer is that error cannot be determined from a single measurement. Mathematically, when N = 1 the single measurement is its own average, so the equation above reduces to zero divided by zero and the error is undefined. Notice that when the number of measurements is very large, the difference between N and N-1 is not very important.
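To see how much the choice of divisor matters for ten measurements, the following sketch compares the two forms using Python's standard `statistics` module (`stdev` divides by N - 1, `pstdev` divides by N). It also shows, as an illustration only, that the library declines to compute the N - 1 form for a single measurement.

```python
import statistics

# Ten width measurements in millimeters (from the text).
widths_mm = [217, 216, 216, 216, 215, 217, 216, 217, 218, 216]

# Dividing by N - 1 (the form used in the text) versus dividing by N.
scatter_n_minus_1 = statistics.stdev(widths_mm)
scatter_n = statistics.pstdev(widths_mm)

# With one measurement there is no scatter to work with, and the
# standard library raises an error instead of returning a number.
try:
    statistics.stdev([217])
    single_measurement_ok = True
except statistics.StatisticsError:
    single_measurement_ok = False

print(f"divide by N-1: {scatter_n_minus_1:.3f} mm")
print(f"divide by N:   {scatter_n:.3f} mm")
print(f"single measurement has a defined scatter: {single_measurement_ok}")
```

For these ten readings the two forms differ by only about 0.04 millimeters, and the gap shrinks further as N grows.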

What do scientists gain by making more than one measurement? We know that our combined estimate is more reliable, because it is not as sensitive to a mistake that we might make with a single measurement (for example, misreading the scale on the ruler). We also obtain a measured value for the error of our best estimate, given by the scatter in the individual readings. Previously, all we could do for a single measurement was to estimate the error to be the size of the smallest division on the ruler:

Single Measurement ± Estimated Error = 217 ± 1 millimeters

The combination of ten measurements gives a more reliable best estimate and a standard error:

Average ± Standard Error = 216.4 ± 0.8 millimeters

Example of a histogram, this one showing frequencies of arrivals per minute.

The error of our combined measurement is actually smaller than the smallest division on the ruler. Multiple measurements have increased the accuracy. This in turn allows us to quote the result with a higher precision: four significant figures instead of three. If a very large number of measurements were made, a histogram of the measurements would take the form of what is called a normal distribution or, more colloquially, a bell curve.
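Even with only ten readings, a tally of the paper measurements hints at the peaked shape described above. The sketch below builds a simple text histogram with `collections.Counter`; the layout of the printed bars is just one illustrative choice.

```python
from collections import Counter

# Ten width measurements in millimeters (from the text).
widths_mm = [217, 216, 216, 216, 215, 217, 216, 217, 218, 216]

# Count how many times each value was recorded.
counts = Counter(widths_mm)

# Print one bar per distinct value, in increasing order.
for value in sorted(counts):
    print(f"{value} mm | {'#' * counts[value]}")
```

The tally is 1, 5, 3, and 1 readings at 215, 216, 217, and 218 millimeters respectively, so most readings cluster near the average with fewer at the extremes.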

In the example of the measurement of a star's position, we can define the position with greater accuracy using more and more independent measurements. We can never say with complete certainty where the star really is. But we can combine data to yield tighter and tighter estimates of its likely position. Now we are dealing with two dimensions, so the best estimate of the position is the average of the X positions and the average of the Y positions of the individual measurements. The scatter in positions gives the standard error.
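The two-dimensional case works the same way, one coordinate at a time. The sketch below uses invented (x, y) positions in arbitrary units; the numbers are hypothetical and not taken from any real observation. Reporting the scatter separately along each axis is an assumption made here for simplicity.

```python
import statistics

# Hypothetical (x, y) position measurements of one star, in arbitrary units.
# These values are invented for illustration only.
positions = [(10.2, 5.1), (9.8, 4.9), (10.1, 5.3), (9.9, 4.7), (10.0, 5.0)]

xs = [x for x, _ in positions]
ys = [y for _, y in positions]

# Best estimate of the position: average of the X values and of the Y values.
best_x = statistics.fmean(xs)
best_y = statistics.fmean(ys)

# Scatter along each axis, using the same N - 1 form as the ruler example.
scatter_x = statistics.stdev(xs)
scatter_y = statistics.stdev(ys)

print(f"best position = ({best_x:.2f}, {best_y:.2f})")
print(f"scatter       = ({scatter_x:.2f}, {scatter_y:.2f})")
```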

Think of a sequence of situations where the number of measurements increases from one to three to ten to thirty. The best estimate of the position is given in each case by the average of the measurements. The average does change as the number of measurements increases. However, it changes less and less as the number of measurements increases because a single outlying point has less and less impact.

It is less clear how we should define the scatter in the measurements. Choosing an error large enough to account for all the measurements is not a good idea because (unlike the average) it would be very sensitive to a single outlying point. In Gauss's theory, and in the mathematical form given above, the standard error encompasses about 2/3 of the measurements. We should remember that the star has roughly a 1/3 probability of being outside the circle defined by the standard error.
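The "about 2/3" figure can be checked for a single measured quantity that follows a normal distribution. In that case the fraction of measurements within k standard errors of the average is erf(k/√2), which the sketch below evaluates with `math.erf`. The function name `fraction_within` is illustrative, and the calculation assumes a one-dimensional bell curve.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a one-dimensional normal distribution lying within
    k standard errors of the average."""
    return math.erf(k / math.sqrt(2))

inside = fraction_within(1.0)
outside = 1.0 - inside

print(f"inside one standard error:  {inside:.3f}")   # 0.683
print(f"outside one standard error: {outside:.3f}")  # 0.317
```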

Example of a scatter plot of data. With more and more data points, patterns can be found and random error can be lessened.

As the number of measurements increases, the standard error of the average gets smaller and the combined measurement gets more accurate. With many measurements, you minimize the effect of wayward readings and get a more reliable average. The standard error of the average is inversely proportional to the square root of the number of measurements. In mathematical terms, this means:

Standard Error ∝ 1/√N

The math symbol ∝ means "proportional to." In both the examples of star position and ruler measurement we see why good science is so hard: reducing the standard error by a factor of three requires nine times as many measurements! We tend to think of science in terms of dramatic discoveries, but most of the progress of science consists of whittling away at errors to match physical models of the world more and more accurately.
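The 1/√N behavior can be illustrated with a small simulation. The sketch below assumes the measurements are normally distributed with a true width of 216.4 mm and a scatter of 0.8 mm; these are illustrative assumptions, not properties established by the text. It repeats the experiment many times and compares how much the average wanders when each experiment uses 10 measurements versus 90.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_WIDTH_MM = 216.4  # assumed true value, for illustration only
SCATTER_MM = 0.8       # assumed scatter of a single measurement
TRIALS = 2000          # number of repeated experiments for each N

def scatter_of_average(n: int) -> float:
    """Scatter of the average of n simulated measurements over many trials."""
    averages = []
    for _ in range(TRIALS):
        readings = [random.gauss(TRUE_WIDTH_MM, SCATTER_MM) for _ in range(n)]
        averages.append(sum(readings) / n)
    return statistics.stdev(averages)

error_10 = scatter_of_average(10)
error_90 = scatter_of_average(90)
ratio = error_10 / error_90

print(f"scatter of the average, N = 10: {error_10:.3f} mm")
print(f"scatter of the average, N = 90: {error_90:.3f} mm")
print(f"ratio: {ratio:.2f}")
```

With nine times as many measurements, the ratio should come out close to three, up to the statistical noise of the simulation itself.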

Most discussions of the uncertainty in scientific measurements refer to random errors. A more dangerous uncertainty results from systematic errors. These are errors that are due to some flaw in the measuring equipment or to a mistake in the way the measurement is made. Imagine if the ruler you had used was printed with an incorrect millimeter scale. In this case, you would have combined many measurements in the belief that you were achieving an accurate result, when, in fact, all of the measurements were flawed.
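A simulation makes the danger concrete. The sketch below assumes a hypothetical ruler whose scale makes every reading 1% too large, plus a random scatter of 0.8 mm; the true width, the size of the scale error, and all names are invented for illustration. Averaging more readings shrinks the random part of the error but leaves the offset untouched.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_WIDTH_MM = 216.0    # assumed true width (hypothetical)
SCALE_ERROR = 1.01       # hypothetical misprinted ruler: readings 1% too large
RANDOM_SCATTER_MM = 0.8  # assumed random scatter of a single reading

def measure(n: int) -> list[float]:
    """Simulate n readings taken with the misprinted ruler."""
    return [
        TRUE_WIDTH_MM * SCALE_ERROR + random.gauss(0.0, RANDOM_SCATTER_MM)
        for _ in range(n)
    ]

# Offset of the average from the true width, for small and large N.
offsets = {}
for n in (10, 1000):
    average = statistics.fmean(measure(n))
    offsets[n] = average - TRUE_WIDTH_MM
    print(f"N = {n:4d}: average is off by {offsets[n]:+.2f} mm")
```

Even with a thousand readings the average remains roughly 2.2 mm too large, which is the 1% scale error applied to a 216 mm width.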

Systematic errors are insidious, because there is no simple experiment you can do with your measurements to reveal them. You would have to compare your ruler to someone else’s ruler. Many important examples of systematic errors have occurred in the history of astronomy. In the case of astronomy, where we can only study the universe at a distance and not conduct a lab experiment, systematic errors are very difficult to track down.