Physics LibreTexts

1.13 Ways of Representing Data

Data is the information of science. From the earliest days of science, language and mathematics have been used to convey information. In the 2nd century A.D., Ptolemy wrote his masterwork on astronomy, called al-Magiste, or the Almagest. This encyclopedia contained a description of the geocentric cosmology and a list of over 1000 star positions, making it the first important collection of astronomical data.

By the time of the Renaissance, accurate drawing and painting had become a legitimate way to represent scientific information. Leonardo da Vinci was not only an artist of extraordinary power and sensitivity; he was also an accomplished scientist and engineer. Da Vinci made many drawings of the features and phases of the Moon, and of the trajectories of moving objects. His drawings of cadavers and the human form were so accurate that they formed the basis for modern anatomy. Galileo used pen and ink to record what he saw through the newly invented telescope. Galileo's watercolors of craters and mountains on the Moon helped convince people that the Moon was a world in space, just like the Earth. Two hundred years later, Charles Darwin used his meticulous drawings of wildlife to support the theory of evolution. Art has never stopped being a tool of science. You are probably familiar with the pictures by artists who work with paleontologists to recreate the look of dinosaurs and other long-extinct species. You may also encounter fine examples of space art — realistic depictions of imagined worlds.

Scientists have developed ways of extending the visual sense. Four hundred years ago, Dutch craftsmen were the first people to make accurate lenses. As a result, they invented a device for magnifying nearby objects, the microscope, and a device for gathering light from distant objects, the telescope. Scientists also developed ways to render the invisible visible. Michael Faraday used the patterns of iron filings to trace magnetic lines of force. Ernst Chladni used chalk dust in a similar way, to trace the patterns of sound waves coming from a metal plate when he caused it to vibrate. Scientists 200 years ago were able to record waves from beyond the blue and the red ends of the visible spectrum. Since then, we have learned how to record and transmit electromagnetic waves from radio waves to X-rays.

Sensors that measure physical quantities and transmit the data to us are a part of our everyday lives. For example, the dashboard of your car might show measurements of speed, distance, engine speed and temperature, water temperature, oil pressure, battery electric charge, and frequency of the waves on your radio receiver. Scientists have become accustomed to recording such an array of data for everything from the tiny world of the atom to the gigantic scales of a distant galaxy.

A specially developed CCD used for ultraviolet imaging in a wire-bonded package.

Since the invention of the telescope, astronomers have benefited from two revolutions in the representation of data. The first came with the development of photography in the mid-19th century. Before photography, astronomers had to make drawings of what they saw. Photographs create a permanent and indisputable record. The second revolution was the development of electronic detectors in the mid-1970s. Charge-coupled devices, or CCDs, allow astronomers to convert light into electrons on a tiny wafer of silicon. The best of these devices have hundreds of millions of picture elements, or pixels. While a photograph can still store more information than a CCD, electronic detectors have the enormous advantage of storing their information in a digital form. Also, CCDs are 30-40 times more efficient at capturing light.

Modern astronomy is based on digital information. This means that information content can be represented as an array of bits, or on-off signals. The information content of a signal depends on the size of the signal and how finely the signal is subdivided. The signal can be any kind of measurement: the strength of a magnetic field, the velocity of a star, or the intensity in a single pixel of a CCD image. A satellite might record an image of one of the moons of Jupiter and send it back to Earth as a stream of bits. Or an astronomer might record an image of a galaxy through a large telescope and store it on a computer hard disk, or send it over the Internet to a colleague. The information revolution has made it easy to store, manipulate, and transmit enormous amounts of information; a high-quality CCD image might contain over 1 billion bits. An all-sky CCD survey is currently underway that will produce over $10^{17}$ bits of information in total, at a rate of about 30 trillion bytes per night!
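As a rough illustration of how the size of an image and the fineness of its subdivision set its information content, the sketch below multiplies the number of pixels by the bits recorded per pixel. The detector dimensions (8192 by 8192 pixels) and the 16-bit depth are assumed values chosen for illustration; they are not figures from the text.

```python
def image_bits(width_px, height_px, bits_per_pixel):
    """Return the raw information content of an uncompressed image, in bits."""
    return width_px * height_px * bits_per_pixel


# Hypothetical detector: 8192 x 8192 pixels, 16 bits per pixel (assumed values).
bits = image_bits(8192, 8192, 16)
levels = 2 ** 16  # number of distinct intensity levels a 16-bit pixel can record

print(bits)       # 1073741824 bits, just over 1 billion
print(bits // 8)  # 134217728 bytes
print(levels)     # 65536
```

Doubling the bit depth doubles the storage needed but squares the number of intensity levels each pixel can distinguish.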

In astronomy books and magazines you will see color images that have been made with digital detectors. CCDs do not sense color in the same way as your eye (which has chemicals that are sensitive to the different colors of light) or your TV screen (which uses beams of electrons to excite red, green, and blue dots of phosphor). Rather, an astronomer uses a filter in front of the CCD camera to isolate a small range of wavelengths. A single image can only be used to represent black-and-white, with the shades of gray indicating the intensity recorded in each pixel. Three different exposures through red, green, and blue filters can then be combined to produce a true color image of the night sky.
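The combination of three filtered exposures can be sketched in a few lines. This is a minimal illustration, assuming each exposure is a small grid of 0-255 gray levels that is already aligned and scaled; the function name `combine_rgb` and the pixel values are hypothetical, and real image processing involves calibration steps not shown here.

```python
def combine_rgb(red, green, blue):
    """Merge three equally sized grayscale images into one grid of (r, g, b) pixels."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("exposures must have the same number of rows")
    color = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("exposures must have the same number of columns")
        # Each output pixel carries one intensity from each filter.
        color.append(list(zip(r_row, g_row, b_row)))
    return color


# Made-up 2 x 2 exposures through red, green, and blue filters.
red = [[200, 10], [0, 255]]
green = [[180, 10], [0, 255]]
blue = [[20, 90], [0, 255]]

image = combine_rgb(red, green, blue)
print(image[0][0])  # (200, 180, 20): bright in red and green, so yellowish
print(image[0][1])  # (10, 10, 90): faint and bluish
```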

There are two kinds of false color in scientific images. The first is when a single measured intensity is used to represent color. For example, low intensity might be coded blue and high intensity red. A false color image may be more appealing than a black-and-white image, but it conveys no spectral information. The second type of false color occurs when we are representing a quantity that is not based on visible light. If you see a color image of an X-ray, or a color image made with a radio telescope, then the colors have no conventional scientific meaning. A lot of the information content of astronomy is in the form of images. An image is just a map of intensity in two dimensions. The other main form of presenting information is a spectrum, which is a plot of intensity as a function of wavelength.
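The first kind of false color, where a single measured intensity is coded from blue (low) to red (high), might look like the sketch below. The linear blue-to-red ramp and the function name `false_color` are assumptions made for illustration; actual color tables vary widely.

```python
def false_color(intensity, lo, hi):
    """Map one intensity to an (r, g, b) tuple: pure blue at lo, pure red at hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Fractional position of the intensity within the range, clamped to [0, 1].
    t = (intensity - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return (round(255 * t), 0, round(255 * (1 - t)))


print(false_color(0, 0, 100))    # (0, 0, 255): lowest intensity is blue
print(false_color(100, 0, 100))  # (255, 0, 0): highest intensity is red
```

Because every color comes from one number per pixel, the result carries exactly the same information as the gray-scale image it was made from.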

This graph shows a linear correlation in the form of y = ax + b.

Many types of data cannot be represented as images or spectra. The simplest way for scientists to present data is in a table. A list of percentages or proportions can also be shown graphically as a pie chart. This form of representation is more commonly used in business and social science than it is in physical science. Scientists often use a histogram to represent a list of numbers. Histograms are useful for showing the relative size of different numbers, but it can be difficult to see the information when the values cover a large range. One solution to this problem is to make a histogram from the logarithm of the quantity, which gives a scale that is highly compressed. It is important to remember that equal intervals on a logarithmic plot correspond to equal ratios, whereas equal intervals on a normal histogram correspond to equal differences.
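A logarithmic histogram can be sketched by binning each value according to its base-10 logarithm, so that every bin spans the same ratio (a factor of ten) rather than the same difference. The sample values and the function name `log_histogram` are made up for illustration.

```python
import math
from collections import Counter


def log_histogram(values):
    """Count positive values in decade-wide bins: bin k holds 10**k <= v < 10**(k + 1)."""
    counts = Counter()
    for v in values:
        if v <= 0:
            raise ValueError("a logarithm requires positive values")
        counts[math.floor(math.log10(v))] += 1
    return dict(sorted(counts.items()))


# Made-up measurements covering more than four orders of magnitude.
values = [2, 5, 30, 70, 400, 9000, 25000]
print(log_histogram(values))  # {0: 2, 1: 2, 2: 1, 3: 1, 4: 1}
```

With equal-width bins of, say, 5000 units, the five smallest values here would all fall into the first bin and their spread would be invisible.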

With more than one quantity, the best graphical representation is a plot, or a graph. One quantity is plotted on the horizontal axis (or x-axis) and a second quantity is plotted on the vertical axis (or y-axis). Scientists often make a graph because they anticipate a relationship between two quantities and they want to know the form of the relationship. Other times, they make a plot without knowing what they will find, as a way of "exploring" their data.

When two quantities are related, we say they are correlated. When there is no relationship between the two quantities, we say they are uncorrelated. The simplest relationship is a linear correlation. On a graph, a linear correlation has the algebraic form of the equation for a straight line: $y = ax + b$, where $a$ is the slope of the line and $b$ is the y-intercept. Thus, $y = 2x$ and $y = 0.658x$ are examples of linear correlation. The general form is $y \propto x$. In general, all data should be plotted with associated error bars. A correlation is often more complicated than a linear relationship. Thus, $y = 0.112x^2$ and $y = 1.47x^{0.44}$ are examples of nonlinear correlation. Kepler's third law relates the period of a planet's orbit ($P$) to its distance from the Sun ($a$, not to be confused with the slope above) in the form $P^2 \propto a^3$. The correlation represented by Kepler's third law has a varying slope, since the relationship between period and distance is a power law, $P \propto a^{3/2}$. Alternatively, we can plot Kepler's law on a logarithmic scale. Taking the logarithm of Kepler's law, we get $\log P = \frac{3}{2} \log a$ (plus a constant that depends on the units), which is just the form of a straight line where $y = \log P$, $x = \log a$, and the slope is 3/2. Any power law is therefore a straight line in a logarithmic plot.
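The claim that a power law becomes a straight line on a logarithmic plot can be checked numerically. The sketch below takes rounded orbital distances (in astronomical units) and periods (in years) for six planets, takes the logarithm of both, and fits a straight line by ordinary least squares. The helper `fit_line` is a hypothetical name written for this illustration; the fitting method is one common choice, not something prescribed by the text.

```python
import math

# (distance from the Sun in AU, orbital period in years), rounded values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_line(xs, ys):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Straight-line variables: x = log a, y = log P.
log_a = [math.log10(a) for a, _ in planets.values()]
log_p = [math.log10(p) for _, p in planets.values()]

slope, intercept = fit_line(log_a, log_p)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The fitted slope comes out very close to 3/2, and the intercept is close to zero because the units (AU and years) make the constant of proportionality equal to one.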

Scientists usually only have to consider correlations between two quantities, but nature can be complex. Scientists often extend their analysis to more than two quantities, or multiple variables. For example, cancer is difficult to understand because it is controlled by a complex mixture of environmental and genetic factors. In astronomy, the size of a star depends both on the mass and the chemical composition. Occasionally, a scientist discovers a new type of star or a new fundamental particle. More often, scientists make progress by sifting through piles of data looking for correlations. The discovery of a significant and unanticipated pattern is one of the thrills of doing science.