10.1: Appendix A (Part 1)
The following English translations of excerpts from three papers by Einstein were originally published in “The Principle of Relativity,” Methuen and Co., 1923. The translation was by W. Perrett and G.B. Jeffery, and notes were provided by A. Sommerfeld. John Walker (www.fourmilab.ch) has provided machine-readable versions of the first two and placed them in the public domain. Some notation has been modernized, British spelling has been Americanized, etc. Footnotes by Sommerfeld, Walker, and B. Crowell are marked with initials. B. Crowell’s modifications to the present version are also in the public domain.
- The paper “On the electrodynamics of moving bodies” contains two parts, the first dealing with kinematics and the second with electrodynamics. I’ve given only the first part here, since the second one is lengthy, and painful to read because of the cumbersome old-fashioned notation. The second section can be obtained from John Walker’s web site.
- The paper “Does the inertia of a body depend upon its energy content?,” which begins later, is very short and readable. A shorter and less general version of its main argument is given in section 4.2.
- “The foundation of the general theory of relativity” is a long review article in which Einstein systematically laid out the general theory, which he had previously published in a series of shorter papers. The first three sections of the paper give the general physical reasoning behind coordinate independence, referred to as general covariance. It begins later.
The reader who is interested in seeing these papers in their entirety can obtain them inexpensively in a Dover reprint of the original Methuen anthology.
"On the Electrodynamics of Moving Bodies"
A. Einstein, Annalen der Physik 17 (1905) 891.
It is known that Maxwell’s electrodynamics, as usually understood at the present time, when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena. 13 Take, for example, the reciprocal electrodynamic action of a magnet and a conductor. The observable phenomenon here depends only on the relative motion of the conductor and the magnet, whereas the customary view draws a sharp distinction between the two cases in which either the one or the other of these bodies is in motion. For if the magnet is in motion and the conductor at rest, there arises in the neighborhood of the magnet an electric field with a certain definite energy, producing a current at the places where parts of the conductor are situated. But if the magnet is stationary and the conductor in motion, no electric field arises in the neighborhood of the magnet. In the conductor, however, we find an electromotive force, to which in itself there is no corresponding energy, but which gives rise, assuming equality of relative motion in the two cases discussed, to electric currents of the same path and intensity as those produced by the electric forces in the former case.
Note
Einstein begins by giving an example involving electromagnetic induction, considered in two different frames of reference. With modern hindsight, we would describe this by saying that a Lorentz boost mixes the electric and magnetic fields, as described in section 4.2. —BC
Examples of this sort, together with the unsuccessful attempts to discover any motion of the earth relative to the “light medium,” suggest that the phenomena of electrodynamics as well as of mechanics possess no properties corresponding to the idea of absolute rest. 14 They suggest rather that, as has already been shown to the first order of small quantities, 15 the same laws of electrodynamics and optics will be valid for all frames of reference for which the equations of mechanics hold good. 16 We will raise this conjecture (the purport of which will hereafter be called the “Principle of Relativity”) to the status of a postulate, and also introduce another postulate, which is only apparently irreconcilable with the former, namely, that light is always propagated in empty space with a definite velocity c which is independent of the state of motion of the emitting body. 17 These two postulates suffice for the attainment of a simple and consistent theory of the electrodynamics of moving bodies based on Maxwell’s theory for stationary bodies. The introduction of a “luminiferous ether” will prove to be superfluous inasmuch as the view here to be developed will not require an “absolutely stationary space” provided with special properties, nor assign a velocity-vector to a point of the empty space in which electromagnetic processes take place.
Notes
14 Einstein knew about the Michelson-Morley experiment by 1905 (J. van Dongen, arxiv.org/abs/0908.1545), but it isn’t cited specifically here. The 1881 and 1887 Michelson-Morley papers are available online at en.wikisource.org. —BC
15 I.e., to first order in \(\frac{v}{c}\). Experimenters as early as Fresnel (1788-1827) had shown that there were no effects of order \(\frac{v}{c}\) due to the earth’s motion through the aether, but they were able to interpret this without jettisoning the aether, by contriving models in which solid substances dragged the aether along with them. The negative result of the Michelson-Morley experiment showed a lack of an effect of order \(\left(\frac{v}{c}\right)^{2}\). —BC
16 The preceding memoir by Lorentz was not at this time known to the author. —AS
17 The second postulate is redundant if we take the “laws of electrodynamics and optics” to refer to Maxwell’s equations. Maxwell’s equations require that light move at c in any frame of reference in which they are valid, and the first postulate has already claimed that they are valid in all inertial frames of reference. Einstein probably states constancy of c as a separate postulate because his audience is accustomed to thinking of Maxwell’s equations as a partial mathematical representation of certain aspects of an underlying aether theory. Throughout part I of the paper, Einstein is able to derive all his results without assuming anything from Maxwell’s equations other than the constancy of c. The use of the term “postulate” suggests the construction of a formal axiomatic system like Euclidean geometry, but Einstein’s real intention here is to lay out a set of philosophical criteria for evaluating candidate theories; he freely brings in other, less central, assumptions later in the paper, as when he invokes homogeneity of spacetime later. —BC
18 Essentially what Einstein means here is that you can’t have Maxwell’s equations without establishing position and time coordinates, and you can’t have position and time coordinates without clocks and rulers. Therefore even the description of a purely electromagnetic phenomenon such as a light wave depends on the existence of material objects. He doesn’t spell out exactly what he means by “rigid,” and we now know that relativity doesn’t actually allow the existence of perfectly rigid solids (see section 3.5). Essentially he wants to be able to talk about rulers that behave like solids rather than liquids, in the sense that if they are accelerated sufficiently gently from rest and later brought gently back to rest, their properties will be unchanged. When he derives the length contraction later, he wants it to be clear that this isn’t a dynamical phenomenon caused by an effect such as the drag of the aether.—BC
The theory to be developed is based—like all electrodynamics—on the kinematics of the rigid body, since the assertions of any such theory have to do with the relationships between rigid bodies (systems of coordinates), clocks, and electromagnetic processes. 18 Insufficient consideration of this circumstance lies at the root of the difficulties which the electrodynamics of moving bodies at present encounters.
Kinematical Part
§1. Definition of Simultaneity
Let us take a system of coordinates in which the equations of Newtonian mechanics hold good. 19 In order to render our presentation more precise and to distinguish this system of coordinates verbally from others which will be introduced hereafter, we call it the “stationary system.”
Note
i.e., to the first approximation.—AS
If a material point is at rest relative to this system of coordinates, its position can be defined relative thereto by the employment of rigid standards of measurement and the methods of Euclidean geometry, and can be expressed in Cartesian coordinates.
If we wish to describe the motion of a material point, we give the values of its coordinates as functions of the time. Now we must bear carefully in mind that a mathematical description of this kind has no physical meaning unless we are quite clear as to what we understand by “time.” We have to take into account that all our judgments in which time plays a part are always judgments of simultaneous events. If, for instance, I say, “That train arrives here at 7 o’clock,” I mean something like this: “The pointing of the small hand of my watch to 7 and the arrival of the train are simultaneous events.” 20
Note
We shall not here discuss the inexactitude which lurks in the concept of simultaneity of two events at approximately the same place, which can only be removed by an abstraction.—AS
It might appear possible to overcome all the difficulties attending the definition of “time” by substituting “the position of the small hand of my watch” for “time.” And in fact such a definition is satisfactory when we are concerned with defining a time exclusively for the place where the watch is located; but it is no longer satisfactory when we have to connect in time series of events occurring at different places, or—what comes to the same thing—to evaluate the times of events occurring at places remote from the watch.
We might, of course, content ourselves with time values determined by an observer stationed together with the watch at the origin of the coordinates, and coordinating the corresponding positions of the hands with light signals, given out by every event to be timed, and reaching him through empty space. But this coordination has the disadvantage that it is not independent of the standpoint of the observer with the watch or clock, as we know from experience. We arrive at a much more practical determination along the following line of thought.
If at the point A of space there is a clock, an observer at A can determine the time values of events in the immediate proximity of A by finding the positions of the hands which are simultaneous with these events. If there is at the point B of space another clock in all respects resembling the one at A, it is possible for an observer at B to determine the time values of events in the immediate neighborhood of B. But it is not possible without further assumption to compare, in respect of time, an event at A with an event at B. We have so far defined only an “A time” and a “B time.” We have not defined a common “time” for A and B, for the latter cannot be defined at all unless we establish by definition that the “time” required by light to travel from A to B equals the “time” it requires to travel from B to A. Let a ray of light start at the “A time” \(t_{A}\) from A towards B, let it at the “B time” \(t_{B}\) be reflected at B in the direction of A, and arrive again at A at the “A time” \(t'_{A}\).
In accordance with definition the two clocks synchronize 21 if
\[t_{B} - t_{A} = t'_{A} - t_{B} \ldotp\]
Note
The procedure described here is known as Einstein synchronization.—BC
We assume that this definition of synchronism is free from contradictions, and possible for any number of points; and that the following relations are universally valid:—
- If the clock at B synchronizes with the clock at A, the clock at A synchronizes with the clock at B.
- If the clock at A synchronizes with the clock at B and also with the clock at C, the clocks at B and C also synchronize with each other. 22
Note
This assumption fails in a rotating frame (see section 3.5), but Einstein has restricted himself here to an approximately inertial frame of reference.—BC
Thus with the help of certain imaginary physical experiments we have settled what is to be understood by synchronous stationary clocks located at different places, and have evidently obtained a definition of “simultaneous,” or “synchronous,” and of “time.” The “time” of an event is that which is given simultaneously with the event by a stationary clock located at the place of the event, this clock being synchronous, and indeed synchronous for all time determinations, with a specified stationary clock.
In agreement with experience we further assume the quantity
\[\frac{2AB}{t'_{A} - t_{A}} = c,\]
to be a universal constant—the velocity of light in empty space.
It is essential to have time defined by means of stationary clocks in the stationary system, and the time now defined being appropriate to the stationary system we call it “the time of the stationary system.”
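The synchronization criterion and the definition of c in this section can be illustrated numerically. The following is only a minimal sketch: the function names and the sample clock readings are hypothetical, and units are chosen so that c = 1.

```python
def is_synchronized(t_a, t_b, t_a_return, tol=1e-12):
    """Einstein's criterion: t_B - t_A equals t'_A - t_B (within tol)."""
    outbound = t_b - t_a
    inbound = t_a_return - t_b
    return abs(outbound - inbound) <= tol


def clock_b_offset(t_a, t_b, t_a_return):
    """How far clock B reads ahead of the value required for synchronism."""
    return t_b - 0.5 * (t_a + t_a_return)


def two_way_speed(distance_ab, t_a, t_a_return):
    """The quantity 2AB / (t'_A - t_A), which the text sets equal to c."""
    return 2.0 * distance_ab / (t_a_return - t_a)


# Hypothetical readings: A and B are 3.0 length units apart, with c = 1.
print(is_synchronized(0.0, 3.0, 6.0))   # True: both legs take 3.0
print(clock_b_offset(0.0, 3.5, 6.0))    # 0.5: clock B is ahead by 0.5
print(two_way_speed(3.0, 0.0, 6.0))     # 1.0
```

Note that the two-way speed uses only readings of the single clock at A, which is why it can be taken from experience without first settling how distant clocks are to be synchronized.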
§ 2. On the Relativity of Lengths and Times
The following reflections are based on the principle of relativity and on the principle of the constancy of the velocity of light. These two principles we define as follows:—
- The laws by which the states of physical systems undergo change are not affected, whether these changes of state be referred to the one or the other of two systems of coordinates in uniform translatory motion.
- Any ray of light moves in the “stationary” system of coordinates with the determined velocity c, whether the ray be emitted by a stationary or by a moving body. Hence \[\text{velocity} = \frac{\text{light path}}{\text{time interval}},\] where time interval is to be taken in the sense of the definition in § 1.
Let there be given a stationary rigid rod; and let its length be l as measured by a measuring-rod which is also stationary. We now imagine the axis of the rod lying along the axis of x of the stationary system of coordinates, and that a uniform motion of parallel translation with velocity v along the axis of x in the direction of increasing x is then imparted to the rod. We now inquire as to the length of the moving rod, and imagine its length to be ascertained by the following two operations:—
- (a) The observer moves together with the given measuring-rod and the rod to be measured, and measures the length of the rod directly by superposing the measuring-rod, in just the same way as if all three were at rest.
- (b) By means of stationary clocks set up in the stationary system and synchronizing in accordance with § 1, the observer ascertains at what points of the stationary system the two ends of the rod to be measured are located at a definite time. The distance between these two points, measured by the measuring-rod already employed, which in this case is at rest, is also a length which may be designated “the length of the rod.”
In accordance with the principle of relativity the length to be discovered by the operation (a)—we will call it “the length of the rod in the moving system”—must be equal to the length l of the stationary rod.
The length to be discovered by the operation (b) we will call “the length of the (moving) rod in the stationary system.” This we shall determine on the basis of our two principles, and we shall find that it differs from l.
Current kinematics tacitly assumes that the lengths determined by these two operations are precisely equal, or in other words, that a moving rigid body at the epoch t may in geometrical respects be perfectly represented by the same body at rest in a definite position.
We imagine further that at the two ends A and B of the rod, clocks are placed which synchronize with the clocks of the stationary system, that is to say that their indications correspond at any instant to the “time of the stationary system” at the places where they happen to be. These clocks are therefore “synchronous in the stationary system.”
We imagine further that with each clock there is a moving observer, and that these observers apply to both clocks the criterion established in § 1 for the synchronization of two clocks. Let a ray of light depart from A at the time 23 \(t_{A}\), let it be reflected at B at the time \(t_{B}\), and reach A again at the time \(t'_{A}\). Taking into consideration the principle of the constancy of the velocity of light we find that
\[t_{B} - t_{A} = \frac{r_{AB}}{c - v} \quad \text{and} \quad t'_{A} - t_{B} = \frac{r_{AB}}{c + v}\]
where \(r_{AB}\) denotes the length of the moving rod, measured in the stationary system. Observers moving with the moving rod would thus find that the two clocks were not synchronous, while observers in the stationary system would declare the clocks to be synchronous.
Note
“Time” here denotes “time of the stationary system” and also “position of hands of the moving clock situated at the place under discussion.”—AS
So we see that we cannot attach any absolute signification to the concept of simultaneity, but that two events which, viewed from a system of coordinates, are simultaneous, can no longer be looked upon as simultaneous events when envisaged from a system which is in motion relative to that system.
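To make the asymmetry of the two light legs concrete, here is a small numerical sketch. The function name and the sample values are hypothetical; `rod_length` stands for the length of the moving rod measured in the stationary system, and units are chosen so that c = 1.

```python
def light_leg_times(rod_length, v, c=1.0):
    """Travel times of the ray, in stationary-system time, along a rod moving at v.

    Returns (outbound, inbound): A to B is rod_length / (c - v) because B
    recedes from the ray, and B to A is rod_length / (c + v) because A
    approaches it.
    """
    if not abs(v) < c:
        raise ValueError("this sketch assumes |v| < c")
    outbound = rod_length / (c - v)
    inbound = rod_length / (c + v)
    return outbound, inbound


outbound, inbound = light_leg_times(rod_length=1.0, v=0.5)
print(outbound, inbound)      # the two legs differ when v is nonzero
print(outbound - inbound)     # nonzero, so the § 1 criterion is not met
```

Because the clocks on the rod show stationary-system time, the moving observers read off exactly these two unequal intervals, and by the criterion of § 1 they conclude that the clocks are not synchronous. For v = 0 the two intervals coincide.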
§ 3. Theory of the Transformation of Coordinates and Times from a Stationary System to Another System in Uniform Motion of Translation Relative to the Former
Let us in “stationary” space take two systems of coordinates, i.e., two systems, each of three rigid material lines, perpendicular to one another, and issuing from a point. Let the axes of X of the two systems coincide, and their axes of Y and Z respectively be parallel. Let each system be provided with a rigid measuring-rod and a number of clocks, and let the two measuring-rods, and likewise all the clocks of the two systems, be in all respects alike.
Now to the origin of one of the two systems (k) let a constant velocity v be imparted in the direction of the increasing x of the other stationary system (K), and let this velocity be communicated to the axes of the coordinates, the relevant measuring-rod, and the clocks. To any time of the stationary system K there then will correspond a definite position of the axes of the moving system, and from reasons of symmetry we are entitled to assume that the motion of k may be such that the axes of the moving system are at the time t (this “t” always denotes a time of the stationary system) parallel to the axes of the stationary system.
We now imagine space to be measured from the stationary system K by means of the stationary measuring-rod, and also from the moving system k by means of the measuring-rod moving with it; and that we thus obtain the coordinates x, y, z, and \(\xi, \eta, \zeta\) respectively. Further, let the time t of the stationary system be determined for all points thereof at which there are clocks by means of light signals in the manner indicated in § 1; similarly let the time τ of the moving system be determined for all points of the moving system at which there are clocks at rest relative to that system by applying the method, given in § 1, of light signals between the points at which the latter clocks are located.
To any system of values x, y, z, t, which completely defines the place and time of an event in the stationary system, there belongs a system of values \(\xi, \eta, \zeta, \tau\), determining that event relative to the system k, and our task is now to find the system of equations connecting these quantities.
In the first place it is clear that the equations must be linear on account of the properties of homogeneity which we attribute to space and time.
If we place x' = x − vt, it is clear that a point at rest in the system k must have a system of values x', y, z, independent of time. We first define τ as a function of x', y, z, and t. To do this we have to express in equations that τ is nothing else than the summary of the data of clocks at rest in system k, which have been synchronized according to the rule given in § 1.
From the origin of system k let a ray be emitted at the time \(\tau_{0}\) along the X-axis to x', and at the time \(\tau_{1}\) be reflected thence to the origin of the coordinates, arriving there at the time \(\tau_{2}\); we then must have \(\frac{1}{2} (\tau_{0} + \tau_{2}) = \tau_{1}\), or, by inserting the arguments of the function \(\tau\) and applying the principle of the constancy of the velocity of light in the stationary system:—
\[\frac{1}{2} \Bigg[ \tau (0, 0, 0, t) + \tau \left( 0, 0, 0, t + \dfrac{x'}{c - v} + \dfrac{x'}{c + v} \right) \Bigg] = \tau \left( x', 0, 0, t + \dfrac{x'}{c - v} \right) \ldotp\]
Hence, if x' be chosen infinitesimally small,
\[\frac{1}{2} \left(\dfrac{1}{c - v} + \dfrac{1}{c + v} \right) \frac{\partial \tau}{\partial t} = \frac{\partial \tau}{\partial x'} + \frac{1}{c - v} \frac{\partial \tau}{\partial t},\]
or
\[\frac{\partial \tau}{\partial x'} + \frac{v}{c^{2} - v^{2}} \frac{\partial \tau}{\partial t} = 0 \ldotp\]
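The two steps above can be spelled out as follows; this is a sketch of the intermediate algebra and is not part of Einstein's text. Writing \(\tau_{0} = \tau (0, 0, 0, t)\) and expanding both sides of the synchronization condition to first order in x' gives

\[\tau_{0} + \frac{x'}{2} \left(\dfrac{1}{c - v} + \dfrac{1}{c + v} \right) \frac{\partial \tau}{\partial t} = \tau_{0} + x' \frac{\partial \tau}{\partial x'} + \frac{x'}{c - v} \frac{\partial \tau}{\partial t} \ldotp\]

Cancelling \(\tau_{0}\) and dividing by x' yields the first differential equation. The second then follows by combining the fractions:

\[\frac{1}{2} \left(\dfrac{1}{c - v} + \dfrac{1}{c + v} \right) - \frac{1}{c - v} = \frac{c}{c^{2} - v^{2}} - \frac{c + v}{c^{2} - v^{2}} = - \frac{v}{c^{2} - v^{2}} \ldotp\]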
It is to be noted that instead of the origin of the coordinates we might have chosen any other point for the point of origin of the ray, and the equation just obtained is therefore valid for all values of x', y, z.
An analogous consideration, applied to the axes of Y and Z, it being borne in mind that light is always propagated along these axes, when viewed from the stationary system, with the velocity \(\sqrt{c^{2} - v^{2}}\), gives us
\[\frac{\partial \tau}{\partial y} = 0,\; \frac{\partial \tau}{\partial z} = 0 \ldotp\]
Since \(\tau\) is a linear function, it follows from these equations that
\[\tau = a \left(t - \dfrac{v}{c^{2} - v^{2}} x' \right)\]
where a is a function \(\phi\)(v) at present unknown, and where for brevity it is assumed that at the origin of k, \(\tau\) = 0, when t = 0.
With the help of this result we easily determine the quantities \(\xi, \eta, \zeta\) by expressing in equations that light (as required by the principle of the constancy of the velocity of light, in combination with the principle of relativity) is also propagated with velocity c when measured in the moving system. For a ray of light emitted at the time \(\tau\) = 0 in the direction of the increasing \(\xi\)
\[\xi = c \tau \quad \text{or} \quad \xi = ac \left( t - \dfrac{v}{c^{2} - v^{2}} x' \right) \ldotp\]
But the ray moves relative to the initial point of k, when measured in the stationary system, with the velocity c − v, so that
\[\frac{x'}{c - v} = t \ldotp\]
If we insert this value of t in the equation for \(\xi\), we obtain
\[\xi = a \frac{c^{2}}{c^{2} - v^{2}} x' \ldotp\]
In an analogous manner we find, by considering rays moving along the two other axes, that
\[\eta = c \tau = ac \left( t - \dfrac{v}{c^{2} - v^{2}} x' \right)\]
when
\[\frac{y}{\sqrt{c^{2} - v^{2}}} = t, x' = 0 \ldotp\]
Thus
\[\eta = a \frac{c}{\sqrt{c^{2} - v^{2}}} y \quad \text{and} \quad \zeta = a \frac{c}{\sqrt{c^{2} - v^{2}}} z \ldotp\]
Substituting for x' its value, we obtain
\[\begin{split} \tau &= \phi (v) \beta \left( t - \dfrac{vx}{c^{2}} \right), \\ \xi &= \phi (v) \beta (x - vt), \\ \eta &= \phi (v) y, \\ \zeta &= \phi (v) z, \end{split}\]
where
\[\beta = \frac{1}{\sqrt{1 - \frac{v^{2}}{c^{2}}}},\]
and \(\phi\) is an as yet unknown function of v. If no assumption whatever be made as to the initial position of the moving system and as to the zero point of \(\tau\), an additive constant is to be placed on the right side of each of these equations.
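The substitution of x' = x − vt can be written out explicitly. The following is a sketch of the algebra, in which a is the provisional factor of the earlier equations; the identification \(\phi (v) = a \beta\) is an inference from comparing the two sets of equations and is not stated in the text. Since \(\beta^{2} = \frac{c^{2}}{c^{2} - v^{2}}\),

\[\tau = a \left( t - \dfrac{v (x - vt)}{c^{2} - v^{2}} \right) = a \frac{c^{2} t - vx}{c^{2} - v^{2}} = a \beta^{2} \left( t - \dfrac{vx}{c^{2}} \right),\]

\[\xi = a \frac{c^{2}}{c^{2} - v^{2}} (x - vt) = a \beta^{2} (x - vt), \qquad \eta = a \frac{c}{\sqrt{c^{2} - v^{2}}} y = a \beta y \ldotp\]

With \(\phi (v) = a \beta\) these take the form given above, and the equation for \(\zeta\) follows in the same way as that for \(\eta\).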
We now have to prove that any ray of light, measured in the moving system, is propagated with the velocity c, if, as we have assumed, this is the case in the stationary system; for we have not as yet furnished the proof that the principle of the constancy of the velocity of light is compatible with the principle of relativity.
At the time t = \(\tau\) = 0, when the origin of the coordinates is common to the two systems, let a spherical wave be emitted therefrom, and be propagated with the velocity c in system K. If (x, y, z) be a point just attained by this wave, then
\[x^{2} + y^{2} + z^{2} = c^{2} t^{2} \ldotp\]
Transforming this equation with the aid of our equations of transformation we obtain after a simple calculation
\[\xi^{2} + \eta^{2} + \zeta^{2} = c^{2} \tau^{2} \ldotp\]
The wave under consideration is therefore no less a spherical wave with velocity of propagation c when viewed in the moving system. This shows that our two fundamental principles are compatible. 24
Note
The equations of the Lorentz transformation may be more simply deduced directly from the condition that in virtue of those equations the relation \(x^{2} + y^{2} + z^{2} = c^{2} t^{2}\) shall have as its consequence the second relation \(\xi^{2} + \eta^{2} + \zeta^{2} = c^{2} \tau^{2}\).—AS
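The “simple calculation” can be checked numerically. The sketch below uses the transformation equations with the still-undetermined factor passed in as a parameter `phi`; the function names and sample values are hypothetical, and units are chosen so that c = 1. (Einstein's β is the quantity usually written γ today.)

```python
import math


def transform(x, y, z, t, v, c=1.0, phi=1.0):
    """Map an event from K to k using the equations with the factor phi(v)."""
    beta = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    tau = phi * beta * (t - v * x / c**2)
    xi = phi * beta * (x - v * t)
    eta = phi * y
    zeta = phi * z
    return xi, eta, zeta, tau


def light_cone_residual(x, y, z, t, c=1.0):
    """x^2 + y^2 + z^2 - c^2 t^2, which vanishes on the spherical wave."""
    return x * x + y * y + z * z - (c * t) ** 2


# A point reached at t = 2 by a wave emitted from the origin at t = 0.
event = (1.2, 0.0, 1.6, 2.0)
print(light_cone_residual(*event))                            # ~0 in K
print(light_cone_residual(*transform(*event, v=0.6)))         # ~0 in k
print(light_cone_residual(*transform(*event, v=0.6, phi=2)))  # ~0 for any phi
```

The residual in k equals the square of `phi` times the residual in K, so the wave remains spherical whatever value the factor takes. This is why the compatibility argument leaves the factor undetermined and a separate argument is needed to fix it.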
In the equations of transformation which have been developed there enters an unknown function \(\phi\) of v, which we will now determine.
For this purpose we introduce a third system of coordinates K', which relative to the system k is in a state of parallel translatory motion parallel to the axis of \(\Xi\), 25 such that the origin of coordinates of system K' moves with velocity −v on the axis of \(\Xi\). At the time t = 0 let all three origins coincide, and when t = x = y = z = 0 let the time t' of the system K' be zero. We call the coordinates, measured in the system K', x', y', z', and by a twofold application of our equations of transformation we obtain
\[\begin{split} t' &= \phi (-v) \beta (-v) \left(\tau + \dfrac{v \xi}{c^{2}} \right) = \phi (v) \phi (-v) t, \\ x' &= \phi (-v) \beta (-v) (\xi + v \tau) = \phi (v) \phi (-v) x, \\ y' &= \phi (-v) \eta = \phi (v) \phi (-v) y, \\ z' &= \phi (-v) \zeta = \phi (v) \phi (-v) z \ldotp \end{split}\]
Note
In Einstein’s original paper, the symbols (\(\Xi\), H, Z) for the coordinates of the moving system k were introduced without explicitly defining them. In the 1923 English translation, (X, Y, Z) were used, creating an ambiguity between X coordinates in the fixed system K and the parallel axis in moving system k. Here and in subsequent references we use \(\Xi\) when referring to the axis of system k along which the system is translating with respect to K. In addition, the reference to system K' later in this sentence was incorrectly given as “k” in the 1923 English translation.—JW
Since the relations between x', y', z' and x, y, z do not contain the time t, the systems K and K' are at rest with respect to one another, and it is clear that the transformation from K to K' must be the identical transformation. Thus
\[\phi (v) \phi (-v) = 1 \ldotp\]
We now inquire into the signification of \(\phi\)(v). We give our attention to that part of the axis of Y of system k which lies between \(\xi\) = 0, \(\eta\) = 0, \(\zeta\) = 0 and \(\xi\) = 0, \(\eta\) = l, \(\zeta\) = 0. This part of the axis of Y is a rod moving perpendicularly to its axis with velocity v relative to system K. Its ends possess in K the coordinates
\[x_{1} = vt, \quad y_{1} = \frac{l}{\phi (v)}, \quad z_{1} = 0\]
and
\[x_{2} = vt, \quad y_{2} = 0, \quad z_{2} = 0 \ldotp\]
The length of the rod measured in K is therefore \(\frac{l}{\phi (v)}\); and this gives us the meaning of the function \(\phi\)(v). From reasons of symmetry it is now evident that the length of a given rod moving perpendicularly to its axis, measured in the stationary system, must depend only on the velocity and not on the direction and the sense of the motion. The length of the moving rod measured in the stationary system does not change, therefore, if v and −v are interchanged. Hence it follows that \(\frac{l}{\phi (v)} = \frac{l}{\phi (-v)}\), or
\[\phi (v) = \phi (-v) \ldotp\]
It follows from this relation and the one previously found that \(\phi\)(v) = 1, so that the transformation equations which have been found become
\[\begin{split} \tau &= \beta \left( t - \dfrac{vx}{c^{2}} \right), \\ \xi &= \beta (x - vt), \\ \eta &= y, \\ \zeta &= z, \end{split}\]
where
\[\beta = \frac{1}{\sqrt{1 - \frac{v^{2}}{c^{2}}}} \ldotp\]
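As an illustration of the final equations, the sketch below applies them to the x and t coordinates only. The function name and sample values are hypothetical, and units are chosen so that c = 1. It checks two things: that a transformation with velocity v followed by one with velocity −v returns the original event, as in the argument involving K', and that two events simultaneous in K receive different values of τ in k.

```python
import math


def boost(x, t, v, c=1.0):
    """Final transformation equations for the x and t coordinates."""
    beta = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return beta * (x - v * t), beta * (t - v * x / c**2)


# K -> k with velocity v, then k -> K' with velocity -v: the identity.
xi, tau = boost(3.0, 5.0, 0.6)
x_back, t_back = boost(xi, tau, -0.6)
print(x_back, t_back)          # approximately 3.0 and 5.0

# Two events simultaneous in K (t = 0) at x = 0 and x = 1.
_, tau_first = boost(0.0, 0.0, 0.6)
_, tau_second = boost(1.0, 0.0, 0.6)
print(tau_first, tau_second)   # different values of tau in k
```

The difference between the two values of τ is \(-\beta v \Delta x / c^{2}\), which vanishes only when v = 0 or the events occur at the same x.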