4.4: Change of Basis, and Matrix Diagonalization


From the discussion of the last section, it may look like the matrix language is fully similar to, and in many instances more convenient than, the general bra-ket formalism. In particular, Eqs. (54)-(55) and (63)-(64) show that any part of any bra-ket expression may be directly mapped onto the corresponding matrix expression, with the only slight inconvenience of using not only columns but also rows (with their elements complex-conjugated) for state vector representation. This invites the question: why do we need the bra-ket language at all? The answer is that the elements of the matrices depend on the particular choice of the basis set, very much like the Cartesian components of a usual geometric vector depend on the particular choice of reference frame orientation (Fig. 4), and very frequently, in problem solution, it is convenient to use two or more different basis sets for the same system. (Just a bit more patience: numerous examples will follow soon.)

Fig. 4.4. The transformation of components of a 2D vector at a reference frame’s rotation.

With this motivation, let us explore what happens at the transformation from one basis, {u}, to another one, {v}, with both bases full and orthonormal. First of all, let us prove that for each such pair of bases, and an arbitrary numbering of the states of each basis, there exists an operator $\hat{U}$ such that, first,

Unitary operator:

$$|v_j\rangle = \hat{U}|u_j\rangle, \tag{75}$$

and, second,

$$\hat{U}^\dagger\hat{U} = \hat{U}\hat{U}^\dagger = \hat{I}. \tag{76}$$

(Due to the last property,16 $\hat{U}$ is called a unitary operator, and Eq. (75), a unitary transformation.)

A very simple proof of both statements may be achieved by construction. Indeed, let us take

Unitary operator: construction

$$\hat{U} \equiv \sum_j |v_j\rangle\langle u_j| \tag{77}$$

- an evident generalization of Eq. (44). Then, using Eq. (38), we obtain

$$\hat{U}|u_{j'}\rangle = \sum_j |v_j\rangle\langle u_j|u_{j'}\rangle = \sum_j |v_j\rangle\,\delta_{jj'} = |v_{j'}\rangle, \tag{78}$$

so that Eq. (75) has been proved. Now, applying Eq. (31) to each term of the sum (77), we get

Conjugate unitary operator:

$$\hat{U}^\dagger \equiv \sum_j |u_j\rangle\langle v_j|, \tag{79}$$

so that

$$\hat{U}\hat{U}^\dagger = \sum_{j,j'} |v_j\rangle\langle u_j|u_{j'}\rangle\langle v_{j'}| = \sum_{j,j'} |v_j\rangle\,\delta_{jj'}\,\langle v_{j'}| = \sum_j |v_j\rangle\langle v_j|. \tag{80}$$

But according to the closure relation (44), the last expression is just the identity operator, so that one of Eqs. (76) has been proved. (The proof of the second equality is absolutely similar.) As a by-product of our proof, we have also obtained another important expression, Eq. (79). It implies, in particular, that while, according to Eq. (75), the operator $\hat{U}$ performs the transform from the "old" basis $u_j$ to the "new" basis $v_j$, its Hermitian adjoint $\hat{U}^\dagger$ performs the reciprocal transform:

$$\hat{U}^\dagger|v_{j'}\rangle = \sum_j |u_j\rangle\langle v_j|v_{j'}\rangle = \sum_j |u_j\rangle\,\delta_{jj'} = |u_{j'}\rangle. \tag{81}$$

Now let us see how the matrix elements of the unitary transform operators look. Generally, as was discussed above, an operator’s matrix elements may depend on the basis in which they are calculated, so let us be specific, at least initially. For example, let us calculate the desired matrix elements $U_{jj'}$ in the "old" basis $\{u\}$, using Eq. (77):

$$U_{jj'}|_{\text{in }u} \equiv \langle u_j|\hat{U}|u_{j'}\rangle = \langle u_j|\Big(\sum_{j''} |v_{j''}\rangle\langle u_{j''}|\Big)|u_{j'}\rangle = \sum_{j''}\langle u_j|v_{j''}\rangle\,\delta_{j''j'} = \langle u_j|v_{j'}\rangle. \tag{82}$$

Now performing a similar calculation in the "new" basis $\{v\}$, we get

$$U_{jj'}|_{\text{in }v} \equiv \langle v_j|\hat{U}|v_{j'}\rangle = \langle v_j|\Big(\sum_{j''} |v_{j''}\rangle\langle u_{j''}|\Big)|v_{j'}\rangle = \sum_{j''}\delta_{jj''}\,\langle u_{j''}|v_{j'}\rangle = \langle u_j|v_{j'}\rangle. \tag{83}$$

Surprisingly, the result is the same! This is of course true for the Hermitian conjugate (79) as well:

$$U^\dagger_{jj'}|_{\text{in }u} = U^\dagger_{jj'}|_{\text{in }v} = \langle v_j|u_{j'}\rangle. \tag{84}$$

These expressions may be used, first of all, to rewrite Eq. (75) in a more direct form. Applying the first of Eqs. (41) to any state $v_{j'}$ of the "new" basis, and then Eq. (82), we get

$$|v_{j'}\rangle = \sum_j |u_j\rangle\langle u_j|v_{j'}\rangle = \sum_j U_{jj'}\,|u_j\rangle. \tag{85}$$

Similarly, the reciprocal transform is

$$|u_{j'}\rangle = \sum_j |v_j\rangle\langle v_j|u_{j'}\rangle = \sum_j U^\dagger_{jj'}\,|v_j\rangle. \tag{86}$$

These formulas are very convenient for applications; we will use them already in this section.
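To make this bookkeeping concrete, here is a minimal numerical sketch (in Python with NumPy; the bases and all variable names are illustrative assumptions, not from the text) that builds the operator (77) from two orthonormal bases of a three-dimensional Hilbert space and verifies the unitarity (76) and the basis-independence (82)-(83) of its matrix elements:

    import numpy as np

    N = 3
    # "Old" basis {u}: the standard basis of C^N (columns of the identity).
    u = np.eye(N, dtype=complex)
    # "New" basis {v}: columns of a random unitary matrix (via QR decomposition).
    v, _ = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))

    # Eq. (77): U = sum_j |v_j><u_j|  (a sum of outer products).
    U = sum(np.outer(v[:, j], u[:, j].conj()) for j in range(N))

    # Eq. (76): U is unitary.
    assert np.allclose(U.conj().T @ U, np.eye(N))
    assert np.allclose(U @ U.conj().T, np.eye(N))

    # Eqs. (82)-(83): the matrix elements are the same in both bases,
    # and equal to <u_j|v_j'>.
    U_in_u = u.conj().T @ U @ u
    U_in_v = v.conj().T @ U @ v
    assert np.allclose(U_in_u, U_in_v)
    assert np.allclose(U_in_u, u.conj().T @ v)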

Next, we may use Eqs. (83)-(84) to express the effect of the unitary transform on the expansion coefficients $\alpha_j$ of the vectors of an arbitrary state $\alpha$, defined by Eq. (37). As a reminder, in the "old" basis $\{u\}$ they are given by Eqs. (40). Similarly, in the "new" basis $\{v\}$,

$$\alpha_j|_{\text{in }v} = \langle v_j|\alpha\rangle. \tag{87}$$

Again inserting the identity operator in its closure form (44) with the internal index $j'$, and then using Eqs. (84) and (40), we get

$$\alpha_j|_{\text{in }v} = \langle v_j|\Big(\sum_{j'}|u_{j'}\rangle\langle u_{j'}|\Big)|\alpha\rangle = \sum_{j'}\langle v_j|u_{j'}\rangle\langle u_{j'}|\alpha\rangle = \sum_{j'} U^\dagger_{jj'}\,\alpha_{j'}|_{\text{in }u}. \tag{88}$$

The reciprocal transform is performed by the matrix elements of the operator $\hat{U}$:

$$\alpha_j|_{\text{in }u} = \sum_{j'} U_{jj'}\,\alpha_{j'}|_{\text{in }v}. \tag{89}$$

So, if the transform (75) from the "old" basis $\{u\}$ to the "new" basis $\{v\}$ is performed by the unitary operator, the change (88) of the state vector’s components at this transformation requires its Hermitian conjugate. This fact is similar to the transformation of components of a usual vector at coordinate frame rotation. For example, for a 2D vector whose actual position in space is fixed (Fig. 4):

$$\begin{pmatrix} \alpha_{x'} \\ \alpha_{y'} \end{pmatrix} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}\begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}, \tag{90}$$

but the reciprocal transform is performed by a different matrix, which may be obtained from that participating in Eq. (90) by the replacement $\varphi \to -\varphi$. This replacement has a clear geometric sense: if the "new" reference frame $\{x', y'\}$ is obtained from the "old" frame $\{x, y\}$ by a counterclockwise rotation by angle $\varphi$, the reciprocal transformation requires a rotation by angle $-\varphi$. (In this analogy, the unitary property (76) of the unitary transform operators corresponds to the equality of the determinants of both rotation matrices to 1.)
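As a quick numerical illustration of Eqs. (88)-(90), the following sketch (Python/NumPy, purely illustrative) transforms the components of a fixed 2D vector into a counterclockwise-rotated frame and back:

    import numpy as np

    phi = 0.3                                    # rotation angle of the "new" frame
    R = np.array([[ np.cos(phi), np.sin(phi)],
                  [-np.sin(phi), np.cos(phi)]])  # the matrix of Eq. (90)

    alpha_old = np.array([1.0, 2.0])             # components in the frame {x, y}
    alpha_new = R @ alpha_old                    # components in the frame {x', y'}

    # The reciprocal transform is performed by the phi -> -phi matrix,
    # which for this real rotation is just the transpose (analog of U^dagger):
    R_back = np.array([[ np.cos(-phi), np.sin(-phi)],
                       [-np.sin(-phi), np.cos(-phi)]])
    assert np.allclose(R_back, R.T)
    assert np.allclose(R_back @ alpha_new, alpha_old)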

Due to the analogy between expressions (88) and (89) on one hand, and our old friend Eq. (62) on the other hand, it is tempting to skip indices in these new results by writing

$$|\alpha\rangle_{\text{in }v} = \hat{U}^\dagger|\alpha\rangle_{\text{in }u}, \qquad |\alpha\rangle_{\text{in }u} = \hat{U}|\alpha\rangle_{\text{in }v}. \quad \text{(SYMBOLIC ONLY!)} \tag{91}$$

Since the matrix elements of $\hat{U}$ and $\hat{U}^\dagger$ do not depend on the basis, such language is not too bad and is mnemonically useful. However, since in the bra-ket formalism (or at least its version presented in this course) the state vectors are basis-independent, Eq. (91) has to be treated as a symbolic one, and should not be confused with the strict Eqs. (88)-(89), or with the rigorous basis-independent vector and operator equalities discussed in Sec. 2.

Now let us use the same trick of identity operator insertion, repeated twice, to find the transformation rule for the matrix elements of an arbitrary operator:

$$A_{jj'}|_{\text{in }v} \equiv \langle v_j|\hat{A}|v_{j'}\rangle = \langle v_j|\Big(\sum_k |u_k\rangle\langle u_k|\Big)\hat{A}\Big(\sum_{k'} |u_{k'}\rangle\langle u_{k'}|\Big)|v_{j'}\rangle = \sum_{k,k'} U^\dagger_{jk}\, A_{kk'}|_{\text{in }u}\, U_{k'j'}, \qquad A_{jj'}|_{\text{in }u} = \sum_{k,k'} U_{jk}\, A_{kk'}|_{\text{in }v}\, U^\dagger_{k'j'}. \tag{92}$$

In the spirit of Eq. (91), we may represent these results symbolically as well, in a compact form:

$$\hat{A}|_{\text{in }v} = \hat{U}^\dagger\, \hat{A}|_{\text{in }u}\, \hat{U}, \tag{93}$$

$$\hat{A}|_{\text{in }u} = \hat{U}\, \hat{A}|_{\text{in }v}\, \hat{U}^\dagger. \quad \text{(SYMBOLIC ONLY!)} \tag{94}$$

As a sanity check, let us apply Eq. (93) to the identity operator:

$$\hat{I}|_{\text{in }v} = \left(\hat{U}^\dagger\, \hat{I}\, \hat{U}\right)\Big|_{\text{in }u} = \left(\hat{U}^\dagger \hat{U}\right)\Big|_{\text{in }u} = \hat{I}|_{\text{in }u} \tag{95}$$

as it should be. One more (strict rather than symbolic) invariant of the basis change is the trace of any operator, defined as the sum of the diagonal elements of its matrix:

$$\text{Tr}\,\hat{A} \equiv \text{Tr}\,\mathrm{A} \equiv \sum_j A_{jj}. \tag{96}$$

The (easy) proof of this fact, using the previous relations, is left for the reader’s exercise.
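Both the transformation rule (92)-(94) and the trace invariance (96) are easy to verify numerically; a minimal illustrative sketch (Python/NumPy):

    import numpy as np

    N = 4
    # A random unitary U (via QR) and a random operator matrix in the basis {u}.
    U, _ = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))
    A_in_u = np.random.randn(N, N) + 1j * np.random.randn(N, N)

    # Eq. (93): the operator's matrix in the "new" basis.
    A_in_v = U.conj().T @ A_in_u @ U

    # Eq. (95): the identity operator is basis-independent.
    assert np.allclose(U.conj().T @ np.eye(N) @ U, np.eye(N))

    # Eq. (96): the trace is invariant under the basis change.
    assert np.isclose(np.trace(A_in_u), np.trace(A_in_v))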

So far, I have implied that both state bases {u} and {v} are known, and the natural question is where this information comes from in the quantum mechanics of actual physical systems. To get a partial answer to this question, let us return to Eq. (68), which defines the eigenstates and the eigenvalues of an operator. Let us assume that the eigenstates $a_j$ of a certain operator $\hat{A}$ form a full and orthonormal set, and calculate the matrix elements of the operator in the basis $\{a\}$ of these states, at their arbitrary numbering. For that, it is sufficient to inner-multiply both sides of Eq. (68), written for some index $j'$, by the bra-vector of an arbitrary state $a_j$ of the same set:

$$\langle a_j|\hat{A}|a_{j'}\rangle = \langle a_j|A_{j'}|a_{j'}\rangle. \tag{97}$$

The left-hand side of this equality is the matrix element $A_{jj'}$ we are looking for, while its right-hand side is just $A_{j'}\,\delta_{jj'}$. As a result, we see that the matrix is diagonal, with the diagonal consisting of the operator’s eigenvalues:

$$A_{jj'} = A_j\,\delta_{jj'}. \tag{98}$$

In particular, in the eigenstate basis (but not necessarily in an arbitrary basis!), $A_{jj}$ means the same as $A_j$. Thus the important problem of finding the eigenvalues and eigenstates of an operator is equivalent to the diagonalization of its matrix,17 i.e. finding the basis in which the operator’s matrix acquires the diagonal form (98); then the diagonal elements are the eigenvalues, and the basis itself is the desirable set of eigenstates.

To see how this is done in practice, let us inner-multiply Eq. (68) by a bra-vector of the basis (say, $\{u\}$) in which we happen to know the matrix elements $A_{jj'}$:

$$\langle u_k|\hat{A}|a_j\rangle = \langle u_k|A_j|a_j\rangle. \tag{99}$$

On the left-hand side, we can (as usual :-) insert the identity operator between the operator $\hat{A}$ and the ket-vector, and then use the closure relation (44) in the same basis $\{u\}$, while on the right-hand side, we can move the eigenvalue $A_j$ (a c-number) out of the bracket, and then insert a summation over the same index as in the closure, compensating it with the proper Kronecker delta symbol:

$$\sum_{k'}\langle u_k|\hat{A}|u_{k'}\rangle\langle u_{k'}|a_j\rangle = A_j\sum_{k'}\langle u_{k'}|a_j\rangle\,\delta_{kk'}. \tag{100}$$

Moving out the signs of summation over $k'$, and using the definition (47) of the matrix elements, we get

$$\sum_{k'}\left(A_{kk'} - A_j\,\delta_{kk'}\right)\langle u_{k'}|a_j\rangle = 0. \tag{101}$$

But the set of such equalities, for all N possible values of the index k, is just a system of linear, homogeneous equations for the unknown c-numbers $\langle u_{k'}|a_j\rangle$. According to Eqs. (82)-(84), these numbers are nothing other than the matrix elements $U_{kj}$ of the unitary matrix providing the required transformation from the initial basis $\{u\}$ to the basis $\{a\}$ that diagonalizes the matrix A. This system may be represented in the matrix form:

$$\begin{pmatrix} A_{11}-A_j & A_{12} & \cdots \\ A_{21} & A_{22}-A_j & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix} U_{1j} \\ U_{2j} \\ \cdots \end{pmatrix} = 0, \tag{102}$$

and the condition of its consistency,

Characteristic equation for eigenvalues:

$$\begin{vmatrix} A_{11}-A_j & A_{12} & \cdots \\ A_{21} & A_{22}-A_j & \cdots \\ \cdots & \cdots & \cdots \end{vmatrix} = 0, \tag{103}$$

plays the role of the characteristic equation of the system. This equation has N roots $A_j$, the eigenvalues of the operator $\hat{A}$; after they have been calculated, plugging any of them back into the system (102), we can use it to find the N matrix elements $U_{kj}$ ($k = 1, 2, \ldots, N$) corresponding to this particular eigenvalue. However, since the equations (102) are homogeneous, they allow finding $U_{kj}$ only to a constant multiplier. To ensure their normalization, i.e. to enforce the unitary character of the matrix U, we may use the condition that all eigenvectors are normalized (just as the basis vectors are):

$$\langle a_j|a_j\rangle = \sum_k \langle a_j|u_k\rangle\langle u_k|a_j\rangle = \sum_k |U_{kj}|^2 = 1, \tag{104}$$

for each j. This normalization completes the diagonalization.18
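In numerical practice, this whole procedure (solving the characteristic equation (103), then the system (102), then normalizing the columns per (104)) is packaged in standard linear-algebra routines; a minimal illustrative sketch (Python/NumPy):

    import numpy as np

    # A random Hermitian matrix A in some initial basis {u}.
    N = 3
    M = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    A_in_u = (M + M.conj().T) / 2

    # numpy.linalg.eigh solves the eigenproblem and returns normalized
    # eigenvectors: the columns of U hold the elements U_kj = <u_k|a_j>.
    eigenvalues, U = np.linalg.eigh(A_in_u)

    # Eq. (104): each column of U is normalized.
    assert np.allclose(np.sum(np.abs(U)**2, axis=0), 1.0)

    # In the eigenstate basis {a}, the matrix takes the diagonal form (98).
    A_in_a = U.conj().T @ A_in_u @ U
    assert np.allclose(A_in_a, np.diag(eigenvalues))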

Now (at last!) I can give the reader some examples. As a simple but very important case, let us diagonalize each of the operators described (in a certain two-function basis {u}, i.e. in a two-dimensional Hilbert space) by the so-called Pauli matrices:

$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{105}$$

Though introduced by a physicist, with the specific purpose of describing the electron’s spin, these matrices have a general mathematical significance, because together with the 2×2 identity matrix, they provide a full, linearly-independent system, meaning that an arbitrary 2×2 matrix may be represented as

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = b\,\mathrm{I} + c_x \sigma_x + c_y \sigma_y + c_z \sigma_z, \tag{106}$$

with a unique set of four c-number coefficients $b$, $c_x$, $c_y$, and $c_z$.
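The coefficients in Eq. (106) may be found from simple traces, because the Pauli matrices are traceless and mutually orthogonal in the trace sense, Tr(σ_j σ_j') = 2δ_jj' (a standard fact, not derived in this section). A minimal illustrative sketch (Python/NumPy):

    import numpy as np

    I2 = np.eye(2, dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    A = np.random.randn(2, 2) + 1j * np.random.randn(2, 2)  # arbitrary 2x2 matrix

    # Tr(A) = 2b and Tr(sigma_i A) = 2 c_i, since the Pauli matrices
    # are traceless and Tr(sigma_i sigma_j) = 2 delta_ij:
    b  = np.trace(A) / 2
    cx = np.trace(sx @ A) / 2
    cy = np.trace(sy @ A) / 2
    cz = np.trace(sz @ A) / 2

    # Eq. (106) is reproduced with this unique set of coefficients:
    assert np.allclose(A, b * I2 + cx * sx + cy * sy + cz * sz)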

Since the matrix σz is already diagonal, with the evident eigenvalues ±1, let us start with diagonalizing the matrix σx. For it, the characteristic equation (103) is evidently

$$\begin{vmatrix} -A_j & 1 \\ 1 & -A_j \end{vmatrix} = 0, \quad \text{i.e. } A_j^2 - 1 = 0, \tag{107}$$

and has two roots, $A_{1,2} = \pm 1$. (Again, the state numbering is arbitrary!) So the eigenvalues of the matrix σx are the same as those of the matrix σz. (The reader may readily check that the eigenvalues of the matrix σy are also the same.) However, the eigenvectors of the operators corresponding to these three matrices are different. To find them for σx, let us plug its first eigenvalue, $A_1 = +1$, back into equations (101) spelled out for this particular case ($j = 1$; $k, k' = 1, 2$):

$$-\langle u_1|a_1\rangle + \langle u_2|a_1\rangle = 0, \qquad \langle u_1|a_1\rangle - \langle u_2|a_1\rangle = 0. \tag{108}$$

These two equations are compatible (of course, because the used eigenvalue $A_1 = +1$ satisfies the characteristic equation), and any of them gives

$$\langle u_1|a_1\rangle = \langle u_2|a_1\rangle, \quad \text{i.e. } U_{11} = U_{21}. \tag{109}$$

With that, the normalization condition (104) yields

$$|U_{11}|^2 = |U_{21}|^2 = \frac{1}{2}. \tag{110}$$

Although the normalization is insensitive to the simultaneous multiplication of $U_{11}$ and $U_{21}$ by the same phase factor $\exp\{i\varphi\}$ with any real φ, it is convenient to keep the coefficients real, for example taking φ = 0, to get

$$U_{11} = U_{21} = \frac{1}{\sqrt{2}}. \tag{111}$$

Performing an absolutely similar calculation for the second characteristic value, $A_2 = -1$, we get $U_{12} = -U_{22}$, and we may choose the common phase to have

$$U_{12} = -U_{22} = \frac{1}{\sqrt{2}}, \tag{112}$$

so that the whole unitary matrix for diagonalization of the operator corresponding to σx is19

$$U_x = U_x^\dagger = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{113}$$

For what follows, it will be convenient to have this result expressed in the ket-relation form - see Eqs. (85)-(86):

$$|a_1\rangle = U_{11}|u_1\rangle + U_{21}|u_2\rangle = \frac{1}{\sqrt{2}}\left(|u_1\rangle + |u_2\rangle\right), \qquad |a_2\rangle = U_{12}|u_1\rangle + U_{22}|u_2\rangle = \frac{1}{\sqrt{2}}\left(|u_1\rangle - |u_2\rangle\right), \tag{114}$$

$$|u_1\rangle = U^\dagger_{11}|a_1\rangle + U^\dagger_{21}|a_2\rangle = \frac{1}{\sqrt{2}}\left(|a_1\rangle + |a_2\rangle\right), \qquad |u_2\rangle = U^\dagger_{12}|a_1\rangle + U^\dagger_{22}|a_2\rangle = \frac{1}{\sqrt{2}}\left(|a_1\rangle - |a_2\rangle\right). \tag{115}$$

Now let me show that these results are already sufficient to understand the Stern-Gerlach experiments described in Sec. 1 - but with two additional postulates. The first of them is that the interaction of a particle with an external magnetic field, besides that due to its orbital motion, may be described by the following vector operator of its spin dipole magnetic moment:20

$$\hat{\vec{m}} = \gamma\,\hat{\vec{S}}, \tag{116}$$

where the constant coefficient γ, specific for every particle type, is called the gyromagnetic ratio,21 and $\hat{\vec{S}}$ is the vector operator of spin, with three Cartesian components:

$$\hat{\vec{S}} = \mathbf{n}_x\hat{S}_x + \mathbf{n}_y\hat{S}_y + \mathbf{n}_z\hat{S}_z.$$

Here $\mathbf{n}_{x,y,z}$ are the usual Cartesian unit vectors in the 3D geometric space (in the quantum-mechanics sense, just c-numbers, or rather "c-vectors"), while $\hat{S}_{x,y,z}$ are the "usual" (scalar) operators.
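Before moving on, a quick numerical check of the diagonalization result (113) (illustrative Python/NumPy): conjugating σx with the matrix Ux, whose columns are the eigenvectors found above, indeed yields a diagonal matrix with the eigenvalues ±1.

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    Ux = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Eq. (113); here Ux = Ux^dagger

    # The columns of Ux are |a_1> and |a_2> in the {u} basis, so that
    # Ux^dagger sigma_x Ux is diagonal, with the eigenvalues +1 and -1:
    print(np.round((Ux.conj().T @ sx @ Ux).real, 12))   # [[ 1.  0.] [ 0. -1.]]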

For the so-called spin-1/2 particles (including the electron), these components may be simply expressed via those of the Pauli vector operator

$$\hat{\boldsymbol{\sigma}} \equiv \mathbf{n}_x\hat{\sigma}_x + \mathbf{n}_y\hat{\sigma}_y + \mathbf{n}_z\hat{\sigma}_z,$$

as follows:

Spin-1/2 operator:

$$\hat{S}_{x,y,z} = \frac{\hbar}{2}\hat{\sigma}_{x,y,z}, \qquad \text{so that} \quad \hat{\vec{S}} = \frac{\hbar}{2}\hat{\boldsymbol{\sigma}}.$$

In turn, in the so-called z-basis, each Cartesian component of the latter operator is just the corresponding Pauli matrix (105), so that it may be also convenient to use the following 3D vector of these matrices:

$$\boldsymbol{\sigma} \equiv \mathbf{n}_x\sigma_x + \mathbf{n}_y\sigma_y + \mathbf{n}_z\sigma_z = \begin{pmatrix} n_z & n_x - in_y \\ n_x + in_y & -n_z \end{pmatrix}. \tag{117}$$

The z-basis, in which such a matrix representation of $\hat{\boldsymbol{\sigma}}$ is valid, is defined as an orthonormal basis of certain two states, commonly denoted ↑ and ↓, in which the matrix of the operator $\hat{\sigma}_z$ is diagonal, with the eigenvalues, respectively, +1 and -1, and hence the matrix $\mathrm{S}_z = (\hbar/2)\sigma_z$ of $\hat{S}_z$ is also diagonal, with the eigenvalues +ℏ/2 and -ℏ/2. Note that we do not "understand" what exactly the states ↑ and ↓ are,22 but loosely associate them with some internal rotation of a spin-1/2 particle about the z-axis, with either positive or negative angular momentum component $S_z$. However, attempts to use such a classical interpretation for quantitative predictions run into fundamental difficulties - see Sec. 6 below.
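For any unit vector n, the matrix (117) has the same eigenvalues ±1 as each individual Pauli matrix; a quick illustrative check (Python/NumPy):

    import numpy as np

    # A random unit vector n in 3D geometric space.
    n = np.random.randn(3)
    n /= np.linalg.norm(n)
    nx, ny, nz = n

    # Eq. (117): the matrix of n . sigma in the z-basis.
    n_dot_sigma = np.array([[nz, nx - 1j * ny],
                            [nx + 1j * ny, -nz]])

    # Its eigenvalues are +1 and -1 for any direction n:
    print(np.linalg.eigvalsh(n_dot_sigma))   # [-1.  1.] up to rounding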

The second necessary postulate describes the general relation between the bra-ket formalism and experiment. Namely, in quantum mechanics, each real observable A is represented by a Hermitian operator $\hat{A} = \hat{A}^\dagger$, and the result of its measurement,23 in a quantum state α described by a linear superposition of the eigenstates $a_j$ of the operator,

$$|\alpha\rangle = \sum_j \alpha_j|a_j\rangle, \qquad \text{with } \alpha_j = \langle a_j|\alpha\rangle, \tag{118}$$

may be only one of the corresponding eigenvalues $A_j$.24 Specifically, if the ket (118) and all eigenkets $|a_j\rangle$ are normalized to 1,

$$\langle\alpha|\alpha\rangle = 1, \qquad \langle a_j|a_j\rangle = 1, \tag{119}$$

then the probability of a certain measurement outcome $A_j$ is25

$$W_j = |\alpha_j|^2 \equiv \alpha_j^*\alpha_j = \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle. \tag{120}$$

This relation is evidently a generalization of Eq. (1.22) of wave mechanics. As a sanity check, let us assume that the set of the eigenstates $a_j$ is full, and calculate the sum of the probabilities of finding the system in each of these states:

$$\sum_j W_j = \sum_j \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle = \langle\alpha|\hat{I}|\alpha\rangle = 1. \tag{121}$$

Now returning to the Stern-Gerlach experiment, conceptually the description of the first (z-oriented) experiment shown in Fig. 1 is the hardest for us, because the statistical ensemble describing the unpolarized electron beam at its input is mixed ("incoherent"), and cannot be described by a pure ("coherent") superposition of the type (6) that has been the subject of our studies so far. (We will discuss such mixed ensembles in Chapter 7.) However, it is intuitively clear that its results are compatible with the description of the two output beams as sets of electrons in the pure states ↑ and ↓, respectively. The absorber following that first stage (Fig. 2) just takes all spin-down electrons out of the picture, producing an output beam of polarized electrons in the definite (pure) state ↑. For such a beam, the probabilities (120) are W↑ = 1 and W↓ = 0. This is certainly compatible with the result of the "control" experiment shown on the bottom panel of Fig. 2: the repeated SG(z) stage does not split such a beam, keeping the probabilities the same.
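The probability rule (120) is easy to exercise numerically; a minimal illustrative sketch (Python/NumPy) for a two-level system, with an arbitrary state expanded in the measured operator’s eigenbasis:

    import numpy as np

    # Expansion coefficients alpha_j = <a_j|alpha> of an arbitrary state.
    alpha = np.array([1.0 + 0.5j, 0.3 - 1.2j])
    alpha /= np.linalg.norm(alpha)        # enforce Eq. (119): <alpha|alpha> = 1

    # Eq. (120): probabilities of the measurement outcomes A_j.
    W = np.abs(alpha)**2
    print(W)                              # approx. [0.45 0.55]
    assert np.isclose(W.sum(), 1.0)       # Eq. (121)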

Now let us discuss the double Stern-Gerlach experiment shown on the top panel of Fig. 2. For that, let us represent the z-polarized beam in another basis - that of the two states (I will denote them as → and ←) in which, by definition, the matrix $\mathrm{S}_x$ is diagonal. But this is exactly the set we called $a_{1,2}$ in the σx matrix diagonalization problem solved above. On the other hand, the states ↑ and ↓ are exactly what we called $u_{1,2}$ in that problem, because in this basis, we know the matrix σ explicitly - see Eq. (117). Hence, in the application to the electron spin problem, we may rewrite Eqs. (114)-(115) as

$$|\rightarrow\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\rangle + |\downarrow\rangle\right), \qquad |\leftarrow\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\rangle - |\downarrow\rangle\right), \tag{122}$$

$$|\uparrow\rangle = \frac{1}{\sqrt{2}}\left(|\rightarrow\rangle + |\leftarrow\rangle\right), \qquad |\downarrow\rangle = \frac{1}{\sqrt{2}}\left(|\rightarrow\rangle - |\leftarrow\rangle\right). \tag{123}$$

Currently for us, the first of Eqs. (123) is most important, because it shows that the quantum state of the electrons entering the SG(x) stage may be represented as a coherent superposition of electrons with $S_x = +\hbar/2$ and $S_x = -\hbar/2$. Notice that the two components of this superposition have equal probability amplitude moduli, so that, according to Eq. (120), the split beams → and ← have equal intensities, in accordance with the experimental results. (The minus sign before the second ket-vector is of no consequence here, but it may have an impact on the outcomes of other experiments - for example, if the coherently split beams are brought together again.)

Now, let us discuss the most mysterious (from the classical point of view) multi-stage SG experiment shown on the middle panel of Fig. 2. After the second absorber has taken out all electrons in, say, the ← state, the remaining electrons, all in the → state, are passed to the final, SG(z), stage. But according to the first of Eqs. (122), this state may be represented as a (coherent) linear superposition of the ↑ and ↓ states, with equal probability amplitudes. The final stage separates electrons in these two states into separate beams, with equal probabilities, W↑ = W↓ = 1/2, of finding an electron in each of them, thus explaining the experimental results.
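The whole three-stage sequence may be traced numerically; here is a minimal illustrative sketch (Python/NumPy), using the conventional z-basis column vectors for the ↑ and ↓ states (a representation choice of this example, not fixed by the text):

    import numpy as np

    up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
    right = (up + down) / np.sqrt(2)      # |->, S_x = +hbar/2; Eq. (122)
    left  = (up - down) / np.sqrt(2)      # |<-, S_x = -hbar/2

    # First SG(z) stage + absorber: only the spin-up beam survives.
    state = up

    # SG(x) stage + absorber: the <- beam is absorbed; the fraction passing
    # is |<right|state>|^2, and the survivors are all in the |-> state.
    W_pass = abs(np.vdot(right, state))**2    # = 0.5
    state = right

    # Final SG(z) stage: outcome probabilities per Eq. (120).
    W_up   = abs(np.vdot(up, state))**2       # = 0.5
    W_down = abs(np.vdot(down, state))**2     # = 0.5
    print(W_pass, W_up, W_down)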

To conclude our discussion of the multistage Stern-Gerlach experiment, let me note that though it cannot be explained in terms of wave mechanics (which operates with scalar de Broglie waves), it has an analogy in classical theories of vector fields, such as classical electrodynamics. Indeed, let a plane electromagnetic wave propagate normally to the plane of the drawing in Fig. 5, and pass through the linear polarizer 1.

Fig. 4.5. A light polarization sequence similar to the three-stage Stern-Gerlach experiment shown on the middle panel of Fig. 2.

Similarly to the output of the initial SG(z) stages (including the absorbers) shown in Fig. 2, the output wave is linearly polarized in one direction - the vertical direction in Fig. 5. Now its electric field vector has no horizontal component, as may be revealed by the wave’s full absorption in a perpendicular polarizer 3. However, let us pass the wave through polarizer 2 first. In this case, the output wave does acquire a horizontal component, as can be, again, revealed by passing it through polarizer 3. If the angles between the polarization directions 1 and 2, and between 2 and 3, are both equal to π/4, each polarizer reduces the wave amplitude by a factor of √2, and hence the intensity by a factor of 2, exactly as in the multistage SG experiment, with polarizer 2 playing the role of the SG(x) stage. The "only" difference is that the necessary angle here is π/4, rather than π/2 for the Stern-Gerlach experiment. In quantum electrodynamics (see Chapter 9 below), which confirms the classical predictions for this experiment, this difference may be interpreted as a consequence of the difference between the integer spin of electromagnetic field quanta (photons) and the half-integer spin of electrons.
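The amplitude bookkeeping of this optical analogy is a two-line Malus-law calculation; an illustrative sketch (Python/NumPy):

    import numpy as np

    theta = np.pi / 4        # angle between the axes of successive polarizers
    amp = 1.0                # wave amplitude after polarizer 1

    amp2 = amp * np.cos(theta)    # polarizer 2: amplitude down by sqrt(2)
    amp3 = amp2 * np.cos(theta)   # polarizer 3: down by sqrt(2) again

    print(amp3**2)           # transmitted intensity: 0.25 of the input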


16 An alternative way to express Eq. (76) is to write $\hat{U}^\dagger = \hat{U}^{-1}$, but I will try to avoid this language.

17 Note that the expression "matrix diagonalization" is very common but dangerous jargon. (Formally, a matrix is just a matrix, an ordered set of c-numbers, and cannot be "diagonalized".) It is OK to use this jargon if you remember clearly what it actually means - see the definition above.

18 A possible slight complication here is that the characteristic equation may give equal eigenvalues for certain groups of different eigenvectors. In such cases, the requirement of the mutual orthogonality of these degenerate states should be additionally enforced.

19 Note that though this particular unitary matrix is Hermitian, this is not true for an arbitrary choice of phases φ.

20 This was the key point in the electron spin’s description, developed by W. Pauli in 1925-1927.

21 For the electron, with its negative charge $q = -e$, the gyromagnetic ratio is negative: $\gamma_e = -g_e e/2m_e$, where $g_e \approx 2$ is the dimensionless g-factor. Due to quantum-electrodynamic (relativistic) effects, this g-factor is slightly higher than 2: $g_e = 2\left(1 + \alpha/2\pi + \ldots\right) \approx 2.002319304$, where $\alpha \equiv e^2/4\pi\varepsilon_0\hbar c \approx (E_H/m_e c^2)^{1/2} \approx 1/137$ is the so-called fine structure constant. (The origin of its name will be clear from the discussion in Sec. 6.3.)

22 If you think about it, the word "understand" typically means that we can express a new, more complex notion in terms of those discussed earlier and considered "known". In our current case, we cannot describe the spin states by some wavefunction ψ(r), or any other mathematical notion discussed in the previous three chapters. The bra-ket formalism has been invented exactly to enable mathematical analyses of such "new" quantum states we do not initially "understand". Gradually we get accustomed to these notions, and eventually, as we learn more and more about their properties, we start treating them as "known" ones.

23 Here again, just as in Sec. 1.2, the statement implies the abstract notion of "ideal experiments", deferring the discussion of real (physical) measurements until Chapter 10.

24 As a reminder, at the end of Sec. 3 we have already proved that such eigenstates corresponding to different values $A_j$ are orthogonal. If any of these values is degenerate, i.e. corresponds to several different eigenstates, they should also be selected to be orthogonal, in order for Eq. (118) to be valid.

25 This key relation, in particular, explains the most common term for the (generally, complex) coefficients αj, the probability amplitudes.


This page titled 4.4: Change of Basis, and Matrix Diagonalization is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Konstantin K. Likharev via source content that was edited to the style and standards of the LibreTexts platform.
