# 7.1 The Density Matrix

### Pure States and Mixed States

Our treatment here more or less follows that of Sakurai, beginning with two imagined Stern-Gerlach experiments. In that experiment, a stream of (non-ionized) silver atoms from an oven is directed through an inhomogeneous vertical magnetic field, and the stream splits into two. The silver atoms have nonzero magnetic moments, and a magnetic moment in an inhomogeneous magnetic field experiences a nonzero force, causing the atom to veer from its straight line path, the magnitude of the deflection being proportional to the component of the atom’s magnetic moment in the vertical (field) direction. The observation of the beam splitting into two, and no more, means that the vertical component of the magnetic moment, and therefore the associated angular momentum, can only have *two* different values. From the basic analysis of rotation operators and the properties of angular momentum that follow, this observation forces us to the conclusion that the total angular momentum of a silver atom is \(\frac{1}{2}\hbar\) . Ordinary orbital angular momenta cannot have half-integer values; this experiment was one of the first indications that the electron has a spin degree of freedom, an angular momentum that cannot be interpreted as orbital angular momentum of constituent parts. The silver atom has 47 electrons, 46 of them have total spin and orbital momenta that separately cancel, the 47^{th} has no orbital angular momentum, and its spin is the entire angular momentum of the atom.

Here we shall use the Stern-Gerlach stream as an example of a large collection of quantum systems (the atoms) to clarify just how to describe such a collection, often called an *ensemble*. To avoid unnecessary complications, we only consider the *spin* degrees of freedom. We begin by examining two different streams:

Suppose experimentalist \(A\) prepares a stream of silver atoms such that each atom is in the spin state \(\psi_A\): \[ |\psi_A\rangle=\frac{1}{\sqrt{2}}(|\uparrow\rangle+|\downarrow\rangle). \tag{7.1.1}\]

Meanwhile, experimentalist \(B\) prepares a stream of silver atoms which is a *mixture*: half the atoms are in state \(|\uparrow\rangle\) and half are in the state \(|\downarrow\rangle\): call this mix \(B\).

*Question*: can we distinguish the \(A\) stream from the \(B\) stream?

Evidently, not by measuring the spin in the z-direction! Both will give up 50% of the time, down 50%.

But: we *can *distinguish them by measuring the spin in the x-direction: the \(\psi_A\) quantum state is in fact just that of a spin in the x-direction, so it will give “up” in the x-direction every time—from now on we call it \(|\uparrow_x\rangle\), whereas the state \(|\uparrow\rangle\) (“up” in the z-direction) will yield “up” in the x-direction only 50% of the time, as will \(|\downarrow\rangle\).

The state \(\psi_A=|\uparrow_x\rangle\) is called a ** pure** state, it’s the kind of quantum state we’ve been studying this whole course.

The stream \(B\), in contrast, is in a ** mixed** state: the kind that actually occurs to a greater or lesser extent in a real life stream of atoms, different pure quantum states occurring with different probabilities, but with no phase coherence between them. In other words, these relative probabilities in \(B\) of different quantum states do

*not*derive from probability amplitudes, as they do in finding the probability of spin up in stream \(A\): the probabilities of the different quantum states in the mixed state \(B\) are exactly like classical probabilities.

That being said, though, to find the probability of measuring spin up in some such mixed state, one *first* uses the classical-type probability for each component state, *then* for each quantum state in the mix, one finds the probability of spin up *in that state* by the standard quantum technique.

Therefore, for a mixed state in which the system is in state \(|\psi_i\rangle\) with probability \(w_i,\; \sum w_i=1\), the expectation value of an operator \(\hat{A}\) is \[ \langle \hat{A}\rangle=\sum w_i\langle \psi_i|\hat{A}|\psi_i\rangle \tag{7.1.2}\]

and we should emphasize that these \(|\psi_i\rangle\) do *not* need to be orthogonal (but they are of course normalized): for example one could be \(|\uparrow_x\rangle\), another \(|\uparrow_z\rangle\). (We put the usually omitted z in for emphasis.) The reason we put a hat on \(\hat{A}\) here is to emphasize that this is an operator, but the \(w_i\) are just numbers.

### The Density Matrix

The equation for the expectation value \(\langle \hat{A}\rangle\) can be written:

\[ \langle \hat{A}\rangle=Trace(\hat{\rho}\hat{A})\;\; where\;\; \hat{\rho}=\sum w_i|\psi_i\rangle\langle \psi_i| . \tag{7.1.3}\]

To see exactly how this comes about, recall that for an operator \(\hat{B}\) in a finite-dimensional vector space with an orthonormal basis set \(|j\rangle\), \(Tr\hat{B}=\sum_{j=1}^{n}\langle j|\hat{B}|j\rangle=B_{jj}\), where the repeated suffix implies summation of the diagonal matrix elements of the operator.

Therefore,

\[ \begin{matrix} Tr(\hat{\rho}\hat{A}) =\sum_{j=1}^{n}\sum_{i=1}^{n}w_i\langle j|\psi_i\rangle\langle \psi_i|\hat{A}|j\rangle \\ =\sum_{j=1}^{n}\sum_{i=1}^{n}w_i\langle \psi_i|\hat{A}|j\rangle\langle j|\psi_i\rangle \\ =\sum_{i=1}^{n}w_i\langle \psi_i|\hat{A}|\psi_i\rangle \end{matrix} \tag{7.1.4}\]

since \(\sum|j\rangle\langle j|=I\), the identity.

This \(\hat{\rho}\) is called the *density matrix*: its matrix form is made explicit by considering states \(|\psi_i\rangle\) in a finite N-dimensional vector space (such as spins or angular momenta)

\[ |\psi_i\rangle=\sum_j(V_i)_j|j\rangle \tag{7.1.5}\]

where the \(|j\rangle\) are an orthonormal basis set, and \((V_i)_j\) is the \(j^{th}\) component of a normalized vector \(V_i\). It is convenient to express \(\hat{\rho}\) in terms of kets and bras belonging to this orthonormal basis,

\[ \hat{\rho}=\sum w_i|\psi_i\rangle\langle \psi_i|=\sum_{i,j,k}w_i(V_i)_j(V^{\dagger}i)_k|j\rangle\langle k|=\sum_{j,k}\rho_{jk}|j\rangle\langle k| \tag{7.1.6}\]

and evidently

\[ \langle \hat{A}\rangle=Trace(\hat{\rho}\hat{A})=\sum_{n,j,k}\langle n|\rho_{jk}|j\rangle\langle k|\hat{A}|n\rangle=\sum_{j,k}\rho_{jk}\langle k|\hat{A}|j\rangle=\sum_{j,k}\rho_{jk}A_{kj}. \tag{7.1.7}\]

(Since \(\rho_{jk}\) is just a *number*, \(\langle n|\rho_{jk}|j\rangle=\rho_{jk}\langle n|j\rangle=\rho_{jk}\delta_{nj}\).)

\(Trace(\hat{\rho}\hat{A})\) is *basis-independent*, the trace of a matrix being unchanged by a unitary transformation, since it follows from \(Tr(ABC)=Tr(BCA)\) that \[ TrU^{\dagger}\, AU=TrAU\, U^{\dagger}=TrA\; for\; UU^{\dagger}=1. \tag{7.1.8}\]

Note that since the vectors \(V_i\) are normalized, \(\sum_j(V_i)_j(V^{\dagger}_i)_j=1\), with the \(i\) not summed over, and \(\sum w_i=1\), it follows that \[ Tr\hat{\rho}=1 \tag{7.1.9}\]

(also evident by putting \(A=1\) in the equation for \(\langle A\rangle\) ).

For a system in a ** pure** quantum state \(|\psi\rangle\), \(\hat{\rho}=|\psi\rangle\langle \psi|\), just the projection operator into that state, and \[ \hat{\rho}^2=\hat{\rho}, \tag{7.1.10}\]

as for all projection operators.

It’s worth spelling out how this differs from the mixed state by looking at the form of the density matrix.

For the pure state \(|\psi\rangle\), if a basis is chosen so that \(|\psi\rangle\) is a member of the basis (this can always be done), \(\hat{\rho}\) is a matrix with every element zero except the one diagonal element corresponding to \(|\psi\rangle\langle \psi|\), which will be unity. Obviously, \(\hat{\rho}^2=\hat{\rho}\). This is less obvious in a general basis, where \(\hat{\rho}\) will not necessarily be diagonal. But the statement \(\hat{\rho}^2=\hat{\rho}\) remains true under a transformation to a new basis.

For a mixed state, let’s say for example a mixture of orthogonal states \(|\psi_1\rangle,\; |\psi_2\rangle\), if we choose a basis including both states, the density matrix will be diagonal with just two entries \(w_1,\; w_2\). Both these numbers must be less than unity, so \(\hat{\rho}^2\neq \hat{\rho}\). A mix of nonorthogonal states is left as an exercise for the reader.

### Some Simple Examples

**First, our case A above (pure state)**: all spins in state \(|\uparrow_x\rangle=(1/\sqrt{2})(|\uparrow\rangle+|\downarrow\rangle)\).

In the standard \(|\uparrow\rangle\), \(|\downarrow\rangle\) basis, \[ \hat{\rho}=|\uparrow_x\rangle\langle \uparrow_x|=\dbinom{1/\sqrt{2}}{1/\sqrt{2}}\begin{pmatrix}1/\sqrt{2}& 1/\sqrt{2} \end{pmatrix}=\begin{pmatrix} 1/2&1/2\\ 1/2&1/2 \end{pmatrix} \tag{7.1.11}\]

and \[ \begin{matrix} \langle s_x\rangle=Tr(\hat{\rho}s_x)=\frac{\hbar}{2}Tr\begin{pmatrix} 1/2&1/2\\ 1/2&1/2 \end{pmatrix}\begin{pmatrix} 0&1\\ 1&0 \end{pmatrix}=\frac{\hbar}{2} \\ \langle s_z\rangle=Tr(\hat{\rho}s_z)=\frac{\hbar}{2}Tr\begin{pmatrix} 1/2&1/2\\ 1/2&1/2 \end{pmatrix}\begin{pmatrix} 1&0\\ 0&-1 \end{pmatrix}=0. \end{matrix} \tag{7.1.12}\]

Notice that \(\hat{\rho}^2=\hat{\rho}\).

**Now, case \(B\) (50-50 mixed up and down)**: 50% in the state \(|\uparrow\rangle\), 50% \(|\downarrow\rangle\).

The density matrix is \[ \begin{matrix} \hat{\rho}=\frac{1}{2}|\uparrow\rangle\langle \uparrow|+\frac{1}{2}|\downarrow\rangle\langle \downarrow| \\ =\frac{1}{2}\dbinom{1}{0}\begin{pmatrix}1& 0 \end{pmatrix}+\frac{1}{2}\dbinom{0}{1}\begin{pmatrix}0& 1 \end{pmatrix}=\frac{1}{2}\begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}. \end{matrix} \tag{7.1.13}\]

This is proportional to the unit matrix, so \[ Tr\hat{\rho}s_x=\frac{1}{2}\frac{\hbar}{2}Tr\sigma_x=0, \tag{7.1.14}\]

and similarly for \(s_y\) and \(s_z\), since the Pauli \(\sigma\) -matrices are all traceless. Note also that \(\hat{\rho}^2=\frac{1}{2}\hat{\rho}\neq \hat{\rho}\), as is true for all mixed states.

**Finally, a 50-50 mixed state relative to the x-axis**:

That is, 50% of the spins in the state \(|\uparrow_x\rangle=(1/\sqrt{2})(|\uparrow\rangle+|\downarrow\rangle)\), “up” along the x- axis, and 50% in \(|\downarrow_x\rangle=(1/\sqrt{2})(|\uparrow\rangle-|\downarrow\rangle)\), “down” in the x-direction.

It is easy to check that \[ \hat{\rho}=\frac{1}{2}|\uparrow_x\rangle\langle \uparrow_x|+\frac{1}{2}|\downarrow_x\rangle\langle \downarrow_x|=\frac{1}{2}\begin{pmatrix} 1/2&1/2\\ 1/2&1/2 \end{pmatrix}+\frac{1}{2}\begin{pmatrix} 1/2&-1/2\\ -1/2&1/2 \end{pmatrix}=\frac{1}{2}\begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}. \tag{7.1.15}\]

This is exactly the same density matrix we found for 50% in the state \(|\uparrow\rangle\), 50% \(|\downarrow\rangle\)!

The reason is that both formulations describe a state about which we know nothing—we are in a state of *total ignorance*, the spins are completely random, all directions are equally likely. The density matrix describing such a state cannot depend on the direction we choose for our axes.

Another two-state quantum system that can be analyzed in the same way is the polarization state of a beam of light, the basis states being polarization in the x-direction and polarization in the y-direction, for a beam traveling parallel to the z- axis. Ordinary unpolarized light corresponds to the random mixed state, with the same density matrix as in the last example above.

### Time Evolution of the Density Matrix

In the mixed state, the quantum states evolve independently according to Schrödinger’s equation, so

\[ i\hbar \frac{d\hat{\rho}}{dt}=\sum w_i H|\psi_i\rangle\langle \psi_i| - \sum w_i|\psi_i\rangle\langle \psi_i|H=[H,\hat{\rho}]. \tag{7.1.16}\]

Note that this has the opposite sign from the evolution of a Heisenberg operator, not surprising since the density operator is made up of Schrödinger bras and kets.

The equation is the quantum analogue of *Liouville’s theorem* in statistical mechanics. Liouville’s theorem describes the evolution in time of an ensemble of identical classical systems, such as many boxes each filled with the same amount of the same gas at the same temperature, but the positions and momenta of the individual atoms are randomly different in each. Each box can be classically described by a single point in a huge dimensional space, a space having six dimensions for each atom (position and momentum, we ignore possible internal degrees of freedom). The whole ensemble, then, is a gas of these points in this huge space, and the rate of change of local density of this gas, from Hamilton’s equations, is \(\partial \rho/\partial t=-\{\rho,H\}\), the bracket now being a Poisson bracket (see my *Classical **Mechanics*notes). Anyway, this is the classical precursor of, and the reason for the name of, the density matrix.

### Thermal Equilibrium

A system in thermal equilibrium is represented in statistical mechanics by a *canonical ensemble*. If the eigenstate \(|i\rangle\) of the Hamiltonian has energy \(E_i\), the relative probability of the system being in that state is \(e^{-E_i/kT}=e^{-\beta E_i}\) in the standard notation. Therefore the density matrix is: \[ \hat{\rho}=\frac{1}{Z}\sum_i e^{-\beta E_i}|i\rangle\langle i|=\frac{e^{-\beta H}}{Z}, \tag{7.1.17}\]

where \[ Z=\sum_i e^{-\beta E_i}=Tre^{-\beta H}. \tag{7.1.18}\]

Notice that in this formulation, apart from the normalization constant \(Z\), the density operator is analogous to the propagator \(U(t)=e^{-iHt/\hbar}\) for an imaginary time \(t=-i\hbar \beta\) . Incidentally, for interacting quantum fields, the propagator can be constructed as a set of Feynman diagrams corresponding to all possible sequences of particle scatterings by interaction. To find the thermodynamic properties of a field theory at finite temperature, essentially the same set of diagrams is used to find the free energy: the diagrams now describe the system propagating for a finite imaginary time, the same mathematical tools can be used.

At zero temperature ( \(\beta =\infty\) ) the probability coefficients \(w_i=e^{-\beta E_i}/Z\) are all zero except for the ground state: the system is in a pure state, and the density matrix has every element zero except for a single element on the diagonal. At infinite temperature, all the \(w_i\) are equal: the density matrix is just \(1/N\) times the unit matrix, where \(N\) is the total number of states available to the system. In fact, the *entropy* of the system can be expressed in terms of the density matrix: \(S=-kTr(\hat{\rho}\ln \hat{\rho})\). This is not as bad as it looks: both operators are diagonal in the energy subspace.

### Contributors

- Michael Fowler (Beams Professor, Department of Physics, University of Virginia)