6.1: Basic Facts about Eigenvalue Problems

Last updated

Apr 30, 2021


Even if a matrix $\mathbf{A}$ is real, its eigenvectors and eigenvalues can be complex. For example,

$\begin{bmatrix}1&1\\-1&1\end{bmatrix} \begin{bmatrix}1\\i\end{bmatrix} = (1+i) \begin{bmatrix}1\\i\end{bmatrix}.$

Eigenvectors are not uniquely defined. Given an eigenvector $\vec{x}$, any nonzero complex multiple of that vector is also an eigenvector of the same matrix, with the same eigenvalue. We can reduce this ambiguity by normalizing eigenvectors to unit length:

$\sum_{n=0}^{N-1} |x_n|^2 = 1.$

Note, however, that even after normalization, there is still an inherent ambiguity in the overall complex phase. Multiplying a normalized eigenvector by any phase factor $e^{i\phi}$ gives another normalized eigenvector with the same eigenvalue.
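The example and the phase ambiguity can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the phase angle `phi = 0.7` is an arbitrary illustrative choice, not something fixed by the text.

```python
import numpy as np

# The real matrix, eigenvector, and eigenvalue from the example above.
A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])
x = np.array([1.0, 1.0j])
lam = 1.0 + 1.0j

# Residual of the eigenvalue equation A x = lambda x (should be zero).
residual = np.linalg.norm(A @ x - lam * x)

# Normalize so that sum_n |x_n|^2 = 1.
x_unit = x / np.linalg.norm(x)

# Multiply by a phase factor exp(i*phi); phi is an arbitrary choice.
phi = 0.7
y = np.exp(1j * phi) * x_unit

# y is still normalized, and still an eigenvector with the same eigenvalue.
y_norm = np.linalg.norm(y)
y_residual = np.linalg.norm(A @ y - lam * y)

print(residual, y_norm, y_residual)
```

Both residuals vanish to rounding error and `y_norm` stays equal to one, so normalization alone does not single out a unique eigenvector.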

6.1.1 Matrix Diagonalization

Most matrices are diagonalizable, meaning that their eigenvectors span the $N$-dimensional complex space (where $N$ is the matrix size). Matrices which are not diagonalizable are called defective. Many classes of matrices that are relevant to physics (such as Hermitian matrices) are always diagonalizable; i.e., never defective.

The reason for the term "diagonalizable" is as follows. A diagonalizable $N\times N$ matrix $\mathbf{A}$ has eigenvectors that span the $N$-dimensional space, meaning that we can choose $N$ linearly independent eigenvectors, $\{\vec{x}_0, \vec{x}_1, \dots, \vec{x}_{N-1}\}$, with eigenvalues $\{\lambda_0, \lambda_1, \dots, \lambda_{N-1}\}$. We refer to such a set of $N$ eigenvalues as the "eigenvalues of $\mathbf{A}$". If we group the eigenvectors into an $N\times N$ matrix

$\mathbf{Q} = [\vec{x}_0, \vec{x}_1, \dots, \vec{x}_{N-1}],$

then, since the eigenvectors are linearly independent, $\mathbf{Q}$ is guaranteed to be invertible. Applying the eigenvalue equation to each column gives $\mathbf{A}\mathbf{Q} = [\lambda_0\vec{x}_0, \lambda_1\vec{x}_1, \dots, \lambda_{N-1}\vec{x}_{N-1}]$, which is $\mathbf{Q}$ multiplied on the right by the diagonal matrix of eigenvalues. Multiplying both sides on the left by $\mathbf{Q}^{-1}$, we can then show that

$\mathbf{Q}^{-1} \,\mathbf{A} \, \mathbf{Q} = \begin{bmatrix}\lambda_0 & 0& \cdots & 0 \\ 0 & \lambda_1 & \cdots & 0 \\ \vdots& \vdots & \ddots & \vdots \\ 0&0&\cdots&\lambda_{N-1}\end{bmatrix}.$

In other words, there exists a similarity transformation which converts $\mathbf{A}$ into a diagonal matrix. The $N$ numbers along the diagonal are precisely the eigenvalues of $\mathbf{A}$.
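As a quick numerical illustration of the similarity transformation, here is a sketch using NumPy's `numpy.linalg.eig`, which returns the eigenvalues together with a matrix whose columns are the corresponding eigenvectors. The $2\times 2$ matrix is an arbitrary example chosen for illustration; its eigenvalues are 2 and 5.

```python
import numpy as np

# An arbitrary non-symmetric example matrix (eigenvalues 2 and 5).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of Q are eigenvectors; eigvals[n] belongs to column Q[:, n].
eigvals, Q = np.linalg.eig(A)

# The similarity transformation Q^{-1} A Q.
D = np.linalg.inv(Q) @ A @ Q

# Largest off-diagonal entry of D, which should vanish up to rounding error.
off_diag_max = np.max(np.abs(D - np.diag(np.diag(D))))

print(np.round(D, 10))
print(eigvals)
```

The diagonal of `D` reproduces `eigvals` in the same order as the columns of `Q`. For a defective matrix, `Q` would be singular (or nearly so) and the inversion step would fail or be unreliable.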

6.1.2 The Characteristic Polynomial

One of the most important consequences of diagonalizability is that the determinant of a diagonalizable matrix $\mathbf{A}$ is the product of its eigenvalues:

$\det(\mathbf{A}) = \prod_{n=0}^{N-1} \lambda_n.$

This can be proven by taking the determinant of the similarity transformation equation, and using (i) the property of the determinant that $\det(\mathbf{U}\mathbf{V}) = \det(\mathbf{U})\det(\mathbf{V})$, and (ii) the fact that the determinant of a diagonal matrix is the product of the elements along the diagonal.
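The identity is easy to test numerically. The sketch below assumes NumPy and uses a seeded random real $4\times 4$ matrix, which is purely an illustrative choice; its eigenvalues may be complex, but they come in conjugate pairs, so their product is real up to rounding error.

```python
import numpy as np

# A random real 4x4 matrix (seeded so the run is reproducible).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Determinant computed directly, and as the product of the eigenvalues.
det_direct = np.linalg.det(A)
eigvals = np.linalg.eigvals(A)
det_from_eigs = np.prod(eigvals)

print(det_direct, det_from_eigs)
```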

In particular, the determinant of $\mathbf{A}$ is zero if and only if at least one of its eigenvalues is zero. This fact can be further applied to the following re-arrangement of the eigenvalue equation:

$\Big(\mathbf{A} - \lambda\mathbf{I}\Big) \, \vec{x} = 0,$

where $\mathbf{I}$ is the $N\times N$ identity matrix. This says that the matrix $\mathbf{A}-\lambda\mathbf{I}$ has an eigenvalue of zero, meaning that for any eigenvalue $\lambda$,

$\det\left(\mathbf{A} - \lambda\mathbf{I}\right) = 0.$

The left-hand side of the above equation is a polynomial in the variable $\lambda$, of degree $N$. This is called the characteristic polynomial of the matrix $\mathbf{A}$. Its roots are eigenvalues of $\mathbf{A}$, and vice versa.
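As a worked illustration, take the real $2\times 2$ matrix from the example at the start of this section. Its characteristic polynomial is

$\det\begin{bmatrix}1-\lambda & 1\\ -1 & 1-\lambda\end{bmatrix} = (1-\lambda)^2 + 1 = \lambda^2 - 2\lambda + 2,$

whose roots are $\lambda = 1 \pm i$. The root $1+i$ is the eigenvalue found earlier, with eigenvector $[1, i]^T$; the other root, $1-i$, is an eigenvalue with eigenvector $[1, -i]^T$. The product of the two roots is $2$, which agrees with $\det(\mathbf{A}) = 1\cdot 1 - (1)(-1) = 2$.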

For $2\times 2$ matrices, the standard way of calculating the eigenvalues is to find the roots of the characteristic polynomial. However, this is not a reliable method for finding the eigenvalues of larger matrices. There is a well-known and important result in mathematics, known as Abel's impossibility theorem (or the Abel-Ruffini theorem), which states that polynomial equations of degree $5$ and higher have no general algebraic solution; i.e., there is no formula that gives the roots in terms of the coefficients using a finite number of additions, subtractions, multiplications, divisions, and root extractions. (By comparison, degree-2 polynomials have a general algebraic solution, which is the familiar quadratic formula, and similar formulas exist for degree-3 and degree-4 polynomials.) A matrix of size $N \ge 5$ has a characteristic polynomial of degree $N \ge 5$, and Abel's impossibility theorem tells us that no such general formula exists for the roots of that characteristic polynomial.

In fact, Abel's impossibility theorem leads to an even stronger conclusion: there is no general algebraic method for finding the eigenvalues of a matrix of size $N \ge 5$, whether using the characteristic polynomial or any other method. For suppose we had such a method for finding the eigenvalues of a matrix. Then, for any polynomial equation of degree $N \ge 5$, of the form

$a_0 + a_1 \lambda + \cdots + a_{N-1} \lambda^{N-1} + \lambda^N = 0,$

we can construct an $N\times N$ "companion matrix" of the form

$\mathbf{A} = \begin{bmatrix}0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1& \cdots & 0 \\ \vdots&\vdots&\ddots&\ddots& \vdots \\ 0&0&0&\ddots&1\\-a_0& -a_1& -a_2 & \cdots & -a_{N-1} \end{bmatrix}.$

As you can check for yourself, each root $\lambda$ of the polynomial is also an eigenvalue of the companion matrix, with corresponding eigenvector

$\vec{x} = \begin{bmatrix}1\\\lambda\\ \vdots \\ \lambda^{N-1}\end{bmatrix}.$

Hence, if there existed a general algebraic method for finding the eigenvalues of a large matrix, that would allow us to solve polynomial equations of high degree. Abel's impossibility theorem tells us that no such solution method can exist.
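The companion-matrix construction can be checked on a concrete case. The sketch below assumes NumPy; the helper `companion_matrix` is a hypothetical name introduced here for illustration, and the test polynomial $(\lambda-1)(\lambda-2)(\lambda-3)(\lambda-4)(\lambda-5)$ is chosen only because its roots are known in advance. The eigenvalues returned by `numpy.linalg.eigvals` are floating-point approximations, not the output of an exact algebraic formula, so nothing here contradicts Abel's theorem.

```python
import numpy as np

def companion_matrix(a):
    """Companion matrix of a_0 + a_1 x + ... + a_{N-1} x^{N-1} + x^N.

    `a` holds the coefficients a_0, ..., a_{N-1} (hypothetical helper).
    """
    a = np.asarray(a, dtype=float)
    N = len(a)
    A = np.zeros((N, N))
    # Ones on the superdiagonal.
    A[np.arange(N - 1), np.arange(1, N)] = 1.0
    # The last row holds -a_0, ..., -a_{N-1}.
    A[-1, :] = -a
    return A

# (x-1)(x-2)(x-3)(x-4)(x-5) = x^5 - 15x^4 + 85x^3 - 225x^2 + 274x - 120
a = [-120.0, 274.0, -225.0, 85.0, -15.0]
A = companion_matrix(a)

# Numerical eigenvalues of the companion matrix, sorted for comparison.
eigvals = np.sort(np.linalg.eigvals(A).real)

# Check the stated eigenvector [1, lam, ..., lam^(N-1)] for the root lam = 2.
lam = 2.0
x = lam ** np.arange(len(a))
residual = np.linalg.norm(A @ x - lam * x)

print(eigvals)
print(residual)
```

The computed eigenvalues agree with the known roots 1 through 5 to within rounding error, and the residual confirms that the vector of powers of $\lambda$ satisfies the eigenvalue equation.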

This might seem like a terrible problem, but in fact there's a way around it, as we'll shortly see.

6.1.3 Hermitian Matrices

A Hermitian matrix $\mathbf{H}$ is a matrix which has the property

$\mathbf{H}^\dagger = \mathbf{H},$

where $\mathbf{H}^\dagger$ denotes the "Hermitian conjugate", which is matrix transposition accompanied by complex conjugation:

$\mathbf{H}^\dagger \equiv \left(\mathbf{H}^T\right)^*, \quad \mathrm{i.e.}\;\;\left(H^\dagger\right)_{ij} = H_{ji}^*.$

Hermitian matrices have the nice property that all their eigenvalues are real. This can be easily proven using index notation:

$\begin{aligned}\sum_j H_{ij} x_j = \lambda x_i \;\;&\Rightarrow\;\; \sum_j x_j^* H_{ji} = \lambda^* x_i^*\\ &\Rightarrow \sum_{ij} x^*_i H_{ij} x_j = \lambda \sum_i |x_i|^2 = \lambda^* \sum_j |x_j|^2 \\ &\Rightarrow \lambda = \lambda^*.\end{aligned}$

In quantum mechanics, Hermitian matrices play a special role: they represent measurement operators, and their eigenvalues (which are restricted to the real numbers) are the set of possible measurement outcomes.
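The reality of the eigenvalues can also be observed numerically. This is a minimal sketch assuming NumPy; the Hermitian matrix is built from a seeded random complex matrix $\mathbf{M}$ as $\mathbf{M} + \mathbf{M}^\dagger$, which is just one convenient way to generate an example. It compares the general-purpose `numpy.linalg.eigvals` with `numpy.linalg.eigvalsh`, which assumes a Hermitian input and returns real eigenvalues in ascending order.

```python
import numpy as np

# Build a random Hermitian matrix: M + M^dagger is Hermitian for any square M.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = M + M.conj().T

# Confirm that H equals its own Hermitian conjugate.
is_hermitian = np.allclose(H, H.conj().T)

# General eigensolver: returns complex numbers, but the imaginary
# parts should vanish up to rounding error.
eigvals = np.linalg.eigvals(H)
max_imag = np.max(np.abs(eigvals.imag))

# Hermitian eigensolver: returns a real array, sorted in ascending order.
eigvals_h = np.linalg.eigvalsh(H)

print(max_imag)
print(eigvals_h)
```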
