12.2: General Description
12.2.1 Markov Processes
More generally, suppose we have a system possessing a discrete set of states, which can be labeled by integers \(0, 1, 2, \dots\). A Markov process is a set of probabilistic rules that tell us how to choose a new state of the system, based on the system's current state. If the system is currently in state \(n\), then the probability of choosing state \(m\) on the next step is denoted by \(P(m|n)\). We call this the "transition probability" from state \(n\) to state \(m\). By repeatedly applying the Markov process, we move the system through a random sequence of states, \(\{n^{(0)}, n^{(1)}, n^{(2)}, n^{(3)}, \dots\}\), where \(n^{(k)}\) denotes the state on step \(k\). This kind of random sequence is called a Markov chain.
There is an important constraint on the transition probabilities of the Markov process. Because the system must transition to some state on each step,
\[\sum_{m} P(m|n) = 1 \;\;\; \mathrm{for}\;\mathrm{all}\; n \in \{0, 1, \dots\}.\]
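As a minimal sketch of these rules, the Python code below generates a short Markov chain and checks the normalization constraint. The 3-state transition probabilities, the helper names `check_normalization` and `generate_chain`, and the fixed random seed are all assumptions chosen for illustration; they do not come from any particular physical system.

```python
import random

# Hypothetical 3-state example: P[m][n] is the transition probability P(m|n),
# i.e. the probability of moving to state m given the current state n.
P = [
    [0.9, 0.2, 0.0],
    [0.1, 0.6, 0.5],
    [0.0, 0.2, 0.5],
]

def check_normalization(P, tol=1e-12):
    """Return True if sum_m P(m|n) = 1 for every current state n."""
    num_states = len(P)
    return all(
        abs(sum(P[m][n] for m in range(num_states)) - 1.0) < tol
        for n in range(num_states)
    )

def generate_chain(P, n0, num_steps, rng):
    """Generate the Markov chain [n^(0), n^(1), ..., n^(num_steps)]."""
    states = list(range(len(P)))
    chain = [n0]
    for _ in range(num_steps):
        n = chain[-1]
        # The probabilities of each possible next state m, given state n.
        weights = [P[m][n] for m in states]
        chain.append(rng.choices(states, weights=weights)[0])
    return chain

rng = random.Random(0)  # fixed seed so that the run is reproducible
chain = generate_chain(P, n0=0, num_steps=20, rng=rng)
print(check_normalization(P))
print(chain)
```

Note that each new state is drawn using only the current state `chain[-1]`; the earlier history of the chain plays no role.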
Next, we introduce the idea of state probabilities. Suppose we look at the ensemble of all possible Markov chains which can be generated by a given Markov process. Let \(\{p_0^{(k)}, p_1^{(k)}, p_2^{(k)}, \dots \}\) denote the probabilities for the various states, \(n = 0, 1, 2,\dots\), on step \(k\). Given these, what are the probabilities for the various states on step \(k+1\)? According to the law of total probability, we can write \(p_m^{(k+1)}\) as a sum over conditional probabilities:
\[p_m^{(k+1)} = \sum_{n} P(m|n) \, p_n^{(k)}.\]
This has the form of a matrix equation:
\[\begin{bmatrix}p_0^{(k+1)} \\ p_1^{(k+1)} \\ \vdots\end{bmatrix} = \begin{bmatrix} P(0|0) & P(0|1) & \cdots \\ P(1|0) & P(1|1) & \cdots \\ \vdots & \vdots & \ddots\end{bmatrix} \, \begin{bmatrix}p_0^{(k)} \\ p_1^{(k)} \\ \vdots\end{bmatrix},\]
where the matrix on the right-hand side is called the transition matrix. Each element of this matrix is a real number between \(0\) and \(1\); furthermore, because the transition probabilities out of each state \(n\) must sum to \(1\) (the constraint stated above), each column of the matrix sums to \(1\). In mathematics, matrices of this type are called "left stochastic matrices".
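The matrix equation can be tried out numerically. The following is a sketch that assumes NumPy is available and uses a hypothetical 3-state transition matrix (the numbers are illustrative only). It repeatedly applies the matrix to a vector of state probabilities.

```python
import numpy as np

# Hypothetical 3-state transition matrix; element [m, n] is P(m|n).
P = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.6, 0.5],
    [0.0, 0.2, 0.5],
])

# Left stochastic: every column sums to 1.
assert np.allclose(P.sum(axis=0), 1.0)

# Start with the system certainly in state 0.
p = np.array([1.0, 0.0, 0.0])

history = [p]
for k in range(50):
    p = P @ p              # p^(k+1) = P p^(k)
    history.append(p)

print(history[1])          # after one step: equal to column 0 of P
print(history[-1])         # state probabilities after 50 steps
print(history[-1].sum())   # the total probability remains 1
```

Because every column of the matrix sums to \(1\), the state probabilities continue to sum to \(1\) after each step, up to floating-point rounding.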
12.2.2 Stationary Distribution
A stationary distribution is a set of state probabilities \(\{\pi_0, \pi_1, \pi_2, \dots \}\), such that passing through one step of the Markov process leaves the probabilities unchanged:
\[\pi_m = \sum_{n} P(m|n) \, \pi_n.\]
By looking at the equivalent matrix equation, we see that the vector \([\pi_0; \pi_1; \pi_2; \dots]\) must be an eigenvector of the transition matrix, with eigenvalue \(1\). It turns out that there is a mathematical theorem (the Perron–Frobenius theorem) which guarantees that every finite left stochastic matrix has an eigenvector of this sort, with non-negative components that can be normalized to sum to \(1\). Hence, every Markov process with a finite number of states possesses a stationary distribution.
Stationary distributions are the main reason we are interested in Markov processes. In physics, we are often interested in using Markov processes to model thermodynamic systems, such that a stationary distribution represents the distribution of thermodynamic micro-states under thermal equilibrium. (We'll see an example in the next section.) Knowing the stationary distribution, we can figure out all the thermodynamic properties of the system, such as its average energy.
In principle, one way to figure out the stationary distribution is to construct the transition matrix, solve the eigenvalue problem, and pick out the eigenvector with eigenvalue 1. The trouble is that we are often interested in systems where the number of possible states is huge—in some cases, larger than the number of atoms in the universe! In such cases, it is not possible to explicitly generate the transition matrix, let alone solve the eigenvalue problem.
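For a small system, the eigenvalue approach is easy to carry out. The sketch below assumes NumPy and a hypothetical 3-state transition matrix chosen for illustration. It uses `numpy.linalg.eig` to find the eigenvector with eigenvalue \(1\), then normalizes it so that the probabilities sum to \(1\).

```python
import numpy as np

# Hypothetical 3-state transition matrix; element [m, n] is P(m|n).
P = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.6, 0.5],
    [0.0, 0.2, 0.5],
])

eigvals, eigvecs = np.linalg.eig(P)

# Pick out the eigenvalue closest to 1, and its eigenvector (a column).
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])

# Eigenvectors are only defined up to a scale factor (including sign),
# so rescale such that the probabilities sum to 1.
pi = pi / pi.sum()

print(eigvals[idx])
print(pi)
print(np.allclose(P @ pi, pi))   # one step leaves pi unchanged
```

For this particular matrix, solving \(\pi_m = \sum_n P(m|n)\,\pi_n\) by hand gives \(\pi = [10/17;\, 5/17;\, 2/17]\), which the numerical result should match. For a system with \(N\) states, the matrix has \(N^2\) elements, which is why this direct approach becomes impractical when \(N\) is huge.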
We now come upon a happy and important fact: for a huge class of Markov processes, the distribution of states within a sufficiently long Markov chain will converge to the stationary distribution. Hence, in order to find out about the stationary distribution, we simply need to generate a long Markov chain, and study its statistical properties.
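This convergence can be illustrated with a small numerical experiment. The sketch below uses a hypothetical 3-state Markov process (the transition probabilities, the helper name `generate_chain`, the chain length, and the random seed are all illustrative assumptions). It generates a long chain and compares the fraction of steps spent in each state with the stationary distribution, which for this small example can be found by hand to be \([10/17;\, 5/17;\, 2/17]\).

```python
import random
from collections import Counter

# Hypothetical 3-state example: P[m][n] is the transition probability P(m|n).
P = [
    [0.9, 0.2, 0.0],
    [0.1, 0.6, 0.5],
    [0.0, 0.2, 0.5],
]

def generate_chain(P, n0, num_steps, rng):
    """Generate the Markov chain [n^(0), n^(1), ..., n^(num_steps)]."""
    states = list(range(len(P)))
    chain = [n0]
    for _ in range(num_steps):
        n = chain[-1]
        weights = [P[m][n] for m in states]
        chain.append(rng.choices(states, weights=weights)[0])
    return chain

rng = random.Random(1)  # fixed seed so that the run is reproducible
chain = generate_chain(P, n0=0, num_steps=100_000, rng=rng)

# Fraction of the chain spent in each state.
counts = Counter(chain)
frequencies = [counts[n] / len(chain) for n in range(len(P))]

# Stationary distribution for this matrix, obtained by solving pi = P pi by hand.
exact = [10 / 17, 5 / 17, 2 / 17]

for n, (f, e) in enumerate(zip(frequencies, exact)):
    print(n, round(f, 4), round(e, 4))
```

The observed frequencies should agree with the stationary distribution up to statistical fluctuations, which shrink as the chain is made longer. Nowhere in this procedure is the full transition matrix diagonalized; only the transition probabilities out of the current state are needed at each step.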