8.2: Stirling's Approximation. Lagrangian Multipliers.


In the derivation of Boltzmann's equation, we shall have occasion to make use of a result in mathematics known as Stirling's approximation for the factorial of a very large number, and we shall also need to make use of a mathematical device known as Lagrangian multipliers. These two mathematical topics are described in this section.

8.2i Stirling's Approximation

Stirling's approximation is

\[\ln{N!} \cong N \ln{N} - N . \tag{8.2.1} \label{8.2.1}\]

Its derivation is not always given in discussions of Boltzmann's equation, and I therefore offer one here.

The gamma function is defined as

\[\Gamma (x+1) = \int_0^\infty t^x e^{-t} dt \tag{8.2.2} \label{8.2.2}\]

or, what amounts to the same thing,

\[\Gamma (x) = \int_0^\infty t^{x-1} e^{-t} dt. \tag{8.2.3} \label{8.2.3}\]

In either case it is easy to derive, by integration by parts, the recursion formula

\[\Gamma (x+1) = x \Gamma (x). \tag{8.2.4} \label{8.2.4}\]

If \(x\) is a positive integer, \(N\), then, since \(\Gamma (1) = 1\), this amounts to

\[\Gamma (N+1) = N!. \tag{8.2.5} \label{8.2.5}\]
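The recursion formula and its integer special case can be spot-checked numerically. The following is only a minimal sketch using Python's standard `math.gamma` and `math.factorial`; the helper names and the tolerance are my own choices for illustration, not part of the derivation.

```python
import math

def recursion_holds(x, rel_tol=1e-12):
    """Check Gamma(x+1) = x * Gamma(x) to within a relative tolerance."""
    return math.isclose(math.gamma(x + 1), x * math.gamma(x), rel_tol=rel_tol)

def gamma_matches_factorial(n, rel_tol=1e-12):
    """Check Gamma(N+1) = N! for a positive integer N."""
    return math.isclose(math.gamma(n + 1), math.factorial(n), rel_tol=rel_tol)

# A few non-integer arguments for the recursion formula.
for x in (0.5, 2.3, 7.75):
    print(x, recursion_holds(x))

# A few positive integers for the factorial identity.
for n in (1, 5, 10, 20):
    print(n, gamma_matches_factorial(n))
```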

I shall start from equation \(\ref{8.2.2}\). It is easy to show, by differentiation with respect to \(t\), that the integrand \(t^x e^{-t}\) has a maximum value of \((x/e)^x\) where \(t = x\). I am therefore going to divide both sides of the equation by this maximum value, so that the new integrand is a function that has a maximum value of \(1\) where \(t = x\):

\[\left( \frac{e}{x} \right)^x \Gamma (x+1) = \int_0^\infty \left( \frac{t}{x} \right)^x e^{-(t-x)} dt. \tag{8.2.6} \label{8.2.6}\]
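The statement that the integrand \(t^x e^{-t}\) peaks at \(t = x\) with value \((x/e)^x\) can be checked by brute force. The sketch below scans a grid of \(t\) values for one illustrative choice, \(x = 5\); the grid spacing and the search range \((0, 3x]\) are arbitrary assumptions.

```python
import math

def integrand(t, x):
    """t^x e^{-t}, the integrand of equation 8.2.2."""
    return t**x * math.exp(-t)

def peak_on_grid(x, step=0.001):
    """Return the grid point t in (0, 3x] at which the integrand is largest."""
    n = int(3 * x / step)
    grid = [k * step for k in range(1, n + 1)]
    return max(grid, key=lambda t: integrand(t, x))

x = 5.0
t_peak = peak_on_grid(x)
# Location of the peak, its height on the grid, and the predicted height (x/e)^x.
print(t_peak, integrand(t_peak, x), (x / math.e) ** x)
```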

Now make a small change of variable. Let \(s = t - x\), so that

\[\left( \frac{e}{x} \right)^x \Gamma (x+1) = \int_{-x}^\infty \left(1+ \frac{s}{x} \right)^x e^{-s} ds = \int_{-x}^\infty f(s) ds. \tag{8.2.7} \label{8.2.7}\]
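As a sanity check on equation \(\ref{8.2.7}\), the integral of \(f(s)\) can be estimated numerically and compared with \((e/x)^x \, \Gamma(x+1)\). This is a rough sketch only: it uses a simple midpoint rule, and the upper cutoff and number of steps are assumptions chosen so that the neglected tail is negligible for the values of \(x\) tried here.

```python
import math

def f(s, x):
    """Integrand of equation 8.2.7, (1 + s/x)^x e^{-s}, evaluated via logarithms."""
    return math.exp(x * math.log1p(s / x) - s)

def integral_of_f(x, n=100_000):
    """Midpoint-rule estimate of the integral of f from -x to a large cutoff."""
    lower = -x
    upper = 40.0 * math.sqrt(x) + 40.0  # assumed cutoff; f is negligible beyond it
    h = (upper - lower) / n
    return h * sum(f(lower + (k + 0.5) * h, x) for k in range(n))

def left_hand_side(x):
    """(e/x)^x Gamma(x+1), computed through lgamma to avoid overflow."""
    return math.exp(x - x * math.log(x) + math.lgamma(x + 1.0))

for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, integral_of_f(x), left_hand_side(x))
```

The midpoint rule never evaluates the integrand exactly at \(s = -x\), where the logarithm would be undefined.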

Bearing in mind that we aim to obtain an approximation for large values of \(x\), let us try to obtain an expansion of \(f (s)\) as a series in \(s/x\). A convenient way of obtaining this is to take the logarithm of the integrand:

\[\ln f(s) = x \ln \left( 1+\frac{s}{x}\right) - s, \tag{8.2.8} \label{8.2.8}\]

and, provided that \(|s| < x\), the Maclaurin expansion is

\[\ln f(s) = x \left( \frac{s}{x} - \frac{1}{2} \cdot \left( \frac{s}{x} \right)^2 + \ldots \right) - s. \tag{8.2.9} \label{8.2.9}\]

If \(x\) is sufficiently large, the terms beyond the quadratic are negligible wherever the integrand is appreciable, and this becomes

\[\ln f(s) \cong - \frac{s^2}{2x}, \tag{8.2.10} \label{8.2.10}\]

so that, on extending the lower limit of integration from \(-x\) to \(-\infty\) (where the integrand is in any case negligible),

\[\left( \frac{e}{x} \right)^x \Gamma (x+1) \cong \int_{-\infty}^{\infty} \exp \left( - \frac{s^2}{2x} \right) ds. \tag{8.2.11} \label{8.2.11}\]

While this integral is not particularly easy, it is at least well known (it occurs in the theory of the Gaussian distribution, for example), and its value is \(\sqrt{2\pi x}\). Thus we have, for large \(x\),

\[\Gamma (x+1) \cong \left( \frac{x}{e} \right)^x \sqrt{2 \pi x} \tag{8.2.12} \label{8.2.12}\]

or, if \(x\) is a positive integer \(N\),

\[N! \cong \left( \frac{N}{e} \right)^N \sqrt{2 \pi N} . \tag{8.2.13} \label{8.2.13}\]

On taking logarithms of both sides, we obtain

\[\ln{N!} \cong \left( N + \frac{1}{2} \right) \ln{N} - N + \ln{\sqrt{2 \pi}} , \tag{8.2.14} \label{8.2.14}\]
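To get a feel for how quickly equation \(\ref{8.2.13}\) becomes accurate, one can compare it with the exact factorial for a few values of \(N\). The sketch below prints the ratio of the exact value to the approximation; the function name `stirling_factorial` is introduced here purely for illustration.

```python
import math

def stirling_factorial(n):
    """Equation 8.2.13: (N/e)^N sqrt(2 pi N)."""
    return (n / math.e) ** n * math.sqrt(2.0 * math.pi * n)

def ratio(n):
    """Exact N! divided by the approximation of equation 8.2.13."""
    return math.factorial(n) / stirling_factorial(n)

for n in (1, 2, 5, 10, 20, 50, 100):
    print(n, ratio(n))
```

The ratio approaches \(1\) from above as \(N\) increases, the excess being roughly \(1/(12N)\).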

or, since \(N\) is large:

\[\ln{N!} \cong N \ln{N} - N . \tag{8.2.15} \label{8.2.15}\]

For very large \(N\) (i.e. if \(\ln{N} \gg 1\)), we can make the further approximation

\[\ln{N!} \cong N \ln N \tag{8.2.16} \label{8.2.16}\]

or

\[\log{N!} \cong N \log{N}. \tag{8.2.17} \label{8.2.17}\]

Any of equations \(\ref{8.2.12}\) to \(\ref{8.2.17}\) may be referred to as Stirling's approximation.

The largest value of \(N\) for which my hand calculator will return \(N!\) is \(69\). For this, it gives

\[\ln{N!} = 226.2, \quad N\ln{N} - N = 223.2 .\]

For very large numbers, the approximation will be much better. The extreme approximation represented by equations \(\ref{8.2.16}\) or \(\ref{8.2.17}\), however, becomes reasonable only for unreasonably large numbers, such as the number of protons in the Universe. We shall be making use of the much better approximation equation \(\ref{8.2.15}\), which does not require such unimaginably huge numbers.
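The figures quoted for \(N = 69\), and the behaviour of the various forms for larger \(N\), can be reproduced with a few lines of code. This is a sketch only: \(\ln N!\) is evaluated as \(\ln \Gamma (N+1)\) with the standard-library `math.lgamma`, and the sample values of \(N\) are arbitrary choices.

```python
import math

def ln_factorial(n):
    """ln N!, computed as ln Gamma(N+1)."""
    return math.lgamma(n + 1.0)

def stirling_full(n):
    """Equation 8.2.14."""
    return (n + 0.5) * math.log(n) - n + 0.5 * math.log(2.0 * math.pi)

def stirling_short(n):
    """Equation 8.2.15."""
    return n * math.log(n) - n

def stirling_crude(n):
    """Equation 8.2.16."""
    return n * math.log(n)

def relative_error(approx, n):
    exact = ln_factorial(n)
    return abs(approx(n) - exact) / exact

for n in (69.0, 1e3, 1e6, 1e12, 1e23):
    errors = [relative_error(fn, n) for fn in (stirling_full, stirling_short, stirling_crude)]
    print(f"N = {n:g}  ln N! = {ln_factorial(n):.6g}  relative errors: "
          + "  ".join(f"{e:.2e}" for e in errors))
```

Even at \(N = 10^{23}\) the relative error of equation \(\ref{8.2.16}\) is still of order \(1/\ln N\), that is, a couple of percent, whereas that of equation \(\ref{8.2.15}\) is already far smaller at modest \(N\).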

For smaller numbers than those we commonly deal with in spectroscopy (where we are typically dealing with the number of atoms in a sample of gas), the following approximation is remarkably good:

\[\ln{N!} \cong \left( N + \frac{1}{2} \right) \ln N - N + \frac{1}{12N} + \ln{\sqrt{2 \pi}} . \tag{8.2.18} \label{8.2.18}\]

This is almost the same as equation \(\ref{8.2.14}\), except that, in deriving it, I have taken the expansion of equation \(\ref{8.2.9}\) to one more term. Thus, to eight significant figures, \(20! = 2.432\,902\,0 \times 10^{18}\), while equation \(\ref{8.2.18}\) results in \(2.432\,902\,9 \times 10^{18}\).
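The comparison for \(N = 20\) is easy to repeat. The following sketch evaluates the right-hand side of equation \(\ref{8.2.18}\), exponentiates it, and prints both it and the exact factorial to eight significant figures; the function name is my own.

```python
import math

def ln_factorial_8_2_18(n):
    """Right-hand side of equation 8.2.18."""
    return ((n + 0.5) * math.log(n) - n + 1.0 / (12.0 * n)
            + 0.5 * math.log(2.0 * math.pi))

n = 20
exact = math.factorial(n)
approx = math.exp(ln_factorial_8_2_18(n))

# Eight significant figures, as in the text.
print(f"{exact:.7e}")
print(f"{approx:.7e}")
```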

8.2ii Lagrangian Multipliers

This topic concerns the problem of determining where a function of several variables is a maximum (or a minimum) when the variables are not independent but are connected by one or more functional relations.

Let \(\psi = \psi(x, y, z)\) be some function of \(x\), \(y\) and \(z\). Then, if \(x\), \(y\) and \(z\) are independent variables, one would ordinarily understand that, where \(\psi\) is a maximum, the derivatives are zero:

\[\frac{\partial \psi}{\partial x} = \frac{\partial \psi}{\partial y} = \frac{\partial \psi}{\partial z} = 0. \tag{8.2.19} \label{8.2.19}\]

However, if \(x\), \(y\) and \(z\) are not completely independent, but are related by some constraining equation such as \(f (x, y,z) = 0\), the situation is slightly less simple. An example from thermodynamics comes to mind. Entropy, \(S\), is a function of state: \(S = S(P,V ,T)\). However, for a particular substance, \(P\), \(V\) and \(T\) are related by an equation of state. In effect, we cannot determine \(S\) for the system at any point in \(P\), \(V\), \(T\) space, but we are restricted to explore only on the two-dimensional surface represented by the equation of state.

We return now to our function \(\psi\). If we move from a point where \(\psi\) is a maximum by infinitesimal displacements \(dx\), \(dy\), \(dz\) that respect the constraint, the corresponding changes in both \(\psi\) and \(f\) will be zero (\(\psi\) because it is stationary there, and \(f\) because it remains equal to zero), and therefore both of the following equations must be satisfied:

\[d \psi = \frac{\partial \psi}{\partial x} dx + \frac{\partial \psi}{\partial y} dy + \frac{\partial \psi}{\partial z} dz = 0, \tag{8.2.20} \label{8.2.20}\]

\[df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz = 0 . \tag{8.2.21} \label{8.2.21}\]

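Because \(dx\), \(dy\) and \(dz\) are linked by equation \(\ref{8.2.21}\), we cannot simply set each coefficient in equation \(\ref{8.2.20}\) equal to zero. However, any linear combination of \(\psi\) and \(f\), such as \(\phi = \psi + \lambda f\), where \(\lambda\) is an arbitrary constant, also satisfies a similar equation, \(d\phi = d\psi + \lambda \, df = 0\). We are free to choose \(\lambda\) so that one of the coefficients in \(d\phi\) vanishes; the remaining two displacements can then be varied independently, so that their coefficients must vanish too. The constant \(\lambda\) is sometimes called an "undetermined multiplier" or a "Lagrangian multiplier", although often some additional information in an actual problem enables the constant to be identified, and we shall see an example of this in the derivation of Boltzmann's equation.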

In summary, the conditions for \(\psi\) to be a maximum (or a minimum) when \(x\), \(y\) and \(z\) are related by a functional constraint \(f (x, y,z) = 0\) are

\[\frac{\partial \phi}{\partial x} = 0 , \ \frac{\partial \phi}{\partial y} = 0, \ \frac{\partial \phi}{\partial z} = 0, \tag{8.2.22} \label{8.2.22}\]

where \(\phi = \psi + \lambda f\). Together with the constraint \(f = 0\) itself, these provide four equations for the four unknowns \(x\), \(y\), \(z\) and \(\lambda\).
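A small worked case may make this concrete. The example below is my own and is not taken from the text: maximize \(\psi = -(x^2 + 2y^2 + 3z^2)\) subject to \(f = x + y + z - 1 = 0\). Equations \(\ref{8.2.22}\) give \(-2x + \lambda = 0\), \(-4y + \lambda = 0\), \(-6z + \lambda = 0\), and the constraint then fixes \(\lambda = 12/11\), so that \((x, y, z) = (6/11, \ 3/11, \ 2/11)\). The sketch checks this numerically with central differences and with randomly chosen points on the constraint plane.

```python
import random

def psi(x, y, z):
    """Illustrative function to be maximized (an assumption for this example)."""
    return -(x * x + 2.0 * y * y + 3.0 * z * z)

def f(x, y, z):
    """Illustrative constraint, f = 0 on the plane x + y + z = 1."""
    return x + y + z - 1.0

def phi(x, y, z, lam):
    """phi = psi + lambda * f."""
    return psi(x, y, z) + lam * f(x, y, z)

def grad_phi(x, y, z, lam, h=1e-6):
    """Central-difference estimates of the three partial derivatives of phi."""
    return (
        (phi(x + h, y, z, lam) - phi(x - h, y, z, lam)) / (2 * h),
        (phi(x, y + h, z, lam) - phi(x, y - h, z, lam)) / (2 * h),
        (phi(x, y, z + h, lam) - phi(x, y, z - h, lam)) / (2 * h),
    )

lam = 12.0 / 11.0
point = (6.0 / 11.0, 3.0 / 11.0, 2.0 / 11.0)
print(grad_phi(*point, lam), f(*point), psi(*point))

# No randomly chosen point on the constraint plane should exceed psi at `point`.
random.seed(0)
best = psi(*point)
for _ in range(10_000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    z = 1.0 - x - y          # forces f(x, y, z) = 0
    assert psi(x, y, z) <= best + 1e-12
```

Because this particular \(\psi\) is concave, the stationary point found is the global maximum on the plane; in general equations \(\ref{8.2.22}\) locate stationary points only, and their nature has to be examined separately.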

Of course, if \(\psi\) is a function of many variables \(x_1\), \(x_2, \ldots\), not just three, and the variables are subject to several constraints, such as \(f = 0, \ g = 0, \ h = 0, \ldots\), where \(f\), \(g\), \(h\), etc., are functions connecting all or some of the variables, the conditions for \(\psi\) to be a maximum (or minimum) are

\[\frac{\partial \psi}{\partial x_i} + \lambda \frac{\partial f}{\partial x_i} + \mu \frac{\partial g}{\partial x_i} + \nu \frac{\partial h}{\partial x_i} + \ldots = 0, \quad i=1,2,3,\ldots \tag{8.2.23} \label{8.2.23}\]
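When \(\psi\) is quadratic and the constraints are linear, equations \(\ref{8.2.23}\) together with the constraints form a set of linear equations that can be solved directly. The example below is an illustration of my own, not one from the text: find the stationary point (here a minimum) of \(\psi = x^2 + y^2 + z^2\) subject to \(f = x + y + z - 1 = 0\) and \(g = x + 2y + 3z - 3 = 0\). The sketch uses a small Gauss-Jordan routine with exact rational arithmetic; the routine assumes the system has a unique solution.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# Unknowns ordered as (x, y, z, lambda, mu).
a = [
    [2, 0, 0, 1, 1],  # 2x + lambda * 1 + mu * 1 = 0
    [0, 2, 0, 1, 2],  # 2y + lambda * 1 + mu * 2 = 0
    [0, 0, 2, 1, 3],  # 2z + lambda * 1 + mu * 3 = 0
    [1, 1, 1, 0, 0],  # constraint f = 0: x + y + z = 1
    [1, 2, 3, 0, 0],  # constraint g = 0: x + 2y + 3z = 3
]
b = [0, 0, 0, 1, 3]

x, y, z, lam, mu = solve_linear(a, b)
print(x, y, z, lam, mu)
```

The first three rows are equations \(\ref{8.2.23}\) for \(i = 1, 2, 3\); the last two rows are the constraints themselves, which are needed to fix the two multipliers.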