
7.1: The Binomial Distribution


    The idea of equal a priori probabilities is exemplified by the tossing of a coin. For a fair coin, we expect that if we toss it a very large number of times, then roughly half the time we will get heads and half the time we will get tails. We can say that the probability of getting heads is \(\frac{1}{2}\) and the probability of getting tails is \(1 - \frac{1}{2} = \frac{1}{2}\). Thus the two possibilities have equal a priori probabilities.

    Now consider the simultaneous tossing of \(N\) coins. What are the probabilities of the various outcomes? For example, if \(N = 2\), the possibilities are \(HH\), \(HT\), \(TH\) and \(TT\). There are two ways to get one head and one tail, so the probabilities are \(\frac{1}{4}\), \(\frac{1}{2}\), and \(\frac{1}{4}\) for two heads, one head, and no heads, respectively. The probability for one head (and one tail) is higher because there are more ways (two, in this case) to get that result. So we can ask: how many ways can we get \(n_1\) heads (and \(N − n_1\) tails)? This is given by the number of ways we can choose \(n_1\) of the \(N\) coins to which we assign the heads. In other words, it is given by

    \[W(n_1,n_2) = \frac{N!}{n_1!(N-n_1)!} = \frac{N!}{n_1!n_2!},\;\;\;\;n_1\;+n_2=N \]
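    This counting formula can be checked against a brute-force enumeration of all toss sequences. The following Python sketch (with \(N\) chosen arbitrarily small so that enumeration is feasible) counts how many of the \(2^N\) sequences contain exactly \(n_1\) heads and compares the result with \(\frac{N!}{n_1!\,n_2!}\).

```python
from itertools import product
from math import comb  # comb(N, n1) = N! / (n1! (N - n1)!)

N = 4  # small enough to enumerate all 2**N sequences

for n1 in range(N + 1):
    # Brute force: sequences of H/T of length N with exactly n1 heads
    brute = sum(1 for seq in product("HT", repeat=N) if seq.count("H") == n1)
    print(n1, brute, comb(N, n1))  # the two counts agree for every n1
```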

    The probability for any arrangement \(n_1\), \(n_2\) will be given by

    \[p(n_1,n_2) = \frac{W(n_1,n_2)}{\sum_{n ^\prime_1,n ^\prime_2} W(n ^\prime _1,n ^\prime_2)} = \frac{W(n_1,n_2)}{2^N} \label{7.1.2}\]

    where we have used the binomial theorem to write the denominator as \(2^N\). This probability, as a function of \(x = \frac{n_1}{N}\), is shown in Figure \(\PageIndex{1}\). Notice that already for \(N = 8\), the distribution is sharply peaked around the middle value \(n_1 = 4\). This becomes more and more pronounced as \(N\) becomes large. We can locate the maximum by noting that the values \(\frac{n_1}{N}\) and \(\frac{n_1 + 1}{N}\) are very close to each other, becoming infinitesimally different as \(N → ∞\), so that \(x = \frac{n_1}{N}\) may be taken to be a continuous variable as \(N → ∞\). Further, for large numbers, we can use the Stirling formula

    \[\log N! \;≈\; N \log N \;−\; N\]

    Figure \(\PageIndex{1}\): The binomial distribution, showing \(W(n_1, n_2)\) as a function of \(n_1\) (the number of heads) for \(N = 8\)
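    The accuracy of the Stirling formula is easy to check numerically; a minimal Python sketch comparing \(\log N!\) (computed via \(\log \Gamma(N+1)\)) with \(N \log N − N\) for a few values of \(N\) shows the relative error shrinking as \(N\) grows.

```python
from math import lgamma, log

# lgamma(N + 1) = log(N!), computed without forming the huge factorial
for N in (10, 100, 1000, 10000):
    exact = lgamma(N + 1)
    stirling = N * log(N) - N
    print(N, exact, stirling, abs(exact - stirling) / exact)
```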

    Then we get

    \[\begin{align*} \log p &= \log W(n_1, n_2) - N \log 2 \\[4pt] &≈ (N \log N - N) - (n_1 \log n_1 - n_1) - \left[ (N - n_1) \log(N - n_1) - (N - n_1) \right] - N \log 2 \\[4pt] &≈ -N \left[ x \log x + (1 - x) \log(1 - x) \right] - N \log 2 \end{align*}\]

    This has a maximum at \(x = x_∗ = \frac{1}{2}\). Expanding \(\log p\) around this value, we get

    \[\log p \;≈\; -2 N (x - x_*)^2 \;+\; O\big((x - x_*)^3\big),\;\;\;\; \text{or}\;\; p \;≈\; \exp\!\big( -2 N (x - x_*)^2 \big) \]

    We see that the distribution is peaked around \(x_*\) with a width given by \(∆x^2 \;∼\; \frac{1}{4N}\). The probability of a deviation from the mean value becomes vanishingly small as \(N → ∞\). This means that many quantities can be approximated by their mean values or by their values at the maximum of the distribution.
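    Both statements, the Gaussian shape near the peak and the width \(∆x^2 ∼ \frac{1}{4N}\), can be verified numerically from the exact probabilities of Equation \ref{7.1.2}. The sketch below (with \(N = 100\) chosen only for illustration) computes the variance of \(x = \frac{n_1}{N}\) and compares the exact distribution with the normalized Gaussian form near the peak.

```python
from math import comb, exp

N, x_star = 100, 0.5

# Exact probabilities p(n1) = C(N, n1) / 2**N
p = [comb(N, n1) / 2**N for n1 in range(N + 1)]

# Variance of x = n1/N; for a fair coin this equals 1/(4N)
mean_x = sum(n1 / N * p[n1] for n1 in range(N + 1))
var_x = sum((n1 / N - mean_x) ** 2 * p[n1] for n1 in range(N + 1))
print(var_x, 1 / (4 * N))  # both give 0.0025

# Compare the exact distribution with the normalized Gaussian exp(-2N(x - x*)^2)
gauss = [exp(-2 * N * (n1 / N - x_star) ** 2) for n1 in range(N + 1)]
norm = sum(gauss)
for n1 in range(45, 56):
    print(n1, round(p[n1], 4), round(gauss[n1] / norm, 4))
```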

    We have considered equal a priori probabilities. If the probabilities were not equal, the result would be different. For example, suppose we had a coin with probability \(q\), \(0 < q < 1\), for heads and probability \((1 − q)\) for tails. Then the probability for \(N\) coins would be

    \[p(n_1, n_2) = q^{n_1} (1 − q)^{n_2} W(n_1, n_2)\]

    (Note that \(q = \frac{1}{2}\) reproduces Equation \ref{7.1.2}.) The maximum is now at \(x = x_* = q\). The width of the distribution still falls off as \(1/\sqrt{N}\) (one now finds \(∆x^2 ∼ \frac{q(1-q)}{N}\)), so the sharp peaking for large \(N\) is unchanged.
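    As an illustration (with \(q\) and \(N\) chosen arbitrarily), the following sketch locates the maximum of \(p(n_1, n_2)\) for a biased coin and confirms that it sits at \(n_1 ≈ qN\), with the variance of \(x\) close to \(\frac{q(1-q)}{N}\).

```python
from math import comb

N, q = 200, 0.3  # illustrative values

# p(n1) = q**n1 * (1 - q)**(N - n1) * C(N, n1)
p = [q**n1 * (1 - q) ** (N - n1) * comb(N, n1) for n1 in range(N + 1)]

n1_max = max(range(N + 1), key=lambda n1: p[n1])
print(n1_max, q * N)  # the peak sits at n1 = 60 = qN

mean_x = sum(n1 / N * p[n1] for n1 in range(N + 1))
var_x = sum((n1 / N - mean_x) ** 2 * p[n1] for n1 in range(N + 1))
print(var_x, q * (1 - q) / N)  # both give q(1 - q)/N = 0.00105
```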

    Here we considered coins, for each of which there are only two outcomes, head or tail. If we have a die, with six possible outcomes, we must consider splitting \(N\) into \(n_1, n_2, \ldots, n_6\). Thus we can first choose \(n_1\) out of \(N\) in \(\frac{N!}{n_1!(N − n_1)!}\) ways, then choose \(n_2\) out of the remaining \(N − n_1\) in \(\frac{(N − n_1)!}{n_2!(N − n_1 − n_2)!}\) ways, and so on, so that the number of ways we can get a particular assignment \(\{n_1, n_2, \ldots, n_6\}\) is

    \[W(\{n_i\}) = \frac{N!}{n_1!\,(N − n_1)!}\, \frac{(N − n_1)!}{n_2!\,(N − n_1 − n_2)!}\cdots = \frac{N!}{n_1!\, n_2! \cdots n_6!},\;\;\;\; \sum_i n_i = N \]

    More generally, the number of ways we can distribute \(N\) particles into \(K\) boxes, with \(n_i\) of them in box \(i\), is

    \[W(\{n_i\}) = N!\prod_{i=1}^K \frac{1}{n_i!},\;\;\;\;\; \sum_{i=1}^K n_i = N \]

    This counting gives the multinomial distribution.
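    As a final check (a minimal sketch with a small, arbitrarily chosen \(N\)), one can verify that these multinomial coefficients reproduce a brute-force count of die-roll sequences, and that they sum to \(K^N\), the total number of outcomes.

```python
from itertools import product
from math import factorial
from collections import Counter

N, K = 4, 6  # N rolls of a K-sided die, kept small so enumeration is feasible

def multinomial(counts):
    """N! / (n_1! n_2! ... n_K!) for a given occupation tuple."""
    w = factorial(sum(counts))
    for n in counts:
        w //= factorial(n)
    return w

# Brute force: how many of the K**N sequences realize each assignment {n_i}?
brute = Counter()
for seq in product(range(K), repeat=N):
    brute[tuple(seq.count(face) for face in range(K))] += 1

assert all(brute[c] == multinomial(c) for c in brute)
print(sum(brute.values()), K**N)  # both are 1296 = 6**4
```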


    This page titled 7.1: The Binomial Distribution is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by V. Parameswaran Nair.