Probability: Normal approximation of the Binomial
Moving on from looking at probabilities related to
proportions, we recognize that the binomial distribution
must be closely related to the distribution of a sample proportion. After all, in a binomial
distribution we report the number of successes out of some
number of trials,
while in a proportion we report the quotient of the successes over the trials.
As a result, we find that if we take repeated random samples of n trials,
then the mean of the count of successes, denoted as
μX, will approach n*p, where p
is the binomial probability of success on any one trial.
Furthermore, the standard deviation of the count of successes,
denoted as σX, will approach
√(n*p*(1-p)).
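A quick simulation can make the "will approach" claims concrete. The sketch below assumes n = 150 and p = 0.68 (the values used in the coin example later on), draws many repeated samples with R's rbinom(), and compares the simulated mean and standard deviation of the counts with n*p and √(n*p*(1-p)). The seed and the number of repetitions are arbitrary choices made for illustration.

```r
# Simulate many repetitions of n trials and record the count of successes
set.seed(2015)                 # arbitrary seed so the run is reproducible
n    <- 150                    # trials per sample
p    <- 0.68                   # probability of success on one trial
reps <- 100000                 # number of repeated samples

counts <- rbinom(reps, size = n, prob = p)

# Compare the simulated mean and sd with the theoretical values
c(sim_mean = mean(counts), theory_mean = n * p)
c(sim_sd = sd(counts), theory_sd = sqrt(n * p * (1 - p)))
```

With this many repetitions the simulated values should agree with n*p and √(n*p*(1-p)) to about the first decimal place.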
We add to those facts the idea that under certain conditions,
namely, when we have both
n*p ≥ 10 and n*(1-p) ≥ 10,
the binomial distribution can be approximated
by a normal distribution with
mean = μX = n*p and
standard deviation = σX = √(n*p*(1-p)).
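The two conditions are easy to test in R before using the approximation. The helper below, approx_ok(), is a hypothetical name chosen for this sketch; it simply returns TRUE when both n*p ≥ 10 and n*(1-p) ≥ 10 hold.

```r
# Hypothetical helper: TRUE when both n*p >= 10 and n*(1-p) >= 10 hold
approx_ok <- function(n, p) {
  n * p >= 10 && n * (1 - p) >= 10
}

approx_ok(150, 0.68)   # n*p = 102 and n*(1-p) = 48, so TRUE
approx_ok(20, 0.3)     # n*p = 6 fails the first condition, so FALSE
```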
A small note is in order here.
This "approximation" was almost essential
before we had calculators and computers.
Working out binomial probabilities
using the binomial formula involved
an enormous amount of multiplication.
If you did not plan the evaluation of values correctly,
those products could easily exceed 15 digits.
You also had to raise the values p and (1-p) to high powers.
Having a normal approximation to replace evaluating those
expressions was a real gift. However, we do have calculators and we do have computers,
and the software has been tuned to do these problems directly and efficiently. This topic is
presented here because we have been teaching it so long that we have forgotten why we do it.
Sorry.
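To see the scale of that hand arithmetic, the sketch below evaluates one term of the binomial formula, choose(n, k) * p^k * (1-p)^(n-k), directly for the coin example that follows (n = 150, p = 0.68, k = 92) and compares it with R's built-in dbinom(). Only base R functions are used; the choice of k = 92 is just for illustration.

```r
n <- 150; p <- 0.68; k <- 92

# The binomial coefficient alone has far more than 15 digits
choose(n, k)

# One term of the binomial formula, evaluated directly
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# R's built-in binomial probability mass function
by_dbinom <- dbinom(k, size = n, prob = p)

c(by_formula = by_formula, by_dbinom = by_dbinom)
```

A cumulative probability such as P(X ≤ 92) would require adding 93 of these terms (k = 0 through 92), which is exactly the work the normal approximation used to save.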
Let us walk through an example.
We start with a coin that has a probability of 0.68
of coming up "tails" on a fair spin.
If we spin the coin 150 times, what is the
probability that we will get fewer than 93 tails?
With such a high number of spins we will use the
normal approximation to the binomial distribution.
First, we need to check that n*p ≥ 10
and n*(1-p) ≥ 10.
We have n = 150 and p = 0.68
giving n*p = 150*0.68 = 102 ≥ 10
and
n*(1-p) = 150*0.32 = 48 ≥ 10.
We can move forward.
The mean = n*p = 102 and
the standard deviation = √(n*p*(1-p)) = √(150*0.68*0.32) ≈ 5.713143.
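The same arithmetic can be done in R. This sketch stores the mean and standard deviation in variables (the names mu and sigma are illustrative) so they can be passed to pnorm() without retyping a rounded value.

```r
n <- 150
p <- 0.68

mu    <- n * p                    # mean of the count of tails
sigma <- sqrt(n * p * (1 - p))    # standard deviation of the count

mu      # 102
sigma   # 5.713143

# Normal approximation to P(fewer than 93 tails)
pnorm(92.5, mean = mu, sd = sigma)
```

Passing sigma as a computed value rather than the typed 5.713143 avoids a (very small) rounding difference.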
The R statement to find the desired value
is pnorm(92.5,mean=102,sd=5.713143).
A similar statement, along with supporting computations,
is found in Figure 1.
Figure 1
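Note that we used 92.5, rather than 93, because the normal
distribution is continuous while the
binomial distribution is discrete.
"Fewer than 93 tails" means at most 92 tails, and this
continuity correction puts the boundary halfway between 92 and 93.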
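The effect of the boundary choice can be seen by evaluating pnorm() at 92, 92.5, and 93 with the same mean and standard deviation. The sketch pictures each binomial value k as a bar running from k - 0.5 to k + 0.5, which is the usual way to motivate the continuity correction; the variable names are only illustrative.

```r
mu    <- 150 * 0.68
sigma <- sqrt(150 * 0.68 * 0.32)

# "Fewer than 93 tails" is the same event as "at most 92 tails".
# The bar for 92 runs from 91.5 to 92.5, so the event ends at 92.5.
too_low   <- pnorm(92,   mean = mu, sd = sigma)  # drops half of the bar for 92
corrected <- pnorm(92.5, mean = mu, sd = sigma)  # continuity correction
too_high  <- pnorm(93,   mean = mu, sd = sigma)  # adds half of the bar for 93

c(too_low = too_low, corrected = corrected, too_high = too_high)
```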
By the way, since we do have R and we know that
R has the pbinom function, we could have
computed the exact binomial probability by using the
command pbinom(92,150,0.68), as shown
in Figure 1a.
Figure 1a
Note how close our normal approximation to the binomial
in Figure 1 is to the computed value shown in
Figure 1a.
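The comparison can also be done in a single step. This sketch computes both values and their difference; for this coin the two agree to within a few thousandths, though the exact gap depends on n and p.

```r
mu    <- 150 * 0.68
sigma <- sqrt(150 * 0.68 * 0.32)

approx <- pnorm(92.5, mean = mu, sd = sigma)   # normal approximation
exact  <- pbinom(92, size = 150, prob = 0.68)  # exact binomial P(X <= 92)

c(approx = approx, exact = exact, difference = approx - exact)
```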
Using the same coin, we could move on to answering
such questions as
- For the 150 spins, what is the probability that
we will get more than 108 tails?
- For the 150 spins, what is the probability that
we will get between 105 and 109 tails, inclusive?
- For the 150 spins, what is the probability that
we will get fewer than 98 tails or more than 107 tails?
The R commands to get those values would be:
- pnorm(108.5, mean=102, sd=5.713143, lower.tail=FALSE)
- pnorm(109.5, mean=102, sd=5.713143) - pnorm(104.5, mean=102, sd=5.713143)
- pnorm(97.5, mean=102, sd=5.713143) + pnorm(107.5, mean=102, sd=5.713143, lower.tail=FALSE)
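As with Figure 1a, each of these approximations can be checked against the exact binomial value. In this sketch, "more than 108" becomes P(X > 108), which pbinom() returns with lower.tail = FALSE; "between 105 and 109 inclusive" becomes P(X ≤ 109) - P(X ≤ 104); and "fewer than 98 or more than 107" becomes P(X ≤ 97) + P(X > 107).

```r
n <- 150; p <- 0.68
mu <- n * p; sigma <- sqrt(n * p * (1 - p))

# More than 108 tails: P(X > 108)
approx1 <- pnorm(108.5, mu, sigma, lower.tail = FALSE)
exact1  <- pbinom(108, n, p, lower.tail = FALSE)

# Between 105 and 109 tails inclusive: P(X <= 109) - P(X <= 104)
approx2 <- pnorm(109.5, mu, sigma) - pnorm(104.5, mu, sigma)
exact2  <- pbinom(109, n, p) - pbinom(104, n, p)

# Fewer than 98 or more than 107 tails: P(X <= 97) + P(X > 107)
approx3 <- pnorm(97.5, mu, sigma) + pnorm(107.5, mu, sigma, lower.tail = FALSE)
exact3  <- pbinom(97, n, p) + pbinom(107, n, p, lower.tail = FALSE)

rbind(approx = c(approx1, approx2, approx3),
      exact  = c(exact1, exact2, exact3))
```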
Those commands are shown in Figure 2.
Figure 2
As we have done in the past, it would be easier to create
a function that does all of the preliminary work for us,
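saving us from having to remember to do those checks and
to compute the mean and standard deviation
that we need for the pnorm() function.
Such a function is shown below. Note that it still expects us to supply
the continuity-corrected boundary, such as 92.5, as num_success: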
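pnormbino <- function(num_success, p, n, lower.tail = TRUE)
{
  # num_success is the continuity-corrected boundary, e.g. 92.5
  # refuse to compute when the normal approximation is not justified
  if (p*n < 10) {return("n*p < 10, will not compute this")}
  if (n*(1-p) < 10) {return("n*(1-p)<10, will not compute this")}
  # standard deviation of the count of successes
  nbsd <- sqrt(n*p*(1-p))
  # normal probability with mean n*p and standard deviation nbsd
  prob <- pnorm(num_success, n*p, nbsd, lower.tail)
  return(prob)
}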
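One possible refinement, sketched here under our own assumptions, is a variant that accepts whole-number counts and applies the continuity correction itself. The name pnormbino_cc() is hypothetical. With lower.tail = TRUE it approximates P(X ≤ k) by evaluating at k + 0.5; with lower.tail = FALSE it approximates P(X ≥ k) by evaluating at k - 0.5. Unlike pnormbino(), it signals an error with stop() instead of returning a message string, so a failed check cannot be mistaken for a probability in later arithmetic.

```r
# Hypothetical variant of pnormbino() that applies the continuity correction
#   lower.tail = TRUE  -> approximate P(X <= k), evaluated at k + 0.5
#   lower.tail = FALSE -> approximate P(X >= k), evaluated at k - 0.5
pnormbino_cc <- function(k, p, n, lower.tail = TRUE) {
  if (n * p < 10 || n * (1 - p) < 10) {
    stop("need n*p >= 10 and n*(1-p) >= 10")
  }
  edge <- if (lower.tail) k + 0.5 else k - 0.5
  pnorm(edge, mean = n * p, sd = sqrt(n * p * (1 - p)),
        lower.tail = lower.tail)
}

# Fewer than 93 tails means at most 92 tails
pnormbino_cc(92, 0.68, 150)
# More than 108 tails means at least 109 tails
pnormbino_cc(109, 0.68, 150, lower.tail = FALSE)
# Between 105 and 109 tails inclusive
pnormbino_cc(109, 0.68, 150) - pnormbino_cc(104, 0.68, 150)
```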
Figure 3 shows the same problems as before, but
solved with calls to the new pnormbino() function.
Figure 3
©Roger M. Palay
Saline, MI 48176 December, 2015
The modification date for this document is 09/30/2021 20:34:15