## Find Sample Size for Specified Margin of Error

We start with a quick review. The process of computing a confidence interval in the case where we know the population standard deviation, $\sigma$, and where we have a sample of size $n$ that yields a sample mean, $\bar{x}$, is as follows:

1. From the confidence level compute the value of $\alpha/2$ using $\alpha/2 = (1 - \text{confidence level})/2$
2. Use qnorm() to find the associated z-score, $z_{\alpha/2}$
3. Find the margin of error as $\left| z_{\alpha/2} \right| \cdot \sigma / \sqrt{n}$
4. Find the two parts to the confidence interval by evaluating $\bar{x} \pm \left| z_{\alpha/2} \right| \cdot \sigma / \sqrt{n}$

We notice that the margin of error is calculated from three things: the confidence level we choose, the standard deviation of the population, and the sample size. If we know the first two, we can choose the last one, the sample size, to produce a margin of error that is as small as we want.
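The four review steps can be sketched in R as follows. This is only an illustration: the sample mean of 100 and the sample size of 16 are made-up values, and the variable names are not from the original page. The standard deviation and confidence level match the example worked in the next paragraph.

```r
# Illustrative values; x_bar and n are assumptions, not data from the text
sigma <- 18.23   # known population standard deviation
cl    <- 0.95    # confidence level
n     <- 16      # sample size (hypothetical)
x_bar <- 100     # sample mean (hypothetical)

alpha_over_2 <- (1 - cl) / 2       # step 1: area in one tail
z   <- qnorm(alpha_over_2)         # step 2: z-score (negative, lower tail)
moe <- abs(z) * sigma / sqrt(n)    # step 3: margin of error
ci  <- c(lower = x_bar - moe, upper = x_bar + moe)  # step 4: both ends

print(moe)
print(ci)
```

Only step 3 depends on the sample size, which is why the margin of error can be planned before any data are collected.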

Consider the case where we know that the population standard deviation is 18.23 and where we know that we want a 95% confidence interval. Our margin of error will be the value of the expression $\left| z_{\alpha/2} \right| \cdot \sigma / \sqrt{n}$. However, in that expression we know that σ=18.23 and we determine that $\alpha/2$=0.025, which means that we can use qnorm(0.025) to find $z_{\alpha/2}$ ≈ -1.96. The absolute value in the expression for the margin of error means that our margin of error now simplifies to 1.96*18.23/sqrt(n) = 35.7308/sqrt(n). If we choose n to be 16 then sqrt(16)=4 and our margin of error becomes 35.7308/4, a number just smaller than 9.

What if we want the margin of error to be less than 7? Then we need to have the denominator be a little bit larger than 5 (since 35.7308/7≈5.104). To do that we need the sample size to be a little bigger than 25. How much bigger? We could try 26. Then sqrt(26)≈5.099 and 35.7308/5.099≈7.007. Close, but not less than 7. Try a sample size of 27. Then sqrt(27)≈5.196 and 35.7308/5.196≈6.876, below our goal of 7. Therefore, before we even start to take a random sample of the population, we know that if we take a sample of size 27 then our margin of error will be about 6.876.
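The trial values above can be checked in one step. The sketch below assumes the same σ=18.23 and 95% confidence level, and computes the margin of error for several candidate sample sizes at once. The variable names are chosen here for illustration. Because it uses the unrounded z-score from qnorm() rather than 1.96, the later decimal places may differ slightly from the hand calculation.

```r
sigma <- 18.23
z <- abs(qnorm(0.025))            # about 1.96

n <- c(16, 25, 26, 27)            # candidate sample sizes tried in the text
moe <- z * sigma / sqrt(n)        # margin of error for each candidate

print(data.frame(n = n, moe = round(moe, 3)))
```

The row for n = 26 should come out just above 7 and the row for n = 27 just below it, matching the reasoning in the text.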

What if we want the margin of error to be even smaller, say less than or equal to 2? We need 35.7308/sqrt(n) to be less than or equal to 2. Clearly, a denominator of 18 will work, and that would mean that we would want a sample size of 18*18=324. But could we use 323? The value of sqrt(323) is about 17.9722 and 35.7308/17.9722≈1.988, so 323 would work, but what about 322?

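This "guessing and checking" is not a good approach. We should be able to do much better! The original formula for the margin of error, $m$, was

$$m = \left| z_{\alpha/2} \right| \cdot \frac{\sigma}{\sqrt{n}}$$

This is equivalent to

$$\sqrt{n} = \frac{\left| z_{\alpha/2} \right| \cdot \sigma}{m}$$

But then we could square both sides (which makes the absolute value unnecessary) to get

$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{m} \right)^2$$

Recall the problem above: using a 95% confidence level and a population that has a standard deviation of 18.23, find the smallest sample size that will produce a margin of error that is 2 or less. To solve it, we could evaluate the right-hand side of that expression in R as shown in Figure 1.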

Figure 1

We cannot have a sample size of 319.1608, and we know that increasing the sample size decreases the margin of error. Therefore, we always round the answer up to the next highest whole number, in this case 320.
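As a rough check of the rounding rule, the following sketch evaluates the squared expression for this problem, rounds up with ceiling(), and then confirms that the rounded size meets the target while the next smaller size does not. The helper `moe_at` and the variable names are introduced here for illustration only. This is not necessarily the exact command shown in Figure 1.

```r
sigma  <- 18.23
cl     <- 0.95
target <- 2                        # desired margin of error

z        <- qnorm((1 - cl) / 2)
n_raw    <- (z * sigma / target)^2 # unrounded sample size
n_needed <- ceiling(n_raw)         # round up to a whole number

# Hypothetical helper: margin of error for a given sample size
moe_at <- function(n) abs(z) * sigma / sqrt(n)

print(n_raw)
print(n_needed)
print(moe_at(n_needed - 1))        # one fewer observation: slightly above the target
print(moe_at(n_needed))            # at the rounded-up size: at or below the target
```

Rounding down would give a sample that just misses the target, which is why the answer is always rounded up.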

If we were looking for the smallest sample size required to make the margin of error 3 or less, we would use the command in Figure 2.

Figure 2

Our answer would be to use 142 as the sample size.

Or, if we want the smallest sample size that generates a confidence interval with a margin of error of 4.25 or less, then the computation would be as in Figure 3.

Figure 3

Our answer would be to use 71 as the sample size.
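Because R arithmetic is vectorized, the three problems can also be evaluated together. This is a sketch under the same assumptions (σ=18.23, 95% confidence level), with variable names chosen here for illustration.

```r
sigma   <- 18.23
z       <- qnorm(0.025)
targets <- c(2, 3, 4.25)           # the three margins of error used so far

n_raw <- (z * sigma / targets)^2   # unrounded sample sizes
n_needed <- ceiling(n_raw)         # round each one up

print(n_raw)
print(n_needed)
```

The rounded values should agree with the answers of 320, 142, and 71 given in the text.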

The formula is not difficult to evaluate, but it is a pain to remember. We could just create a function to do this for us. A "quick and dirty" function, one without any of the checks on the appropriateness of the values sent to it, would be
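```
find_samp_size <- function( sigma, cl, moe)
{
  # z-score that cuts off a lower tail of area (1-cl)/2
  z <- qnorm( (1-cl)/2)
  # unrounded sample size; squaring removes the negative sign of z
  quotient <- ( z * sigma / moe)^2
  # if the result is not a whole number, round up to the next one
  if (as.integer( quotient) != quotient )
  { quotient <- as.integer(quotient) + 1}
  return( quotient )
}
```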
Figure 4 shows that function and three applications of it to do the three problems that we did in Figures 1 through 3.

Figure 4
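For comparison, here is one possible variant that adds the basic checks the "quick and dirty" version leaves out. The name `find_samp_size_checked` and the particular checks are assumptions made for this sketch, not part of the original page. It uses ceiling() for the rounding, which gives the same result as the explicit test in the function above.

```r
# Hypothetical variant with simple argument checks
find_samp_size_checked <- function(sigma, cl, moe) {
  stopifnot(is.numeric(sigma), sigma > 0,    # standard deviation must be positive
            is.numeric(cl), cl > 0, cl < 1,  # confidence level as a proportion
            is.numeric(moe), moe > 0)        # margin of error must be positive
  z <- qnorm((1 - cl) / 2)
  ceiling((z * sigma / moe)^2)               # round up to a whole sample size
}

print(find_samp_size_checked(18.23, 0.95, 2))
print(find_samp_size_checked(18.23, 0.95, 3))
print(find_samp_size_checked(18.23, 0.95, 4.25))
```

A call such as `find_samp_size_checked(18.23, 95, 2)`, where the confidence level is entered as a percent by mistake, stops with an error instead of returning a misleading number.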

#### Worksheet for Finding Sample Size

Here is a link to a worksheet with randomly generated problems related to finding the required sample size for a given confidence level and a known population standard deviation.