When some of the elements of a population have a specific characteristic, we talk about the

± (margin of error)

Instead, if we change our special conditions, we can use instead of

- The sample size **n** is no more than 5% of the population size. (Another popular way to say this is that the population size is at least 20 times the sample size, **n**.)
- Population items either have the characteristic or they do not. That is another way of saying that the population items fall into one of two categories, those with the characteristic and those without the characteristic.
- The sample must contain at least 10 items in each of the two categories.
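The two conditions that can be checked numerically are easy to test in R. The following is only a sketch: the function name `check_prop_conditions` and its argument `N` (the population size) are hypothetical and do not appear elsewhere in these notes.

```r
# Hypothetical helper, not part of the original notes.
# N = population size, n = sample size,
# x = number of sample items with the characteristic.
check_prop_conditions <- function(N, n, x) {
  c(
    # sample is no more than 5% of the population,
    # written as "population is at least 20 times the sample"
    sample_at_most_5pct = 20 * n <= N,
    # at least 10 sample items with the characteristic
    enough_with = x >= 10,
    # at least 10 sample items without the characteristic
    enough_without = (n - x) >= 10
  )
}

# Example values: a population of 20,000, a sample of 73,
# 17 items with the characteristic
check_prop_conditions(N = 20000, n = 73, x = 17)
```

The result is a named logical vector, so a failed condition can be identified by name. The remaining condition, that every item falls into exactly one of two categories, depends on how the characteristic is defined and cannot be checked from these numbers.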

An example is in order here. We start with a population of enormous size, one with over 20,000 items. We take a sample of size 73 (this is less than 5% of the population, so we are OK on that point). Of the 73 items, 17 have a certain characteristic. That means that 73-17=56 items do not have the characteristic. (The sample contains more than 10 items that have the characteristic and more than 10 items that do not, so we are OK on that point.) Then we can compute an approximation to the sample proportion as

To find the

    x <- 17
    n <- 73
    p_hat <- x/n
    p_hat
    p_hat_sd <- sqrt(p_hat*(1-p_hat)/n)
    p_hat_sd
    alpha_div_2 <- (1-0.95)/2
    alpha_div_2
    z <- abs( qnorm( alpha_div_2))
    z
    lower <- p_hat - z*p_hat_sd
    upper <- p_hat + z*p_hat_sd
    lower
    upper

Figure 1 shows the console view of performing those statements.
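Written as a formula, the statements above compute the usual large-sample confidence interval for a proportion. The symbols below are chosen to match the variable names in the code; the notation is an assumption and may differ from that used elsewhere in these notes.

$$
\hat{p} = \frac{x}{n}, \qquad
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
$$

With x = 17, n = 73, and a 95% confidence level, this gives roughly

$$
\hat{p} \approx 0.233, \qquad
z_{\alpha/2} \approx 1.960, \qquad
0.233 \pm 0.097 \;\Rightarrow\; (0.136,\ 0.330)
$$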

The difference between our earlier hand computed

We could do many more examples, but they all follow the same computations shown in Figure 1. As we have done before, we can codify those computations in a function. All we need to do is to feed the function the values of

    ci_prop <- function( n, x, cl=0.95)
    {
      # compute a confidence interval for the
      # proportion given the sample size, the
      # number of items with the characteristic,
      # and the confidence level

      # do a few checks on the information given
      if( cl <= 0.0 || cl >= 1 )
        {return("Confidence level needs to be between 0 and 1")}
      if( x < 10)
        {return("Need at least 10 items with the characteristic")}
      if( n-x < 10)
        {return("Need at least 10 items without the characteristic")}
      # we have no way to check if we are sampling < 5%
      # of the population

      p_hat <- x/n
      p_hat_sd <- sqrt(p_hat*(1-p_hat)/n)
      alpha_div_2 <- (1-cl)/2
      z <- abs( qnorm( alpha_div_2))
      lower <- p_hat - z*p_hat_sd
      upper <- p_hat + z*p_hat_sd
      # keep the interval inside [0, 1]
      if(lower < 0) { lower <- 0}
      if(upper > 1 ) { upper <- 1}
      result <- c(lower, upper, p_hat, z, p_hat_sd)
      names(result) <- c("lower", "upper", "p hat",
                         "z-score", "p hat sd")
      return( result )
    }

Once defined we can use the statement

Now that we have the

    ci_prop(73,10,.95)
    ci_prop(73,7,0.95)
    ci_prop(73,64,0.95)
    ci_prop(73,17,0.90)
    ci_prop(73,17,0.80)
    ci_prop(73,17,0.99)

The console report on these is given in Figure 3.

The results shown in Figure 3 illustrate the effect of having a smaller , of having too small a value for
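To see how the confidence level alone changes the interval, the same computation can be repeated for several levels at once. This is a minimal sketch that reuses the example values x = 17 and n = 73; the variable name `conf_levels` and the tabular layout are choices made here, not part of the original notes.

```r
# Sample values from the example
x <- 17
n <- 73
p_hat <- x / n
p_hat_sd <- sqrt(p_hat * (1 - p_hat) / n)

# Confidence levels to compare (the same levels used in the calls above)
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# qnorm() is vectorized, so one call gives a z-score for each level
z <- abs(qnorm((1 - conf_levels) / 2))
margin <- z * p_hat_sd

# One row per confidence level
data.frame(
  cl     = conf_levels,
  z      = round(z, 4),
  margin = round(margin, 4),
  lower  = round(p_hat - margin, 4),
  upper  = round(p_hat + margin, 4)
)
```

Because the z-score grows with the confidence level while the sample proportion and its standard deviation stay fixed, the margin of error grows as well: a higher confidence level gives a wider interval.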

Let us look at making the

We can make the

The problem with this is that we cannot be sure that we will get the same proportion of items with the specified characteristic in a new sample. If we use the situations illustrated in Figure 4 we see that the
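The effect of the sample size on the margin of error can be sketched directly. The code below assumes, purely for illustration, that the sample proportion stays at 17/73 as the sample grows; as noted above, a new sample is not guaranteed to give the same proportion. The sample sizes in `sizes` are hypothetical.

```r
# Assumption for illustration only: the sample proportion stays fixed
p_hat <- 17 / 73

# z-score for a 95% confidence level
z <- abs(qnorm((1 - 0.95) / 2))

# Hypothetical sample sizes: the original 73, then doubled repeatedly
sizes <- c(73, 146, 292, 584)

# Margin of error for each sample size
margin <- z * sqrt(p_hat * (1 - p_hat) / sizes)

data.frame(n = sizes, margin = round(margin, 4))
```

The margin of error is proportional to 1/sqrt(n), so under this assumption the sample size has to be four times as large to cut the margin of error in half.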

One final note here is that although we stated above that the population was enormous, we also qualified that by saying that it had over 20,000 items. If we wanted to take a sample of size


©Roger M. Palay Saline, MI 48176 January, 2016