This is a situation where we have two populations. Furthermore, within those populations we can distinguish characteristics such that we can say that the proportion of population

We will start, as we have for other situations, by looking at the process that we need to use to create a confidence interval, at some level of confidence, for the

Using those values we see that we have and . Then is an estimate of

For our confidence interval we need a point estimate of

- These are random samples
- The samples are independent
- The samples are big enough to have at least 10 successes and 10 failures in each sample
- The samples are small enough so that they represent less than 5% of their respective populations.

With all of that we have the confidence interval specified as

or

Our computations become

Of course we could do these calculations in R via the commands

n_one <- 66 x_one <- 43 phat_one <- x_one / n_one phat_one n_two <- 87 x_two <- 68 phat_two <- x_two / n_two phat_two pe <- phat_one - phat_two pe alpha=1-0.95 alphadiv2<-alpha/2 alphadiv2 z <- qnorm(alphadiv2, lower.tail=FALSE) z std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) std_err moe <- z*std_err moe pe - moe pe + moeFigure 1 gives the console view of these commands.

The pattern for finding such a confidence interval does not change. About the only wrinkle that might be added is to start with the raw data.

Assuming that

# second example n_one <- 74 x_one <- 24 phat_one <- x_one / n_one phat_one n_two <- 89 x_two <- 32 phat_two <- x_two / n_two phat_two pe <- phat_one - phat_two pe alpha=1-0.90 alphadiv2<-alpha/2 alphadiv2 z <- qnorm(alphadiv2, lower.tail=FALSE) z std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) std_err moe <- z*std_err moe pe - moe pe + moeThe console view of those commands is shown in Figure 2.

Figure 2 shows all of the computations resulting in a 90% confidence interval of

At this point it is pretty clear that the computations should be captured in a function. One such function is

ci_2popproportion <- function( n_one, x_one, n_two, x_two, cl=0.95) { phat_one <- x_one / n_one phat_two <- x_two / n_two pe <- phat_one - phat_two alpha=1-cl alphadiv2<-alpha/2 z <- qnorm(alphadiv2, lower.tail=FALSE) std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) moe <- z*std_err ci_low <- pe - moe ci_high <- pe + moe result <- c( ci_low, ci_high, moe, std_err, z, alphadiv2, phat_one, phat_two) names( result ) <- c("ci low", "ci_high", "M of E", "Std. Err", "z-value", "alpha/2", "p hat 1", "p hat 2") return( result ) }Once this function is defined, we can load it and use it to solve the two problems presented above via commands such as

source("../ci_2popproport.R") # do the first problem ci_2popproportion(66,43,87,68,0.95) # do the second problem ci_2popproportion(74,24,89,32,0.90)which produce the results shown in Figure 3.

As expected, the result of the function call shown in Figure 3 gives us all the information that we so carefully constructed in the earlier figures.

This is a good place to look at the effect of having larger samples. To do this we will use the command

ci_2popproportion(74*12,24*12,89*12,32*12,0.90)to generate a 90% confidence interval from samples that are 12 times the size of the

Comparing the two results we can see that having that larger sample size reduces the

One small addition to this discussion is getting the count of successes in the two samples. We could do this in R via the commands

gnrnd4( key1=956347307, key2=7943 ) #get the count which all we need table(L1) gnrnd4( key1=753758807, key2=8853 ) # get the count which all we need table(L1)Which produce the console view shown in Figure 5.

Clearly, Figure 5 indicates that there were 24 instances of a

**H**_{1}: p_{1}- p_{2}≠ 0**H**_{1}: p_{1}- p_{2}< 0**H**_{1}: p_{1}- p_{2}> 0

To do this we want two independent random samples, one from each of the populations. In addition, in order to run this kind of test we need

- The samples are big enough to have at least 10 successes and 10 failures in each sample
- The samples are small enough so that they represent less than 5% of their respective populations.

Remember that in order to make this test we assume that the null hypothesis,

For a 5% significance level we need to find the

From the data it is interesting to note that

# case 3 == hypothesis test z_high <- qnorm(.05,lower.tail=FALSE) z_high n_one <- 92 x_one <- 37 phat_one <- x_one / n_one phat_one n_two <- 83 x_two <- 28 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phatThe result of those commands is given in Figure 6.

Using those results we can find the

These last calculations can be done in R using

std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err crit_high <- z_high * std_err crit_high diff <- phat_one - phat_two diffFigure 7 holds the console image of those commands.

We will do almost all of the above calculations for the attained significance approach. In particular, we compute the sample proportions, the

Repeating some of the earlier commands, the following R statements compute these values.

# redo those for the attained significance n_one <- 92 x_one <- 37 phat_one <- x_one / n_one phat_one n_two <- 83 x_two <- 28 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err diff <- phat_one - phat_two diff z <- diff/std_err z pnorm( z, lower.tail=FALSE)Figure 8 holds the console view of those statements.

Because this is a

From the data it is interesting to note that

# case 4 == hypothesis test 2-sided z_high <- qnorm(.025,lower.tail=FALSE) z_high n_one <- 306 x_one <- 54 phat_one <- x_one / n_one phat_one n_two <- 422 x_two <- 108 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phatThe result of those commands is given in Figure 9.

Using those results we can find the

These last calculations can be done in R using

std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err crit_high <- z_high * std_err crit_high crit_low <- -z_high * std_err crit_low diff <- phat_one - phat_two diffFigure 10 holds the console image of those commands.

We will do almost all of the above calculations for the attained significance approach. In particular, we compute the sample proportions, the

Repeating some of the earlier commands, the following R statements compute these values.

# redo those for the attained significance n_one <- 306 x_one <- 54 phat_one <- x_one / n_one phat_one n_two <- 422 x_two <- 108 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err diff <- phat_one - phat_two diff z <- diff/std_err z if ( z > 0 ) {half_area <- pnorm( z, lower.tail=FALSE)} else {half_area <- pnorm( z )} half_area half_area*2Figure 11 holds the console view of those statements.

As we have noted before, it makes little sense to keep reconstructing these commands. We are far better off if we capture the algorithm in a function. Consider the following possible function definition.

hypoth_2test_prop <- function( x_one, n_one, x_two, n_two, H1_type, sig_level=0.05) { # perform a hypothsis test for the difference of # two proportions based on two samples. # H0 is that the proportions are equal, i.e., # their difference is 0 # The alternative hypothesis is # != if H1_type =0 # < if H1_type < 0 # > if H1_type > 0 # Do the test at sig_level significance. phat_one <- x_one / n_one phat_two <- x_two / n_two phat <- (x_one+x_two) / (n_one+n_two) std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) diff <- phat_one - phat_two if( H1_type==0) { z <- abs( qnorm(sig_level/2))} else { z <- abs( qnorm(sig_level))} to_be_extreme <- z*std_err decision <- "Reject" if( H1_type < 0 ) { crit_low <- - to_be_extreme crit_high = "n.a." if( diff > crit_low) { decision <- "do not reject"} attained <- pnorm( diff, mean=0, sd=std_err) alt <- "p_1 < p_2" } else if ( H1_type == 0) { crit_low <- - to_be_extreme crit_high <- to_be_extreme if( (crit_low < diff) & (diff < crit_high) ) { decision <- "do not reject"} if( diff < 0 ) { attained <- 2*pnorm(diff, mean=0, sd=std_err)} else { attained <- 2*pnorm(diff, mean=0, sd=std_err, lower.tail=FALSE) } alt <- "p_1 != p_2" } else { crit_low <- "n.a." crit_high <- to_be_extreme if( diff < crit_high) { decision <- "do not reject"} attained <- pnorm(diff, mean=0, sd=std_err, lower.tail=FALSE) alt <- "p_1 > p_2" } result <- c( alt, n_one, x_one, phat_one, n_two, x_two, phat_two, phat, std_err, z, crit_low, crit_high, diff, attained, decision) names(result) <- c("H1:", "n_one","x_one", "phat_one", "n_two","x_two", "phat_two", "pooled", "Std Err", "z extreme", "critical low", "critical high", "difference", "attained", "decision") return( result ) }Once this has been placed in the parent directory under the name

source("../hypo_2popproport.R") hypoth_2test_prop(37,92,28,83,1,0.05)The console image of those two commands is shown in Figure 12.

The results shown in Figure 12 are the same as those found above in Figures 6, 7, and 8.

To do

hypoth_2test_prop(54,306,108,422,0,0.05)to produce the output shown n Figure 13.

The results shown in Figure 123 are the same as those found above in Figures 9, 10, and 11.

To do another example, this one another one-tail test, we use the command

hypoth_2test_prop(54,306,108,422,-1,0.005)the result of which appears in Figure 14.

Below is a listing of the R commands used in generating this page.

n_one <- 66 x_one <- 43 phat_one <- x_one / n_one phat_one n_two <- 87 x_two <- 68 phat_two <- x_two / n_two phat_two pe <- phat_one - phat_two pe alpha=1-0.95 alphadiv2<-alpha/2 alphadiv2 z <- qnorm(alphadiv2, lower.tail=FALSE) z std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) std_err moe <- z*std_err moe pe - moe pe + moe # second example n_one <- 74 x_one <- 24 phat_one <- x_one / n_one phat_one n_two <- 89 x_two <- 32 phat_two <- x_two / n_two phat_two pe <- phat_one - phat_two pe alpha=1-0.90 alphadiv2<-alpha/2 alphadiv2 z <- qnorm(alphadiv2, lower.tail=FALSE) z std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) std_err moe <- z*std_err moe pe - moe pe + moe ci_2popproportion <- function( n_one, x_one, n_two, x_two, cl=0.95) { phat_one <- x_one / n_one phat_two <- x_two / n_two pe <- phat_one - phat_two alpha=1-cl alphadiv2<-alpha/2 z <- qnorm(alphadiv2, lower.tail=FALSE) std_err <- sqrt( phat_one*(1-phat_one)/n_one + phat_two*(1-phat_two)/n_two) moe <- z*std_err ci_low <- pe - moe ci_high <- pe + moe result <- c( ci_low, ci_high, moe, std_err, z, alphadiv2, phat_one, phat_two) names( result ) <- c("ci low", "ci_high", "M of E", "Std. Err", "z-value", "alpha/2", "p hat 1", "p hat 2") return( result ) } source("../ci_2popproport.R") # do the first problem ci_2popproportion(66,43,87,68,0.95) # do the second problem ci_2popproportion(74,24,89,32,0.90) # look at the effect of having a larger sample ci_2popproportion(74*12,24*12,89*12,32*12,0.90) gnrnd4( key1=956347307, key2=7943 ) #get the count which all we need table(L1) gnrnd4( key1=753758807, key2=8853 ) # get the count which all we need table(L1) # case 3 == hypothesis test z_high <- qnorm(.05,lower.tail=FALSE) z_high n_one <- 92 x_one <- 37 phat_one <- x_one / n_one phat_one n_two <- 83 x_two <- 28 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err crit_high <- z_high * std_err crit_high diff <- phat_one - phat_two diff # redo those for the attained significance n_one <- 92 x_one <- 37 phat_one <- x_one / n_one phat_one n_two <- 83 x_two <- 28 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err diff <- phat_one - phat_two diff z <- diff/std_err z pnorm( z, lower.tail=FALSE) # case 4 == hypothesis test 2-sided z_high <- qnorm(.025,lower.tail=FALSE) z_high n_one <- 306 x_one <- 54 phat_one <- x_one / n_one phat_one n_two <- 422 x_two <- 108 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err crit_high <- z_high * std_err crit_high crit_low <- -z_high * std_err crit_low diff <- phat_one - phat_two diff # redo those for the attained significance n_one <- 306 x_one <- 54 phat_one <- x_one / n_one phat_one n_two <- 422 x_two <- 108 phat_two <- x_two / n_two phat_two phat <- (x_one+x_two) / (n_one+n_two) phat std_err <- sqrt( phat*(1-phat)*(1/n_one +1/n_two) ) std_err diff <- phat_one - phat_two diff z <- diff/std_err z if ( z > 0 ) {half_area <- pnorm( z, lower.tail=FALSE)} else {half_area <- pnorm( z )} half_area half_area*2 source("../hypo_2popproport.R") hypoth_2test_prop(37,92,28,83,1,0.05) hypoth_2test_prop(54,306,108,422,0,0.05) hypoth_2test_prop(54,306,108,422,-1,0.005)

©Roger M. Palay Saline, MI 48176 February, 2016