- Create two populations of values with specified means and standard deviations.
- Find the mean and standard deviation of each population.
- Specify the size of a sample to be taken from each population.
- Specify the confidence level to use.
- Specify the number of times to take such samples.
- Perform the sampling and, for each sample, generate a confidence interval for the difference of the population means.
- Keep track of the number of times that the generated confidence interval actually contains the true difference of the population means.
- Report that count.
- Report the standard deviation of the collection of the differences of the two sample means.
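
For reference, the interval generated for each pair of samples in the steps above is the usual z interval for the case where both population standard deviations are known. The symbols below are notation introduced for this note rather than names taken from the script: x-bar 1 and x-bar 2 are the two sample means, sigma 1 and sigma 2 the known population standard deviations, n 1 and n 2 the sample sizes, and alpha is 1 minus the confidence level.

```latex
% Confidence interval for \mu_1 - \mu_2 when \sigma_1 and \sigma_2 are known
\[
  (\bar{x}_1 - \bar{x}_2) \;\pm\; z_{\alpha/2}\,
  \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]

% Predicted standard deviation of the difference of the two sample means
\[
  \sigma_{\bar{x}_1 - \bar{x}_2}
  = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
```

For a 93% confidence level, alpha is 0.07, so the z value used is the one with an area of 0.035 to its right. The second expression is the predicted value that the last step compares against the observed standard deviation of the differences.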

In the folder containing the function scripts for this course, create a new directory, copy the model.R file to that directory, rename the file in the new directory, and double-click the file to open RStudio. Then copy all of the text below the line and paste it into your RStudio editor pane. You can then highlight the entire script and run it with the default values. After that, go back, change the parameters, and run the script again to explore the consequences of those changes.

# We look at the 93% confidence interval for the
# difference of two population means when we know
# the standard deviation of each population.

# first set up some goal populations
pop_one_mean <- 23.2
pop_one_sd <- 5.3
pop_two_mean <- 26.8
pop_two_sd <- 6.2

# then create the two populations, each with
# 1000 values
# First generate an approximate standard normal
pop_one <- rnorm( 1000 )
# Then get its mean and standard deviation
mu_1 <- mean( pop_one )
# we want the standard deviation of the population
source("../pop_sd.R")
sd_1 <- pop_sd( pop_one )
# Then create the distribution we want
pop_one <- ( (pop_one-mu_1)/sd_1 )* pop_one_sd+pop_one_mean
# finally, verify that we have the right population
mean( pop_one )
pop_sd( pop_one )

# Now do the same thing for pop_two
# First generate an approximate standard normal
pop_two <- rnorm( 1000 )
# Then get its mean and standard deviation
mu_2 <- mean( pop_two )
sd_2 <- pop_sd( pop_two )
# Then create the distribution we want
pop_two <- ( (pop_two-mu_2)/sd_2 )* pop_two_sd+pop_two_mean
# finally, verify that we have the right population
mean( pop_two )
pop_sd( pop_two )

# Now we want to repeat the following process of
# getting two samples, one from each population,
# and then generating the 93% confidence interval
# for the difference of the population means.
# While we are at this, and because we know what that
# difference is, we can count the number of times
# that the true difference is inside our interval.
# Furthermore, let us keep track of the observed
# differences so that later we can compare the
# standard deviation of those differences to the
# predicted value
samp_one_size <- 12
samp_two_size <- 18
num_reps <- 100
num_success <- 0
num_fail <- 0
true_diff <- pop_one_mean - pop_two_mean
predicted_sd <- sqrt( pop_one_sd^2/samp_one_size +
                      pop_two_sd^2/samp_two_size )
diff_one_two <- (1:num_reps)  # to hold the differences

# since the confidence level is set we can find the
# z value with half of the remaining 7% of the area
# (that is, 0.035) to its right
z_val <- qnorm( 0.035, lower.tail=FALSE)

for( i in (1:num_reps) )
{
  # choose a sample from pop one, get its sample mean
  # (random indices from 1 to 1000, repeats allowed)
  index_1 <- as.integer( runif( samp_one_size, 1, 1001))
  samp_1 <- pop_one[ index_1 ]
  xbar_1 <- mean( samp_1 )
  # choose a sample from pop two, get its sample mean
  index_2 <- as.integer( runif( samp_two_size, 1, 1001))
  samp_2 <- pop_two[ index_2 ]
  xbar_2 <- mean( samp_2 )

  this_diff <- xbar_1 - xbar_2
  diff_one_two[i] <- this_diff

  # get the confidence interval
  ci_low <- this_diff - z_val*predicted_sd
  ci_high <- this_diff + z_val*predicted_sd
  in_ci <- (ci_low <= true_diff ) && ( true_diff <= ci_high )
  if( in_ci )
  { num_success <- num_success+1 }
  else
  { num_fail <- num_fail + 1 }
}

# report the number of successes
num_success
# report the standard deviation of our sample of
# differences
sd( diff_one_two )
# and our predicted value
predicted_sd
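
The script fixes the confidence level at 93% by writing 0.035 directly into the call to qnorm. As one possible way to explore other confidence levels, the sketch below derives the z value from a confidence level and estimates how many successes to expect. The names z_for_level, conf_level, expected_success, and spread are hypothetical and do not appear in the original script. The spread estimate assumes that the repetitions are independent and that each interval covers the true difference with probability close to the confidence level.

```r
# Hypothetical helper (not part of the original script): the z value
# for a two-sided confidence interval at the given confidence level.
z_for_level <- function( conf_level )
{
  alpha <- 1 - conf_level          # total area outside the interval
  qnorm( alpha/2, lower.tail=FALSE )
}

conf_level <- 0.93                 # the level used in the script
z_val <- z_for_level( conf_level ) # matches qnorm(0.035, lower.tail=FALSE)

# If each interval succeeds with probability conf_level, the count of
# successes in num_reps repetitions is approximately binomial.
num_reps <- 100
expected_success <- conf_level * num_reps
spread <- sqrt( num_reps * conf_level * (1 - conf_level) )

z_val
expected_success
spread
```

If you change conf_level here and use the resulting z_val in the loop, the reported num_success should usually land within two or three multiples of spread from expected_success.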