Explore Hypothesis Tests, Two Pop., Diff means, σ's unknown


The script below provides a way to
  1. Create two populations of values with specified means and standard deviations.
  2. Find the mean and standard deviation of each population.
  3. Specify the size of a sample to be taken from each population.
  4. Specify the significance level to use for a hypothesis test that the means are equal.
  5. Specify the number of times to take such samples.
  6. Perform the sampling and, for each sample, run the hypothesis test, based on the sample standard deviations, that the difference of the population means is 0 vs. the alternative that the difference is not zero.
  7. Keep track of the number of times that the null hypothesis is not rejected.
  8. Report that count.
  9. Report the standard deviation of the collection of the differences of the two sample means.
If we start with the two populations having the same mean then by asking for a significant number of samples, say 10,000, we can see that we really do make a Type I error, reject H0 when it is true, at about the level of significance. On the other hand, if we start with the two populations having different means then by asking for a significant number of samples, say 10,000, we can see how often we are making a Type II error, not rejecting H0 when it is false. This we can explore as we change just how different the means are and/or just how big we make our samples.

In the folder containing the function scripts for this course create a new directory, copy the model.R file to that directory, rename the file in the new directory, double click on the file to open Rstudio. Then copy all of the text below the line and paste it into your Rstudio editor pane. Then, you can highlight the entire script and run it to use the default values. After that you can go back and change parameters and run the script again to explore the consequences of those changes.
  
# We look at the hypothesis test for equal means
# of two population  when we do not know
# the standard deviation of each population.
# Our level of significance is set by
sig_level <- 0.05

# For this example we will have the alternative 
# hypothesis being that the difference is not zero, i.e.,
# that the means are not the same.

#  first set up some goal populations, so decide here
#  just how far apart we want the two means

pop_one_mean <- 23.2
pop_one_sd <- 5.3
pop_two_mean <- 26.2
pop_two_sd <- 6.2

#  then create the two populations, each with 
#  1000 values

#  First generate an approximate standard normal
pop_one <- rnorm( 1000 )

#  Then get its mean and standard deviation
mu_1 <- mean( pop_one)
#  we want the standard deviation of the population
source("../pop_sd.R")
sd_1 <- pop_sd( pop_one )

#  Then create the distribution we want
pop_one <- ( (pop_one-mu_1)/sd_1 )* pop_one_sd+pop_one_mean

#   finally, verify that we have the right population
mean( pop_one )
pop_sd( pop_one )

#  Now do the same thing for pop_two

#  First generate an approximate standard normal
pop_two <- rnorm( 1000 )

#  Then get its mean and standard deviation
mu_2 <- mean( pop_two)

sd_2 <- pop_sd( pop_two )

#  Then create the distribution we want
pop_two <- ( (pop_two-mu_2)/sd_2 )* pop_two_sd+pop_two_mean

#   finally, verify that we have the right population
mean( pop_two )
pop_sd( pop_two )


#   Now we want to repeat the following process of
#   getting two samples, one from each population,
#   and then using those samples, and their means,
#   to test for the equality of the population means.

#   While we are at this, and because we know what that
#   difference is, we can count the number of times 
#   that we accept or reject the null hypothesis
#   that the two means are equal.


samp_one_size <- 45
samp_two_size <- 39

num_reps <- 1000
num_accept <- 0
num_reject <- 0
true_diff <- pop_one_mean - pop_two_mean


for( i in (1:num_reps) )
{  # choose samples from pop one get sample mean
   index_1 <- as.integer( runif( samp_one_size, 1, 1001))
   samp_1 <- pop_one[ index_1 ]
   xbar_1 <- mean( samp_1 )
   sd_1 <- sd( samp_1 )
   # choose samples from pop two get sample mean
   index_2 <- as.integer( runif( samp_two_size, 1, 1001))
   samp_2 <- pop_two[ index_2 ]
   xbar_2 <- mean( samp_2 )
   sd_2 <- sd( samp_2 )
   this_diff <- xbar_1 - xbar_2

   
   # Now,  to run the test.  But to do this we need 
   #   to find the degrees of freedom
   d_1 <- sd_1^2/ samp_one_size
   d_2 <- sd_2^2/ samp_two_size
   df <- (d_1 + d_2)^2/ 
     (d_1^2/(samp_one_size-1)+d_2^2/(samp_two_size-1))

   #  and we get our standard error from the 
   #  sample standard deviations
   s_e <- sqrt( d_1+d_2 )
   
   if ( this_diff > 0 )
   { attained <- pt( this_diff/s_e, df,
                        lower.tail=FALSE)} else
   { attained <- pt( this_diff/s_e, df)
                       }                    
 
   if( attained >= sig_level/2 )
   { num_accept <- num_accept +1 } else
  
   { num_reject <- num_reject + 1}
}
#  report the number of times we accept H0
num_accept