Explore Hypothesis tests, Difference of paired values


The script below provides a way to
  1. Create one population of values with a specified mean and standard deviation.
  2. Create a parallel population of differences, with a specified mean and standard deviation, that holds the change from each value in the first population to its paired value in a second population.
  3. Create that second population of paired values.
  4. Confirm the mean and standard deviation of each population.
  5. Specify the size of a sample to be taken from each population of paired values.
  6. Specify the significance level to use for our hypothesis tests.
  7. Specify the number of times to take such samples.
  8. Perform the sampling and, for each sample, test H0 that the mean difference between the pairs is zero (0) against the alternative H1 that the difference is not zero (a two-tailed test).
  9. Keep track of the number of times that we reject and the number of times that we do not reject the null hypothesis.
  10. Report a data frame of the very last of the samples taken.
  11. Report the counts of rejections and non-rejections of H0.
If we create the differences with a mean of zero, then the null hypothesis is true, and if we ask for a large number of samples, say 10,000, we can see that we reject that true null hypothesis about the expected number of times, i.e., that we make a Type I error about as often as predicted by our significance level. If, on the other hand, we make the mean of the differences something other than 0, then the null hypothesis is false, and running the samples shows us the number of times that we fail to reject it, i.e., the number of times that we make a Type II error. By adjusting the sample size and/or the magnitude of the mean difference, we can see what happens to the number of Type II errors.
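For example, with the default significance level of 0.05, taking 10,000 samples when the true mean difference is 0 should produce roughly 0.05 × 10,000 = 500 rejections, each of them a Type I error.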

In the folder containing the function scripts for this course, create a new directory, copy the model.R file to that directory, rename the file in the new directory, and double-click on the file to open RStudio. Then copy all of the text below the line and paste it into your RStudio editor pane. You can then highlight the entire script and run it to use the default values. After that, you can go back, change parameters, and run the script again to explore the consequences of those changes.
  
# We look at doing a hypothesis test for the
# difference of two population means when we 
# are looking at paired values

#  first set up the goal population parameters
pop_one_mean <- 23.2
pop_one_sd <- 5.3
#  now we want to set up the differences
#  note that if we set the difference to be 0
#     then the null hypothesis is true

pop_diff_mean <- 0.0
pop_diff_sd <- 3.2

#  then create the first values as a population,
#  with 1000 values

#  First generate an approximate standard normal
pop_one <- rnorm( 1000 )

#  Then get its mean and standard deviation
mu_1 <- mean( pop_one)
#  we want the standard deviation of the population
source("../pop_sd.R")
sd_1 <- pop_sd( pop_one )

#  Then create the distribution we want
pop_one <- ( (pop_one - mu_1)/sd_1 ) * pop_one_sd + pop_one_mean
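#  (standardizing pop_one to mean 0 and sd 1, then multiplying by
#   pop_one_sd and adding pop_one_mean, gives a population with
#   exactly the goal mean and standard deviation)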

#   finally, verify that we have the right population
mean( pop_one )
pop_sd( pop_one )

#  Now do the same thing for pop_diff

#  First generate an approximate standard normal
pop_diff <- rnorm( 1000 )

#  Then get its mean and standard deviation
mu_2 <- mean( pop_diff )

sd_2 <- pop_sd( pop_diff )

#  Then create the distribution we want
pop_diff <- ( (pop_diff-mu_2)/sd_2 ) *
                pop_diff_sd+pop_diff_mean

#   finally, verify that we have the right population
mean( pop_diff )
pop_sd( pop_diff )

#   Now we can create the second value for each
#   value in the original population

pop_two <- pop_one + pop_diff
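
#   (optional check: the paired differences should match the
#    goal difference mean and standard deviation)
mean( pop_two - pop_one )
pop_sd( pop_two - pop_one )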



#   Now we want to repeat the following process:
#   get a sample of the pairs of values.  Then,
#   for each pair, get the difference; in our
#   case we will look at the second value minus the
#   first value (to mimic looking at growth).
#   Then we can run a test to see if the mean of those
#   differences is zero.

#   While we are at it, and because we know the true
#   mean difference, we can count the number of times
#   that we reject the null hypothesis

sig_level <- 0.05
samp_size <- 23

num_reps <- 1000
num_accept <- 0
num_reject <- 0
true_diff <- pop_diff_mean


#  our tests will be two-tailed tests; that is, the
#  alternative is that the mean difference is not zero.
alpha_div_2 <- sig_level/2

# We know that we will be using the Student's t
# distribution.  Therefore, we need to find the degrees
# of freedom.  But this is just one less than the 
# sample size.  Therefore, our t-value will be the
# same for each sample.

t_val <- qt( alpha_div_2, samp_size-1, lower.tail=FALSE)
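
#  t_val is the critical value for this two-tailed test.  The
#  loop below makes its decision by comparing a one-tailed
#  p-value to alpha_div_2; an equivalent check would be to
#  reject H0 whenever abs( xbar_diff/s_e ) > t_val.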



for( i in (1:num_reps) )
{  # choose paired samples from pop_one and pop_two,
   #   then get the mean and sd of their differences
   index_1 <- as.integer( runif( samp_size, 1, 1001))
   samp_1 <- pop_one[ index_1 ]
   samp_2 <- pop_two[ index_1 ]
   #  compute the paired differences, second value minus first
   diff_values <- samp_2 - samp_1

   xbar_diff <- mean( diff_values )
   sd_diff <- sd( diff_values )
   s_e <- sd_diff/sqrt( samp_size )
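   #  (the test statistic is t = xbar_diff / s_e, which follows a
   #   Student's t distribution with samp_size - 1 degrees of
   #   freedom when H0 is true)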
   
   
   # Now, run the test
   if ( xbar_diff > 0 )
   { attained <- pt( xbar_diff/s_e, samp_size - 1,
                        lower.tail=FALSE)} else
   { attained <- pt( xbar_diff/s_e, samp_size - 1)}                    
   
   if( attained >= alpha_div_2 )
   { num_accept <- num_accept + 1 } else
   { num_reject <- num_reject + 1 }
}   

#  look at the last sample as a data frame
data.frame( samp_1, samp_2, diff_values )
#  report the number of times we did not reject and the
#  number of times we rejected H0
num_accept
num_reject
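
#  the observed proportion of rejections; when the true mean
#  difference is 0, this should be close to sig_level
num_reject / num_reps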