Here we will use the value of
sqrt( s_one^2/n_one + s_two^2/n_two )
as the
standard deviation (the standard error) of the distribution
of the difference in the sample means. Furthermore,
that distribution will be a Student's t
distribution with a specific number of degrees of freedom.
We will see a "quick and dirty" way to get the degrees of
freedom and we will see a complex computation of the degrees of freedom.
The two methods give slightly different values. The former is much easier to compute
but it is not quite as "helpful" as the latter. Once we have the computer to do the
complex computation there is little reason to use the "quick and dirty"
method. However, it is part of the course so we will see it here.
For our example the standard error is
sqrt( 7.12^2/38 + 6.13^2/31 ).
The value of that expression is about 1.5957.
All that remains is to find, for the standard Student's t distribution,
the value of t that has 5% of the area to its right.
We could use the tables, a calculator, or the computer to find this.
However, no matter where we look for the value, we also need the number
of degrees of freedom.
The "quick and dirty" method has us use the smaller of
38-1 and 31-1, that is, 30.
With 30 degrees of freedom we can determine that the t-value
that has 5% of the area to the right of that
t-value is approximately 1.697.
The complex computation gives the degrees of freedom as
( d_one + d_two )^2 / ( d_one^2/(n_one-1) + d_two^2/(n_two-1) ),
where d_one = s_one^2/n_one and d_two = s_two^2/n_two, or
approximately 66.78369.
If we were using a table we would round this down to 66 degrees of
freedom. We find that the t-value with 66 degrees of freedom
that has 5% of the area to its right would be about 1.668.
We would find the same value if we used a calculator or computer to find
the t-value with 66 degrees of freedom although we would get more
digits displayed, as in 1.668271.
However, with a calculator or computer we can actually specify the
degrees of freedom as the value that we computed, the 66.78369.
Doing that gives the t-score of 1.667992.
As you can see, this latter change is hardly noticeable.
Therefore, depending upon how we got to the degrees of freedom
our confidence interval is:
Table 1: the confidence interval computed three ways.
  Using the 30 degrees of freedom from the "quick and dirty" rule:
      1.8 ± 1.697*1.5957 = 1.8 ± 2.708, giving (-0.908, 4.508)
  Using the 66 degrees of freedom from rounding the computed value down:
      1.8 ± 1.668*1.5957 = 1.8 ± 2.662, giving (-0.862, 4.462)
  Using the 66.78369 degrees of freedom from the computation:
      1.8 ± 1.667992*1.5957 = 1.8 ± 2.6616, giving (-0.8616, 4.4616)
n_one <- 38
n_two <- 31
s_one <- 7.12
s_two <- 6.13
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two)
std_err
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
t_simple <- qt(0.05,df_simple,lower.tail=FALSE)
#simple, quick and dirty, t-score
t_simple
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
d_one
d_two
df <-(d_one+d_two)^2/( d_one^2/(n_one-1)+d_two^2/(n_two-1))
# full computed degrees of freedom and t-score
df
dffloor <- floor( df )
t_full <- qt(0.05,df,lower.tail=FALSE)
t_full
# rounded down degrees of freedom and t-score
dffloor
t_rounded <- qt(0.05,dffloor,lower.tail=FALSE)
t_rounded
mean_one <- 17.4
mean_two <- 15.6
diff <- mean_one - mean_two
# our point estimate
diff
simple_moe <- t_simple*std_err
#simple margin of error
simple_moe
simple_low <- diff - simple_moe
simple_high<- diff + simple_moe
#simple confidence interval
simple_low
simple_high
rounded_moe <- t_rounded*std_err
# rounded margin of error
rounded_moe
# rounded confidence interval
rounded_low <- diff - rounded_moe
rounded_high <- diff + rounded_moe
rounded_low
rounded_high
full_moe <- t_full*std_err
# full margin of error
full_moe
# full confidence interval
full_low <- diff - full_moe
full_high <- diff + full_moe
full_low
full_high
Those lines break into 5 sections:
get the standard error and simple degrees of freedom,
get the calculated number of degrees of freedom,
get the confidence interval for the simple case, get the confidence interval
for the rounded case, and get the confidence interval for the
full case.
The console result of those commands is given in Figure 1a.
gnrnd4( key1=1840954204, key2=0012801348 )
L_one <- L1
L_one
gnrnd4( key1=1481765104, key2=0015301435 )
L_two <- L1
L_two
n_one <- length(L_one)
n_two <- length(L_two)
s_one <- sd( L_one )
s_two <- sd( L_two )
mean_one <- mean( L_one )
mean_two <- mean( L_two )
n_one
n_two
mean_one
mean_two
s_one
s_two
The console view of the last 6 of those lines appears in Figure 3.
All of this work can be packaged into the function
ci_2unknown() shown below.
ci_2unknown <- function (
s_one, n_one, x_one,
s_two, n_two, x_two,
cl=0.95)
{
# try to avoid some common errors
if( (cl <=0) | (cl>=1) )
{return("Confidence level must be strictly between 0.0 and 1.0")
}
if( (s_one <= 0) | (s_two <= 0) )
{return("Sample standard deviation must be positive")}
if( (as.integer(n_one) != n_one ) |
(as.integer(n_two) != n_two ) )
{return("Sample size must be a whole number")}
alpha <- 1 - cl
alpha_div_2 <- alpha / 2
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
t_simple <- qt(alpha_div_2,
df_simple,lower.tail=FALSE)
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
t_full <- qt(alpha_div_2,
df,lower.tail=FALSE)
diff <- x_one - x_two
# our point estimate
simple_moe <- t_simple*std_err
simple_low <- diff - simple_moe
simple_high<- diff + simple_moe
full_moe <- t_full*std_err
full_low <- diff - full_moe
full_high <- diff + full_moe
result <- c(full_low, full_high, full_moe,
simple_low, simple_high, simple_moe,
std_err, alpha_div_2, df, t_full,
df_simple, t_simple, diff )
names(result)<-c("Full Low", "Full High", "Full MOE",
"Simp Low", "Simp High", "Simp MOE",
"St. Error", "alpha/2","DF calc",
"t calc", "DF Simp", "t Simp", "Diff")
return( result )
}
This function takes care of all of our work.
Figure 5 demonstrates the use of the function,
using the values that we have for Case 1,
via the two statements
source("../ci_2unknown.R")
ci_2unknown(7.12, 38, 17.4,
6.13, 31, 15.6, 0.90 )
the first of which merely loads the function [we would have had to
place the file ci_2unknown.R into the parent directory (folder)
prior to doing this].
A second example, using different summary statistics, would be the command
ci_2unknown(11.561, 43, 134.358,
15.284, 52, 144.333, 0.95)
and the console output is shown in Figure 5.
gnrnd4( key1=1481765104, key2=0015301435 )
L2 <- L1
gnrnd4( key1=1840954204, key2=0012801348 )
ci_2unknown( sd(L1), length(L1), mean(L1),
sd(L2), length(L2), mean(L2),
0.95)
Note that we have generated the second list first and then
we copied that list of values to the new variable L2.
We need to do that because gnrnd4() always
overwrites the values in L1.
The result of the four statement sequence is shown in Figure 6.
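As a cross-check (this example is an addition, with made-up data, since the gnrnd4() samples are not reproduced here): R's built-in t.test() function carries out this same unequal-variance procedure when it is given the raw samples, and its interval matches the "full" interval built by hand.

```r
# Illustrative only: rnorm() stands in for the gnrnd4() data
set.seed(42)
x <- rnorm(38, mean = 17.4, sd = 7.12)
y <- rnorm(31, mean = 15.6, sd = 6.13)
# the built-in two-sample confidence interval (unequal variances is the default)
tt <- t.test(x, y, conf.level = 0.90)
# the same interval built by hand with the full degrees of freedom
d_one <- var(x)/length(x)
d_two <- var(y)/length(y)
std_err <- sqrt( d_one + d_two )
df <- (d_one + d_two)^2 /
      ( d_one^2/(length(x)-1) + d_two^2/(length(y)-1) )
moe <- qt(0.05, df, lower.tail = FALSE) * std_err
by_hand <- c( mean(x) - mean(y) - moe, mean(x) - mean(y) + moe )
tt$conf.int
by_hand
```

The two intervals agree exactly, and tt$parameter reports the same computed degrees of freedom.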
Table: the hypotheses in their Initial Form and in the Restated version.
The
one additional issue is the degrees of freedom for that
Student's t distribution. Again,
we can use the "quick and dirty" method by making the
degrees of freedom be one less than the smaller of the two sample sizes.
Alternatively, we can use the higher degrees of freedom
value that is the result of the strange
computation that we outlined above.
s_one <- 3.48
s_two <- 4.12
n_one <- 38
n_two <- 47
x_one <- 25.46
x_two <- 26.91
sig_level <- 0.05
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
t_simple <- qt(sig_level, df_simple)
paste( df_simple, t_simple)
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
std_err
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
t_full <- qt(sig_level,df)
paste(df, t_full )
paste( t_simple*std_err, t_full*std_err)
diff <- x_one - x_two
diff
The console view of those statements is given in Figure 7.
Recall the standard error, sqrt( 3.48^2/38 + 4.12^2/47 ),
which we know is about 0.82453.
Then we take our two samples and get the sample means.
From those values we compute the difference,
25.46 - 26.91 = -1.45.
Because the null hypothesis has 0 as the value
of the difference of the true means, our question becomes
"What is the probability of getting a difference
of sample means such as we found or a larger difference?"
We "normalize" that difference by dividing it by the
standard error. That produces a t-value,
-1.45/0.8245 ≈ -1.759,
for
a Student's t distribution. As such we can use
a table, a calculator, or the computer to
find the probability of getting -1.759
or a value more negative than it.
That probability
depends upon the number of degrees of freedom.
For 37 degrees of freedom we find that
0.0434 is the attained significance.
For 82.823 degrees of freedom we find that
0.0412 is the attained significance.
We were running the test at
the 0.05 level of significance.
Our attained significance, no matter whether it was
0.0434 or 0.0412, is less than the 0.05 so we
reject H0.
# do the same, but with attained significance
s_one <- 3.48
s_two <- 4.12
n_one <- 38
n_two <- 47
x_one <- 25.46
x_two <- 26.91
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
std_err
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
diff <- x_one - x_two
t_value <- diff/std_err
paste( diff, t_value )
prob_simple <- pt(t_value, df_simple)
prob_full <- pt(t_value, df )
paste(prob_simple, prob_full )
The console view of those statements is given in Figure 8.
Again we start with the standard error,
sqrt( 37.2^2/53 + 35.1^2/44 ),
which is about 7.356.
# case 9
s_one <- 37.2
s_two <- 35.1
n_one <- 53
n_two <- 44
x_one <- 276.42
x_two <- 261.13
sig_level <- 0.02
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
t_simple <- qt(sig_level/2, df_simple,
lower.tail=FALSE)
paste( df_simple, t_simple)
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
std_err
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
t_full <- qt(sig_level/2,df,
lower.tail=FALSE)
paste(df, t_full )
paste( t_simple*std_err, t_full*std_err)
diff <- x_one - x_two
diff
The console view of those statements is given in Figure 9.
We can also find the attained significance for this test via a pt()
function call. The result then needs to be multiplied by 2 because
we are looking at a two-tailed test. We are answering the
question "What is the probability of being that far away
from the mean, in either direction?" We calculated one side
so we need to double it to get the total probability.
The one-sided value from the full computation is about 0.0202, so the total will be
about 0.0405, a value that is larger than our level of significance,
0.02, in the problem statement. Therefore, we do not reject H0
at the 0.02 level of significance.
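The doubling step can be verified directly: by the symmetry of the Student's t distribution the two tail areas are equal, so doubling one of them gives the two-tailed probability. A small sketch, with an illustrative t-score and degrees of freedom near the values of this case:

```r
# Symmetry of the Student's t distribution (illustrative values)
t_value <- 2.0786
df_simple <- 43
one_side   <- pt(  t_value, df_simple, lower.tail = FALSE )  # area above t_value
other_side <- pt( -t_value, df_simple )                      # area below -t_value
two_sided  <- 2 * one_side          # the doubling used in the text
c( one_side, other_side, two_sided )
```

The two tail areas are identical, so 2*one_side is exactly one_side + other_side, the total probability of being that far from 0 in either direction.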
# do the same, but with attained significance
s_one <- 37.2
s_two <- 35.1
n_one <- 53
n_two <- 44
x_one <- 276.42
x_two <- 261.13
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
std_err
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
diff <- x_one - x_two
t_value <- diff/std_err
paste( diff, t_value )
prob_simple <- pt(t_value, df_simple,
lower.tail=FALSE)
prob_full <- pt(t_value, df,
lower.tail=FALSE)
paste(prob_simple, prob_full )
paste(prob_simple*2, prob_full*2 )
The console image of those statements is given in Figure 10.
hypoth_2test_unknown <- function(
s_one, n_one, x_one,
s_two, n_two, x_two,
H1_type=0, sig_level=0.05)
{ # perform a hypothesis test for the difference of
# two means when we do not know the standard deviations of
# the two populations and the alternative hypothesis is
# != if H1_type==0
# < if H1_type < 0
# > if H1_type > 0
# Do the test at sig_level significance.
if( n_one < n_two)
{ df_simple <- n_one - 1} else
{ df_simple <- n_two - 1}
use_sig_level <- sig_level
if( H1_type == 0 )
{ use_sig_level <- sig_level/2}
t_simple <- -qt(use_sig_level, df_simple)
#compute the degrees of freedom
d_one <- s_one^2/n_one
d_two <- s_two^2/n_two
std_err <- sqrt( d_one+d_two )
df <-(d_one+d_two)^2/
( d_one^2/(n_one-1)+d_two^2/(n_two-1))
t_full <- -qt(use_sig_level,df)
extreme_simple <- t_simple*std_err
extreme_full <- t_full * std_err
diff <- x_one - x_two
diff_standard <- diff / std_err
decision_simple <- "Reject"
decision_full <- "Reject"
if( H1_type < 0 )
{ crit_low_simple <- - extreme_simple
crit_high_simple <- "n.a."
if( diff > crit_low_simple)
{ decision_simple <- "do not reject"}
crit_low_full <- - extreme_full
crit_high_full <- "n.a."
if( diff > crit_low_full)
{ decision_full <- "do not reject"}
attained_simple <- pt( diff_standard, df_simple)
attained_full <- pt( diff_standard, df )
alt <- "mu_1 < mu_2"
}
else if ( H1_type == 0)
{ crit_low_simple <- - extreme_simple
crit_high_simple <- extreme_simple
if( (crit_low_simple < diff) &
(diff < crit_high_simple) )
{ decision_simple <- "do not reject"}
crit_low_full <- - extreme_full
crit_high_full <- extreme_full
if( (crit_low_full < diff) &
(diff < crit_high_full) )
{ decision_full <- "do not reject"}
if( diff < 0 )
{ attained_simple <- 2*pt(diff_standard, df_simple)
attained_full <- 2*pt(diff_standard, df)
}
else
{ attained_simple <- 2*pt(diff_standard,
df_simple,
lower.tail=FALSE)
attained_full <- 2*pt(diff_standard,
df,
lower.tail=FALSE)
}
alt <- "mu_1 != mu_2"
}
else
{ crit_low_simple <- "n.a."
crit_high_simple <- extreme_simple
if( diff < crit_high_simple)
{ decision_simple <- "do not reject"}
crit_low_full <- "n.a."
crit_high_full <- extreme_full
if( diff < crit_high_full)
{ decision_full <- "do not reject"}
attained_simple <- pt(diff_standard, df_simple,
lower.tail=FALSE)
attained_full <- pt(diff_standard, df,
lower.tail=FALSE)
alt <- "mu_1 > mu_2"
}
result <- c( alt, s_one, n_one, x_one,
s_two, n_two, x_two,
std_err, diff, sig_level,
diff_standard, df,
crit_low_full, crit_high_full,
attained_full, decision_full,
df_simple,
crit_low_simple, crit_high_simple,
attained_simple, decision_simple
)
names(result) <- c("H1:",
"s_one", "n_one","mean_one",
"s_two", "n_two","mean_two",
"std. err.", "difference", "sig level",
"t-value", "full df",
"full low", "full high",
"full Attnd", "full decision",
"simple df",
"simp low", "simp high",
"simp Attnd", "simp decision"
)
return( result)
}
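One design note, as an aside (an observation about R, not part of the original page): the function builds its result with c(), mixing character strings ("Reject", "n.a.", the alternative hypothesis) with numbers. R coerces such a vector entirely to character, so every value in the returned vector is a quoted string. A tiny demonstration:

```r
# c() with mixed types coerces everything to character
mixed <- c( decision = "Reject", t_value = 2.0786 )
class( mixed )                      # the whole vector is character
as.numeric( mixed[["t_value"]] )    # recover a number when needed
```

If you need to compute with one of the returned numeric entries, convert it with as.numeric() first.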
Once this function has been installed we can
accomplish the analysis of Case 8 above via the command
hypoth_2test_unknown(3.48, 38, 25.46,
4.12, 47, 26.91, -1, 0.05)
Figure 11 shows the console view of that command.
hypoth_2test_unknown(37.2, 53, 276.42,
35.1, 44, 261.13, 0, 0.02)
Figure 12 shows the console view of that command.
gnrnd4( key1=2185232304, key2=0159705826 )
L1
n1 <- length(L1)
m1 <- mean(L1)
s1 <- sd(L1)
gnrnd4( key1=2325842704, key2=0212604672 )
L1
n2 <- length(L1)
m2 <- mean(L1)
s2 <- sd( L1 )
It was not necessary to display the data but it is nice to have the display so that we can confirm that we have correctly entered the gnrnd4() commands. The console view follows in Figure 13.
hypoth_2test_unknown(s1, n1, m1,
s2, n2, m2,
1, 0.025)
That command produces the output of Figure 14.
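As a final cross-check (again with made-up data, since the gnrnd4() samples are not reproduced here): when the raw samples are available, R's built-in t.test() with an alternative argument reproduces the "full" attained significance that hypoth_2test_unknown() reports.

```r
# Illustrative data only; rnorm() stands in for the gnrnd4() samples
set.seed(1)
x <- rnorm(30, mean = 12, sd = 3)
y <- rnorm(35, mean = 10, sd = 4)
# built-in test of H1: mu_1 > mu_2 (unequal variances is the default)
tt <- t.test(x, y, alternative = "greater")
# the "full" attained significance computed by hand
d_one <- var(x)/length(x)
d_two <- var(y)/length(y)
std_err <- sqrt( d_one + d_two )
df <- (d_one + d_two)^2 /
      ( d_one^2/(length(x)-1) + d_two^2/(length(y)-1) )
t_value <- ( mean(x) - mean(y) ) / std_err
attained_full <- pt( t_value, df, lower.tail = FALSE )
c( tt$p.value, attained_full )
```

The built-in p-value and the hand computation agree, which confirms that the "strange" degrees-of-freedom formula is exactly what t.test() uses.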