One Population; Paired Samples

There are many situations in which we get values from a sample, have some "intervention" with the sample, and then get new values from that same sample. If we have identified the earlier and later values with particular members of our sample, then we can look at the difference between values for each member of the sample.

For example, we start with a population. From that population we take a sample of members. We take a measurement on each of those members, and we associate that measurement with its particular member. Then we conduct some intervention on the members. That is, we do something to the members, or allow something to happen, that might change the characteristic that we are measuring. Later, after the intervention, we take a new measurement on the same members of the same sample.

As a particular example, we have a sample of 25 students from the college. We administer a test to these 25 students. The result of the test is a number between 20 and 40. We record the test result of each of the 25 students. Then we show the students a video. After the video, we administer the same test to those 25 students, and we record the new score for each student. Table 1 shows the results of this experiment. For each student the Early measurement is paired with the Later measurement.
We could plot these pairs of values as shown in Figure 1. In that plot the pairs of values are shown above each of the index values and a line connects the paired values.

Figure 1

Our interest is not with the values themselves but rather in the change in values. That is, for each student we want to see the change in the value. That means that we want to find the 25 "Later"-"Early" values. Those values are shown in Table 2.
But now, instead of having two separate lists of values, we have just one list of values. We can treat that list as a sample of the differences that we would have found had we measured the entire population early, done the intervention on the entire population, and then, later, measured the population again. That means that we can use the sample values to generate a confidence interval for the mean difference of the scores across the population, and/or we can test the null hypothesis that the mean difference is zero against an alternative hypothesis (the mean difference is not zero, is less than zero, or is greater than zero). Once we have resolved the original two lists into the one list of Table 2, we are building a confidence interval or testing a hypothesis based on that one list. That is a task that we solved earlier.
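For reference, the one-list computation can be written out as follows. This is a sketch of the standard one-sample t formulas applied to the differences. The symbols are assumptions introduced here, not notation from the original page: n is the number of pairs, d_i is the difference for member i, d-bar and s_d are the sample mean and sample standard deviation of the differences, and t* is the t-value with n-1 degrees of freedom for the chosen confidence level.

```latex
\[
\begin{aligned}
d_i &= x_i^{\text{later}} - x_i^{\text{early}}, \qquad i = 1, \dots, n \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(d_i - \bar{d}\bigr)^{2}} \\
\text{confidence interval:}\quad & \bar{d} \;\pm\; t^{*}_{n-1}\,\frac{s_d}{\sqrt{n}} \\
\text{test statistic for } H_0\colon d = 0\colon\quad & t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}
\end{aligned}
\]
```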

In our example we can generate the values, find the difference, and then create a confidence interval (we will use a 95% confidence level) via the following R commands.
source("../gnrnd4.R")
gnrnd4( key1=1293112410, key2=330003200298 )
L1
L2
L3 <- L2-L1
L3
source("../ci_unknown.R")
ci_unknown(s=sd(L3), n=length(L3),
           x_bar=mean(L3),
           cl=0.95)
The console version of those commands is shown in Figure 2.

Figure 2

From Figure 2 we see that the 95% confidence interval for the mean difference of the scores is (-0.018,1.106).
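As a cross-check that does not depend on the course scripts, base R's t.test() should give the same kind of interval. The sketch below uses a small set of made-up scores (the vectors early and later are hypothetical, not the values of Table 1) and illustrates that a one-sample interval on the differences matches the paired form of t.test().

```r
# Hypothetical paired scores, made up for illustration (NOT the Table 1 values)
early <- c(28, 31, 25, 34, 30, 27, 33, 29)
later <- c(30, 31, 27, 35, 29, 30, 34, 31)

# Resolve the two lists into one list of differences, "Later" - "Early"
d <- later - early

# One-sample t interval computed on the differences
one_sample <- t.test(d, conf.level = 0.95)

# The paired form of t.test() works from the two original lists
paired <- t.test(later, early, paired = TRUE, conf.level = 0.95)

# Both intervals should agree
one_sample$conf.int
paired$conf.int
```

Either call could be used once the pairing is known; the differencing step simply makes the reduction to one list explicit.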

Having defined L3 we could test H0: d=0 against the alternative hypothesis H1: d≠0 at the 0.05 level of significance via the commands
source("../hypo_unknown.R")
hypoth_test_unknown(
  0,  0, sig_level=0.05,
  length(L3), mean(L3), sd(L3))
Figure 3 shows the console view of those commands.

Figure 3

From Figure 3 we see that the mean of the differences was 0.544, a value that lies between the two critical values, -0.562 and 0.562, so we do not reject the null hypothesis via the critical value approach. Alternatively, we see that the attained significance with 24 degrees of freedom is 0.0572, which is not less than the 0.05 level of significance that we set for the test, so we do not reject the null hypothesis via the attained significance approach.
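The two approaches can be sketched by hand in base R. This is only an illustration under stated assumptions: the differences in d are made up (they are not the Table 2 values), and the variable names are hypothetical. The critical values are placed on the scale of the sample mean, as in the discussion above.

```r
# Hypothetical differences, made up for illustration (NOT the Table 2 values)
d <- c(2, 0, 2, 1, -1, 3, 1, 2)

n     <- length(d)
d_bar <- mean(d)
se    <- sd(d) / sqrt(n)          # standard error of the mean difference
alpha <- 0.05

# Critical value approach: two critical values centered on the H0 value, 0
t_crit    <- qt(1 - alpha / 2, df = n - 1)
crit_low  <- 0 - t_crit * se
crit_high <- 0 + t_crit * se
reject_by_critical <- d_bar < crit_low || d_bar > crit_high

# Attained significance approach: two-sided p-value from the t statistic
t_stat  <- (d_bar - 0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
reject_by_p <- p_value < alpha

c(crit_low = crit_low, crit_high = crit_high, d_bar = d_bar, p_value = p_value)
```

The two decisions always agree, since they are the same comparison expressed on two different scales.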

You may have noticed that this topic is presented in the middle of a sequence of topics dealing with samples from two populations. Here, however, we end up dealing with one list of data values, the differences between paired sample values. To do this, once we had the problem resolved to our one list of values, we just returned to the concepts and commands that we had developed for dealing with one population samples in order to get a confidence interval and/or a test on the null hypothesis that the differences are zero. Why, then, is this topic placed here?

One reason to place it here is that we start with what seems to be two samples. However, those values are paired, and our interest is in the change in the values within each pair, not in the values themselves. Another reason is that we generally look to see if that change is different from zero, in the same way that, in the previous two topics, we have been looking to see if the difference between two population means is zero. Therefore, it is almost a "tradition" to place this topic within the discussion of two-sample topics.


Let us look at a second example before we leave the topic. Here we have 32 pairs of sample values. We have a "before" and "after" score for each member of the sample. We want a 98% confidence interval for the mean difference in the scores, where we understand that difference to be the growth of the values from "before" to "after". In addition, we want to test, at the 0.02 level of significance, the hypothesis H0: d = 0 against the alternative H1: d > 0. You might note that the confidence interval will split the 2% across both ends of the interval (1% in each tail), whereas the test will be a 1-sided test with all 2% in the upper tail.
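The difference between splitting the 2% and keeping it all in one tail shows up in the t-values themselves. A minimal sketch using base R's qt() with the 31 degrees of freedom of this example (the variable names are hypothetical):

```r
deg_free <- 32 - 1

# 98% confidence interval: 1% of the area in each tail
t_interval <- qt(1 - 0.02 / 2, df = deg_free)

# 1-sided test at the 0.02 level: all 2% in the upper tail
t_one_sided <- qt(1 - 0.02, df = deg_free)

c(interval = t_interval, one_sided = t_one_sided)
```

The interval's t-value is the larger of the two, which is why an interval can include 0 even when the 1-sided test at the same percentage rejects H0.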

The pairs of sample values are given in Table 3.
To generate and confirm these samples we will use the commands
gnrnd4( key1=572193110, key2=140002800163 )
L1
L2
The console result of those commands is in Figure 4.

Figure 4

We want to resolve these to one list of differences within the pairs. To show the "growth" in values we want to find "After"-"Before". The commands
L3 <- L2-L1
L3
do this. Their console view is in Figure 5.

Figure 5

To find the confidence interval for the values now in L3 we could find the length of L3 (which we know to be 32), the mean of L3, and the sample standard deviation of L3, along with the t-value for 31 degrees of freedom that has 1% of the area above it (half of the 2% that lies outside of the 98% confidence interval), and then we could compute the confidence interval. Alternatively, we can have R do all of this with the command
ci_unknown(s=sd(L3), n=length(L3),
           x_bar=mean(L3),
           cl=0.98)
which produces Figure 6.

Figure 6

From Figure 6 we get the 98% confidence interval (-0.213,3.213).
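The by-hand route described earlier can be sketched in base R as follows. The differences here are made up for illustration (they are not the values of Table 3), and the variable names are hypothetical; the final comparison against t.test() is only a sanity check.

```r
# Hypothetical differences, made up for illustration (NOT the Table 3 values)
d <- c(3, -2, 4, 0, 3, 4, -1, 4)

n     <- length(d)      # number of pairs
d_bar <- mean(d)        # sample mean of the differences
s_d   <- sd(d)          # sample standard deviation of the differences
cl    <- 0.98           # confidence level

# t-value that leaves half of the outside area, (1 - cl)/2, above it
t_star <- qt(1 - (1 - cl) / 2, df = n - 1)

# margin of error, then the two ends of the interval
margin <- t_star * s_d / sqrt(n)
ci     <- c(d_bar - margin, d_bar + margin)
ci

# base R should report the same interval
t.test(d, conf.level = cl)$conf.int
```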

To run the hypothesis test we continue with the command
hypoth_test_unknown(
  0,  1, sig_level=0.02,
  length(L3), mean(L3), sd(L3))
which gives the result shown in Figure 7.

Figure 7

From Figure 7 we get the single, upper critical value, 1.497, and the sample mean, 1.5. Because the sample mean is above the critical value, the decision is to Reject H0 in favor of H1. We also get the attained significance, 0.0198, which, since it is less than our 0.02 level of significance, likewise results in the decision to Reject H0 in favor of H1.
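A 1-sided paired test can also be run with base R's t.test() by setting alternative = "greater". The sketch below uses made-up before and after values (hypothetical, not the Table 3 data) and then repeats the attained significance and the upper critical value by hand.

```r
# Hypothetical paired scores, made up for illustration (NOT the Table 3 values)
before <- c(150, 162, 171, 148, 155, 180, 166, 159)
after  <- c(153, 160, 175, 148, 158, 184, 165, 163)

# 1-sided paired test of H0: d = 0 against H1: d > 0, with d = "After" - "Before"
res <- t.test(after, before, paired = TRUE, alternative = "greater", mu = 0)
res$p.value

# The same p-value by hand: area above the t statistic
d      <- after - before
n      <- length(d)
se     <- sd(d) / sqrt(n)
t_stat <- mean(d) / se
p_hand <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Upper critical value on the scale of the sample mean, at the 0.02 level
crit_high <- 0 + qt(1 - 0.02, df = n - 1) * se
reject    <- mean(d) > crit_high
```

With a real data set the decision in reject would be read alongside the p-value, exactly as the two approaches are read from the console output.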

Finally, just so that we can see the changes from L1 to L2 we can use the statements
plot(1:32,L1, col="darkgreen",
     xlim=c(0,40), xaxp=c(0,40,10),
     xlab="index values",
     ylim=c(110,230), yaxp=c(110,230,12),
     ylab="item values",pch=22,
     main="Pairs of Values from Table 3"
)

# overlay the "After" values, then join each Before/After pair with a vertical line
points(1:32,L2, col="darkred", pch=20)
for(i in 1:32) {
  lines(c(i,i), c(L1[i],L2[i]))
}
abline(h=seq(110,230,10),lty=3,col="darkgray")
abline(v=seq(0,40,4),lty=3,col="darkgray")
legend("topright", 
       legend = c("Before","After"),
       pch=c(22,20),
       col=c("darkgreen","darkred"),
       inset=0.03)
to generate the plot shown in Figure 8.

Figure 8

Figure 8 shows us that sometimes the scores go up, as in item 6, sometimes the scores go down, as in item 17, and sometimes the scores do not change much at all, as in item 22. Overall, however, the increases are more frequent and larger than the decreases. With the mean of L3 being 1.5, we see that the average increase in the sample was 1.5. Furthermore, there is enough of an average increase, given the sample size and the sample standard deviation, for us to reject, at the 0.02 level of significance, the idea that the intervention, if applied to the population, would leave the average score unchanged, in favor of the alternative that the average score would go up.
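The visual impression of "ups" and "downs" can be backed by a quick tally of the differences. A minimal sketch, again with made-up before and after values (hypothetical, not the Table 3 data):

```r
# Hypothetical paired scores, made up for illustration (NOT the Table 3 values)
before <- c(150, 162, 171, 148, 155, 180, 166, 159)
after  <- c(153, 160, 175, 148, 158, 184, 165, 163)

change <- after - before

# How many pairs went up, went down, or stayed the same
ups   <- sum(change > 0)
downs <- sum(change < 0)
same  <- sum(change == 0)
c(ups = ups, downs = downs, same = same)

# Average change across the pairs
mean(change)
```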

©Roger M. Palay     Saline, MI 48176     February, 2016