## Hypotheses

I have been collecting pennies for decades, and I have a 55-gallon barrel full of them. Because I use the barrel for exercise, and during that exercise I roll and even flip the barrel, I know that the pennies are thoroughly mixed. A coin collector wants to buy my collection, but only if the average of the years on the pennies is less than 1968. I am sure that such is the case. However, I want to be really sure. It would take too much time and effort to go through and analyze the entire collection. Therefore, I take a random sample of 32 pennies and I look at the years on those coins. Table 1 shows the sample years.
I calculate the sample mean year. It turns out to be 1961.906; I feel great! The prospective buyer, when told of this, is not so thrilled. She says, "That was just one sample. Maybe you got lucky! How do I know if a sample mean year of 1961.906 is far enough below our agreement on 1968 to prove that the average year on all of those pennies is less than 1968?"

Having seen, as we did in earlier pages, that the sample mean, while it tends to be close to the population mean, can be somewhat different from that value, how low does it have to be to "prove" that the population mean must be less than 1968?

To do this we look at the possibilities and we formulate a hypothesis. We will look at three possibilities about the average year of coins in the population. The average can be less than 1968, it can be equal to 1968, or it can be greater than 1968. Admittedly, the middle choice, equal to 1968, is almost certainly not true just because it would be such a coincidence to have the sum of all the years be exactly 1968 times the number of coins in the population.

Still, that middle choice is much more helpful than is either other choice. The middle choice gives us a solid figure, namely, 1968. The other choices merely say that the true value is not 1968; they do not give us a specific value that we can use. We formulate two hypotheses, H0, and H1. We call H0 the null hypothesis. We call H1 the alternative hypothesis. H0 will represent the only situation where we actually know a specific value. In this case we would say that the null hypothesis is that the population mean year is 1968. We write this as H0: μ=1968. The alternative hypothesis is the situation we would like to test. For us, the alternative hypothesis is that the population mean year is less than 1968. We write this as H1: μ<1968.

We have our sample. We have calculated the sample mean year. Based on that sample mean, x̄, we either reject H0 in favor of H1, or we do not reject H0 in favor of H1. Certainly, if our sample had an average year of 1975 we might think that H0 is wrong, but we would not reject H0 in favor of H1, an alternative that suggests that the mean is less than 1968. Because we only have two alternatives, H0 and H1, the fact that H0 is expressed using equality, H0: μ=1968, does not mean that we have to hit 1968 exactly to favor H0 over H1. Were we to get a sample mean year of 1967.78 then, intuitively, we would not want to suggest that the null hypothesis is wrong. On the other hand, were we to get a sample mean year of 1923 then we would certainly reject H0 in favor of H1 because 1923, as an average of 32 coin years, is so extremely far from 1968. But what do we do with an average of 1961.906?

We can sum up the situation by looking at Table 2.

Table 2

| Our Action | The Truth (reality): H0 is TRUE | The Truth (reality): H0 is FALSE |
|---|---|---|
| Reject H0 | This is a Type I error | made the correct decision |
| Do not Reject H0 | made the correct decision | This is a Type II error |

A Type I error occurs when we reject H0 when it is really true. In our case, it is possible that the true mean of the year on the coins in the barrel is a value just over 1968 and yet, just by bad luck, we get a sample that has a sample mean so low (we still do not know just how low that would have to be) that we reject H0 and claim that the average of years of the coins is less than 1968. This would be a mistake!

A Type II error occurs when we do not reject H0 when it is really false. In our case, it is possible that the true mean of the years on the coins in the barrel is a value less than 1968 and yet our sample mean year is still too close to, or even above, 1968, so we do not reject H0.

Thus we have two kinds of errors, Type I and Type II. The bad news is that you cannot force both of them to be small. The more you try not to make a Type I error the more likely it is that you will make a Type II error. We will see this in action when we get to actual hypothesis testing in a subsequent web page.

For now, just looking at our example, there is not enough information for us to come up with a good rule on when to reject or not reject H0 in favor of H1. We have no information on just how "strange it would be", if H0 is true, for us to take a random sample of size 32 and get a sample mean of 1961.906.

Consider a few cases. In the first case we happen to know that the standard deviation of the coin years in the barrel is 7 years. Then, with a sample of size 32, we know that the distribution of the sample means is approximately normal with a standard deviation of about 7/sqrt(32) or about 1.24. That would make 1961.906 nearly 5 standard deviations below the mean. Since the sample means have an approximately normal distribution, that is an absurdly low value. It could happen, but, as they say, the odds of it happening are next to nil. We would certainly reject H0 in that case. You might want to note that having a sample mean of 1961.906 would not prove anything. It is just ridiculously suggestive that the true mean cannot be 1968 or higher.

On the other hand, if the standard deviation of the coin years in the population is 45 years, then the standard deviation of sample means for samples of size 32 is 45/sqrt(32) or approximately 7.95. That would make 1961.906 not even 1 standard deviation below the mean. Getting a mean that low, or lower, is not strange at all. It does not happen a majority of the time, but it is not strange enough for us to reject the null hypothesis. Therefore, in that situation, we do not reject H0.

Finally, let us look at the case where the standard deviation of the coin years in the population is 21. Then the standard deviation of sample means for samples of size 32 is 21/sqrt(32) or approximately 3.712. That would put our 1961.906 at 1.64 standard deviations below the mean in a normal distribution. Assuming that H0 is TRUE, if we took repeated samples of size 32, about 5% of them would be that far or further below the mean. If we reject H0 in this case then we are taking a 5% chance of being wrong, that is, of making a Type I error.
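The arithmetic in these three cases can be reproduced with a short Python sketch. It assumes, as the text does, that the sample means are approximately normal, and it uses the sample mean of 1961.906 computed from our 32 pennies. The function name `lower_tail` and the constant names are made up for this illustration; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 1968        # population mean claimed by H0
X_BAR = 1961.906   # sample mean of our 32 pennies
N = 32             # sample size

def lower_tail(sigma, x_bar=X_BAR, mu_0=MU_0, n=N):
    """Return (standard deviation of sample means, z, P(sample mean <= x_bar))
    assuming H0 is true and the population standard deviation is sigma."""
    se = sigma / sqrt(n)              # standard deviation of the sample means
    z = (x_bar - mu_0) / se           # how many of those deviations below 1968
    p = NormalDist().cdf(z)           # chance of a sample mean this low or lower
    return se, z, p

# The three population standard deviations considered in the text.
for sigma in (7, 45, 21):
    se, z, p = lower_tail(sigma)
    print(f"sigma={sigma:>2}  sd of sample means={se:.3f}  z={z:.2f}  P={p:.6f}")
```

The same sample mean goes from wildly improbable (standard deviation 7) to unremarkable (standard deviation 45) purely because of the assumed spread of the population, which is why the sample mean alone cannot settle the question.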

If we are not willing to make a Type I error 5% of the time then we could choose a cutoff value even lower than 1961.906, that is, one that is more standard deviations below the supposed mean of 1968. Perhaps we should look at 1959, a year that is about 2.425 standard deviations below 1968 (where the standard deviation of the sample means is 3.712). We would reject the null hypothesis if the mean of the 32-item sample is less than 1959. If H0 is TRUE then getting a sample mean at 1959 or lower would happen, by chance, less than 1% of the time. We would make a Type I error less than 1% of the time. That change to a 1959 cutoff value decreased our chance of making a Type I error, but it also increased the likelihood of making a Type II error.
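As a rough check on these figures, the sketch below computes the chance of a Type I error for a given cutoff, and, going the other way, the cutoff that matches a chosen Type I error rate. It assumes the population standard deviation of 21 used above and approximately normal sample means. The function names `type_i_rate` and `cutoff_for` are hypothetical, chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

def type_i_rate(cutoff, sigma, mu_0=1968, n=32):
    """Chance that a sample mean falls at or below cutoff when H0 is true."""
    se = sigma / sqrt(n)                      # standard deviation of sample means
    return NormalDist(mu_0, se).cdf(cutoff)

def cutoff_for(alpha, sigma, mu_0=1968, n=32):
    """Cutoff below which a sample mean falls with probability alpha under H0."""
    se = sigma / sqrt(n)
    return NormalDist(mu_0, se).inv_cdf(alpha)

# Type I error rate for the 1959 cutoff discussed in the text (sigma = 21).
print(f"cutoff 1959 -> Type I rate {type_i_rate(1959, 21):.4f}")

# Cutoffs that correspond to a few common Type I error rates.
for alpha in (0.05, 0.01):
    print(f"alpha={alpha} -> cutoff {cutoff_for(alpha, 21):.3f}")
```

Choosing the acceptable Type I error rate first and deriving the cutoff from it is usually the more convenient direction, because the rate is the quantity we actually care about.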

The reason that we cannot get a good measure of Type II errors in cases like this one is that in the case of a Type II error all we know is that H0 is FALSE. We have no idea just what the true value of the mean is; we just know that it really is not 1968. Because we have no known correct value, we cannot compute the likelihood of getting the result that we did.
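To see why the unknown true mean matters, the following sketch computes the chance of a Type II error for a few pretend values of the true mean. Those true means (1967, 1964, 1960, 1955) are pure assumptions made up for illustration, as is the function name `type_ii_rate`; nothing in our sample tells us which, if any, is right. The population standard deviation of 21 and the two cutoffs are the ones used above.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_rate(cutoff, true_mean, sigma, n=32):
    """Chance of NOT rejecting H0 (sample mean above cutoff)
    when the population mean is really true_mean."""
    se = sigma / sqrt(n)                      # standard deviation of sample means
    return 1 - NormalDist(true_mean, se).cdf(cutoff)

SIGMA = 21                                    # assumed population standard deviation
HYPOTHETICAL_MEANS = (1967, 1964, 1960, 1955) # made-up true means, all below 1968

for true_mean in HYPOTHETICAL_MEANS:
    for cutoff in (1961.906, 1959):
        beta = type_ii_rate(cutoff, true_mean, SIGMA)
        print(f"true mean {true_mean}, cutoff {cutoff}: Type II rate {beta:.3f}")
```

For every pretend true mean, the lower cutoff of 1959 gives a larger Type II rate than the cutoff of 1961.906, which is the trade-off described earlier. The rate itself swings widely depending on which true mean we pretend is correct, so without knowing that value we cannot pin the rate down.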