Sampling

We start our look at inferential statistics by introducing the idea of taking a sample from a population. Sampling is the process of selecting a number of values or items, which we hope are representative, from a population of values or items. There are times when cost, practicality, or time makes it impossible for us to examine an entire population. Consider three examples.

First, let us say that we are about to buy a huge shipment of bolts of fabric. Perhaps there are 5000 bolts in the shipment. We want to be sure that the items in the shipment, the individual bolts of fabric, meet our strict standards. We simply cannot afford the time or the expense of unrolling each bolt so that we can inspect the 40 or 100 yards of cloth that are in each bolt. Instead, we decide to get a sample, a small number, perhaps 50, of representative bolts from the shipment. We carefully inspect each of those 50. Certainly, if all 50 are flawless, we have no reason to reject the shipment. Now it is certainly possible that the shipment of 5000 bolts had 4950 bad bolts and only 50 good ones! But it is highly unlikely that we would have selected just the good ones to examine.
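To put a number on "highly unlikely," here is a minimal sketch in Python. It assumes that the 50 bolts are chosen as a simple random sample without replacement, so that every set of 50 bolts is equally likely to be picked. The text does not spell out that selection method, and the function name prob_all_good is ours, used only for illustration.

```python
from math import comb

def prob_all_good(population: int, good: int, sample: int) -> float:
    """Chance that a simple random sample, drawn without replacement,
    contains only good items (assumes every sample is equally likely)."""
    # Ways to choose the whole sample from the good items, divided by
    # the ways to choose the sample from the entire population.
    return comb(good, sample) / comb(population, sample)

# The extreme case from the text: 5000 bolts, only 50 of them good,
# and we happen to pick exactly those 50.
p = prob_all_good(population=5000, good=50, sample=50)
print(f"Chance of picking only the 50 good bolts: {p:.3e}")
```

Under that assumption there is exactly one way to pick the 50 good bolts and an astronomically large number of ways to pick any 50 bolts, so the printed probability is vanishingly small.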

Likewise, if we inspect 50 bolts and they are all flawed beyond our standards, then we are going to reject the entire shipment. We tell the seller that we do not have the time or the money to search through the remaining 4950 bolts to see if there are any good ones. The whole shipment goes back and we turn to someone else to supply us with fabric.

The two instances above give two extremes. In inferential statistics we come up with understandings and rules for what we should do when we do not have such extreme cases. What should we do if in the 50 bolts that we do inspect we find just 2 that have some flaws in them? What should we do if we find 5 that are flawed? In the end we are going to base our decision on our examination of the sample of 50 bolts out of the population of 5000 bolts.
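As a preview of the kind of reasoning involved, the sketch below computes how likely it is to find at least 2, or at least 5, flawed bolts in a sample of 50 for a few made-up shipments. The flawed counts (50, 250, and 500 out of 5000) are hypothetical values chosen only for illustration, and the calculation assumes a simple random sample drawn without replacement. It is not a decision rule, just a look at what different shipments would tend to produce.

```python
from math import comb

def prob_exactly(k: int, population: int, flawed: int, sample: int) -> float:
    """Probability of exactly k flawed items in a simple random sample
    drawn without replacement (the hypergeometric distribution)."""
    return (comb(flawed, k) * comb(population - flawed, sample - k)
            / comb(population, sample))

def prob_at_least(k: int, population: int, flawed: int, sample: int) -> float:
    """Probability of k or more flawed items in the sample."""
    top = min(flawed, sample)
    return sum(prob_exactly(j, population, flawed, sample)
               for j in range(k, top + 1))

# Hypothetical shipments: 1%, 5%, and 10% of the 5000 bolts are flawed.
for flawed in (50, 250, 500):
    p2 = prob_at_least(2, 5000, flawed, 50)
    p5 = prob_at_least(5, 5000, flawed, 50)
    print(f"{flawed:4d} flawed bolts in shipment: "
          f"P(2 or more in sample) = {p2:.3f}, "
          f"P(5 or more in sample) = {p5:.3f}")
```

Comparing the rows shows why the count found in the sample carries information: the more flawed bolts there are in the shipment, the more likely it is that several of them turn up among the 50 we inspect.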

As a second example, consider the prognostications that surround elections. A year before the election we start getting statements like "Candidate X has 45% of the vote, candidate Y has 38% of the vote, and 17% are undecided." How do the pollsters get these numbers? They certainly do not hold a secret ballot for all potential voters and then just count the votes. Instead, they select a sample of the population and ask the people in that sample how they would vote. Then, based on the results of that survey, the pollsters come up with their estimate of what the general population would do if the election were held at that moment. In the small print they also tell you that their numbers are at best estimates and that there is an anticipated error of as much as 4 or 5%.
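The "anticipated error" in the small print can be illustrated with a rough calculation. The sketch below uses the common normal-approximation formula for the margin of error of a sample proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96). The sample sizes shown are hypothetical, since the text does not say how many people a poll asks, and real polls often use more elaborate designs.

```python
from math import sqrt

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p_hat from a
    simple random sample of size n (normal approximation; z = 1.96
    corresponds to an assumed 95% confidence level)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Candidate X's 45% from the text, with hypothetical sample sizes.
for n in (400, 600, 1000):
    moe = margin_of_error(0.45, n)
    print(f"n = {n:4d}: 45% plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions, a poll of a few hundred people gives an error in the neighborhood of the 4 or 5% quoted above, and the error shrinks as the sample grows.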