Sampling
We start our look at inferential statistics
by introducing the idea of taking a sample of
a population.
Sampling is a process of selecting a number
of hopefully representative values or items from a
population of values or items.
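As a small illustration of the idea, here is a sketch in Python of one common way to select a sample: picking items entirely at random, so that every item is equally likely to be chosen. The population of 5000 numbered bolts and the sample size of 50 are borrowed from the fabric example that follows. Using Python's random module, and the fixed seed, are our own choices for illustration, not something the text prescribes.

```python
import random

POPULATION_SIZE = 5000   # e.g., 5000 bolts of fabric, numbered 1..5000
SAMPLE_SIZE = 50         # how many items we will actually examine

population = list(range(1, POPULATION_SIZE + 1))

rng = random.Random(2015)                      # fixed seed so the run can be repeated
sample = rng.sample(population, SAMPLE_SIZE)   # 50 distinct items, none repeated

print(sorted(sample))
```

Each run with a different seed would give a different set of 50 bolts; that variation from sample to sample is exactly what inferential statistics has to account for.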
There are times when it is impossible, whether for financial, practical,
or time reasons, for us to examine an entire population.
Consider three examples.
First, let us say that we are about to buy a huge shipment of
bolts of fabric. Perhaps there are 5000 bolts in the
shipment. We want to be sure that items in the
shipment, the individual bolts of fabric, meet our strict standards.
We simply cannot afford the time or the expense of
unrolling each bolt so that we can inspect the 40 or 100
yards of cloth that are in each bolt. Instead, we decide to
get a sample, a small number perhaps 50,
of representative bolts from the shipment.
We carefully inspect each of those 50.
Certainly, if all 50 are flawless, we have no reason to
reject the shipment. It is, of course, possible that
the 5000 bolts in the shipment had 4950 bad bolts and only
50 good ones! But it is highly unlikely that we
would have just selected the good ones to examine.
Likewise, if we inspect 50 bolts and they are all flawed beyond
our standards, then we are going to reject the entire shipment.
We tell the seller that we do not have the time or the money to
search through the remaining 4950 bolts to see if there are any good ones.
The whole shipment goes back and we turn to someone else to supply
us with fabric.
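The claim that it is highly unlikely we happened to pick only the good bolts can be made concrete. Assuming the 50 bolts are chosen completely at random, the chance that all 50 come from the only 50 good bolts is the number of ways to choose 50 good bolts (just 1) divided by the number of ways to choose any 50 bolts out of 5000. The sketch below computes that with Python's math.comb.

```python
from math import comb

POPULATION = 5000   # bolts in the shipment
GOOD = 50           # the extreme case from the text: only 50 good bolts
SAMPLE = 50         # bolts we inspect

# Ways to pick 50 bolts that are all good, divided by ways to pick any 50 bolts
# (a random sample drawn without replacement).
p_all_good = comb(GOOD, SAMPLE) / comb(POPULATION, SAMPLE)
print(f"P(all 50 sampled bolts are good) = {p_all_good:.3e}")
```

The result is smaller than 10^-120, which is why we would not seriously worry about that scenario.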
The two instances above give two extremes. In inferential statistics
we come up with understandings and rules for what we should
do when we do not have such extreme cases. What should we do if in the
50 bolts that we do inspect we find just 2 that have some flaws in them?
What should we do if we find 5 that are flawed?
In the end we are going to base our decision on
our examination of the sample of 50 bolts out of the
population of 5000 bolts.
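To get a feel for why 2 or 5 flawed bolts is a harder call, here is a sketch that assumes, purely hypothetically, that the shipment contains 200 flawed bolts (4%) and that the 50 bolts are chosen at random. It computes the chance of finding exactly k flawed bolts in the sample, which follows the hypergeometric distribution. The figure of 200 flawed bolts and the function name prob_flawed_in_sample are invented for illustration.

```python
from math import comb

def prob_flawed_in_sample(k, population=5000, flawed=200, sample=50):
    """Hypergeometric probability of exactly k flawed bolts in a random sample.

    The default of 200 flawed bolts is a hypothetical assumption.
    """
    return comb(flawed, k) * comb(population - flawed, sample - k) / comb(population, sample)

for k in (0, 2, 5):
    print(k, round(prob_flawed_in_sample(k), 4))
```

Under that assumption we would expect, on average, 50 × 200/5000 = 2 flawed bolts in the sample, so finding 2 is entirely ordinary, while finding 5 is much less likely. Rules for turning numbers like these into an accept or reject decision are what inferential statistics provides.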
As a second example, consider the prognostications that
surround elections. A year before the election we start getting
statements like "Candidate X has 45% of the vote, candidate Y has
38% of the vote, and 17% are undecided." How do they get these
numbers? They certainly do not hold a secret ballot
for all potential voters and then just count the votes.
Instead, these pollsters select a sample of the population
and ask that sample how they would vote. Then based on the
results of that survey of the people in the sample,
the pollsters come up with their
estimate of what the general population
would do if the election
were held at that moment. In the small print they also
tell you that their numbers are at best guesses and that there is an
anticipated error (a "margin of error") of as much as 4 or 5%.
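The text does not say how pollsters arrive at their 4 or 5% figure. A common textbook approximation, assuming a simple random sample and 95% confidence, is a margin of error of about 1.96·√(p(1−p)/n), where p is the sample proportion and n is the number of people asked. The sketch below evaluates it for a few hypothetical sample sizes; an actual poll may report a somewhat different figure depending on how its sample was drawn.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n people."""
    return z * sqrt(p * (1 - p) / n)

# p = 0.5 gives the largest (most cautious) margin for a given n.
for n in (400, 600, 1000):
    print(n, f"{margin_of_error(0.5, n):.1%}")
```

With p = 0.5, samples of 400 and 600 people give margins of about 4.9% and 4.0%, in line with the 4 or 5% mentioned above.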
I thought of a third example while
driving around southeastern Michigan.
I noticed that the non-personalized,
standard issue, Michigan blue and white automobile license plates
seem to always start with A, B, C, or D (at least as of December, 2015). An example of such a
plate would be DHP 4507. [In fact that is probably someone's
plate, but we really do not care who that someone is.]
However, for whatever reason, I am curious to know what
proportion of these standard plates start with A, what proportion
start with B, and so on.
Suppose I get you interested in this question and we decide to
try to find the answer.
Now, neither of us is going to
go around and systematically find and record the license plate
of every car in Michigan. That is just too big a task.
However, we could get a sample of cars, maybe all of the cars in
the WCC parking lots as we and a few of our friends drive through them,
and we could easily get a count of the standard blue and white
Michigan car plates that start with each of the letters A through D.
Then, based on the results of that sample, we could make at
least a good guess at the relative proportion of each type of
plate in the population.
Given that there are fewer than 4000 parking spots
on campus, we and 9 of our friends could easily
gather the required data in just an hour or so.
Again, we would be using the results of looking at a sample
in order to make a good estimate of the situation in the
population.
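Once the plates are recorded, turning them into sample proportions is simple counting. The sketch below assumes each plate is written as a string such as "DHP 4507"; apart from that example from the text, the plates listed are made up and are not real data, and the function name first_letter_proportions is hypothetical.

```python
from collections import Counter

def first_letter_proportions(plates):
    """Proportion of standard plates in the sample that start with each of A-D."""
    letters = [p[0] for p in plates if p and p[0] in "ABCD"]
    if not letters:
        raise ValueError("no standard A-D plates in the sample")
    counts = Counter(letters)
    total = len(letters)
    return {letter: counts[letter] / total for letter in "ABCD"}

# Hypothetical plates recorded in a parking lot (made up, not real data).
sample = ["DHP 4507", "CAB 1234", "DKL 9981", "BXR 0042", "ARM 7710", "DZZ 3001"]
print(first_letter_proportions(sample))
```

These sample proportions are our estimates of the corresponding proportions in the whole population of standard Michigan plates.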
One immediate issue to consider in this last example is whether
or not the distribution of initial letters on license plates
has anything to do with the registered location of the vehicle.
It is possible, after all, that initial letters on plates
allocated to the Grand Rapids
area are different from those allocated to the Ann Arbor area.
To test this we could call one of our friends
in Grand Rapids and ask that friend
to gather similar information from some large parking area around
Grand Rapids. Then we could compare the results and
see if it looks like there is any area-related difference.
Again we are using sample data to make statements
about the overall population.
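To make such a comparison concrete, the sketch below turns two tallies of first letters into proportions and lists them side by side. The tallies for Ann Arbor and Grand Rapids are invented numbers used only to show the arithmetic; they are not real counts. Deciding whether a difference of the size shown is larger than what sampling alone would produce is the kind of question inferential statistics is meant to answer.

```python
def proportions(counts):
    """Convert a dict of letter -> count into letter -> proportion."""
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Hypothetical tallies of first letters, NOT real data.
ann_arbor = {"A": 30, "B": 45, "C": 60, "D": 65}
grand_rapids = {"A": 20, "B": 40, "C": 70, "D": 70}

aa = proportions(ann_arbor)
gr = proportions(grand_rapids)
for letter in "ABCD":
    diff = aa[letter] - gr[letter]
    print(f"{letter}: Ann Arbor {aa[letter]:.3f}  Grand Rapids {gr[letter]:.3f}  difference {diff:+.3f}")
```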
What we see is that samples can be really helpful
in learning about a population without having to
test, inspect, or use all of the elements of that population.
What remains to be considered is how we get a representative
sample.
We will look at this through the following sampling methods:
- Simple random sample
- Sample of convenience
- Stratified sample
- Cluster sample
- Systematic sample
- Voluntary response sample
©Roger M. Palay
Saline, MI 48176 December, 2015