Sampling -- Cluster Sample
A "Cluster Sample" can appear to be similar to a "Sample of Convenience" in that both
methodologies identify select groups within the population from which to
select items for the sample. Before we consider this similarity, let us
consider the methodology involved in making a cluster sample. Assume that we have products
that have been made, boxed, and stacked onto pallets. In this case we will assume that
the pallets are made up of 6 layers of boxes with each layer holding 5 rows of 4 boxes.
Thus, each pallet holds 120 boxes. In our warehouse we have 200 such pallets.
We want a sample of 60 boxes. Were we to do a simple random sample we might have to
take our 60 boxes out of 60 different pallets.
That means opening each of those pallets and disturbing
their contents.
This is a problem since these pallets are ready for shipment, and disturbing
a pallet requires significant expense to re-form the pallet and
get it ready again for shipment.
Instead of doing this we might
randomly choose 3 boxes, determine for each one the pallet and even the layer
of that pallet that contains it,
and then use all 20 boxes in each of those 3 selected pallet-layers
to make up our sample of 60 boxes.
This technique limits the number of pallets that we will disturb to at most 3. (In the unlikely
event that the same pallet-layer is randomly chosen twice, we would just go back and make
another random choice of a box and from that get a new pallet-layer to consider.)
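The selection procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the boxes are assumed to be numbered 0 through 23,999 in pallet order and then layer order, and the names `locate` and `cluster_sample` are made up for this example.

```python
import random

BOXES_PER_LAYER = 20                                     # 5 rows of 4 boxes
LAYERS_PER_PALLET = 6
PALLETS = 200
BOXES_PER_PALLET = BOXES_PER_LAYER * LAYERS_PER_PALLET   # 120
TOTAL_BOXES = BOXES_PER_PALLET * PALLETS                 # 24,000

def locate(box_id):
    """Return (pallet, layer), both counted from 1, for an assumed box numbering."""
    pallet = box_id // BOXES_PER_PALLET + 1
    layer = (box_id % BOXES_PER_PALLET) // BOXES_PER_LAYER + 1
    return pallet, layer

def cluster_sample(n_clusters=3, rng=None):
    """Pick random boxes, then take every box in each box's pallet-layer."""
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < n_clusters:
        cluster = locate(rng.randrange(TOTAL_BOXES))
        if cluster not in chosen:        # same pallet-layer drawn twice: draw again
            chosen.append(cluster)
    sample = []
    for pallet, layer in chosen:
        first = (pallet - 1) * BOXES_PER_PALLET + (layer - 1) * BOXES_PER_LAYER
        sample.extend(range(first, first + BOXES_PER_LAYER))
    return chosen, sample

chosen, sample = cluster_sample()
print(chosen)        # three (pallet, layer) pairs
print(len(sample))   # 60
```

Whatever the random draws turn out to be, the sample always has 60 boxes and they come from no more than 3 pallets.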
Since the 3 pallet-layers to be used are randomly chosen it would seem that we have
given every box a chance to be chosen.
However, unlike the simple random sample, in this new methodology, knowing that a
box from pallet number 34 has been chosen dramatically changes the likelihood that other
boxes in pallet 34 will be chosen. In fact, once we know that a box in
pallet 34, layer 2 has been chosen, we know that all of the other 19 boxes
in that layer have been chosen.
Similarly, and maybe in an even more obvious way, if we know that
in selecting the first 3 boxes for our sample we have boxes from pallets 34, 97, and 143, then
under the system described here we know that none of the remaining
57 boxes in our sample will be from pallets other than pallets 34, 97, and 143.
That is, we know for certain that we will not have sample items from 197 of the 200 pallets in the
warehouse. Thus, although a simple random sample gives every remaining box the same chance of being
selected even after some boxes have been selected,
that is clearly not true for such a cluster sample.
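The difference can be put into numbers. The sketch below assumes the procedure described above, in which redrawing on a repeat amounts to choosing 3 different pallet-layers out of the 1,200 in the warehouse with equal likelihood. The variable names are made up for this example.

```python
from fractions import Fraction

TOTAL_BOXES = 24_000      # 200 pallets of 120 boxes
SAMPLE_SIZE = 60
TOTAL_LAYERS = 1_200      # 200 pallets of 6 layers
LAYERS_CHOSEN = 3

# Chance that one particular box ends up in the sample.
p_srs = Fraction(SAMPLE_SIZE, TOTAL_BOXES)          # simple random sample
p_cluster = Fraction(LAYERS_CHOSEN, TOTAL_LAYERS)   # cluster sample

# Chance that a second box is in the sample, given that a first box is.
p_srs_given = Fraction(SAMPLE_SIZE - 1, TOTAL_BOXES - 1)
p_cluster_same_layer = Fraction(1)                  # same pallet-layer: certain
p_cluster_other_layer = Fraction(LAYERS_CHOSEN - 1, TOTAL_LAYERS - 1)

print(p_srs, p_cluster)                 # 1/400 1/400
print(float(p_srs_given))               # about 0.00246
print(float(p_cluster_same_layer))      # 1.0
print(float(p_cluster_other_layer))     # about 0.00167
```

Before any box is chosen, each box has the same 1 in 400 chance under either method. Once one box is known to be in the sample, the simple random sample barely changes the chances for the other boxes, while the cluster sample makes 19 other boxes certain and every box outside that layer less likely.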
As another example, consider the task of getting a sample
of the 11,413 credit students at the community college.
A cluster sample would entail
randomly selecting five students and for each student
randomly selecting one of the course-sections
in which they are enrolled.
We will then use all of the students in each of those course-sections
as our sample.
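The student example can be sketched the same way. Everything in this block is hypothetical: the function name, the layout of the enrollment data, and the tiny made-up roster of 8 students in 3 sections, which stands in for the real enrollment records. The text does not say what to do when two selected students lead to the same course-section, so this sketch simply counts that section once.

```python
import random

def cluster_sample_students(enrollments, n_seeds=5, rng=None):
    """enrollments maps a student id to the list of sections that student is in."""
    rng = rng or random.Random()
    # Build the roster of each section from the enrollment lists.
    rosters = {}
    for student, sections in enrollments.items():
        for section in sections:
            rosters.setdefault(section, set()).add(student)
    # Randomly select the starting students, then one section for each of them.
    seeds = rng.sample(sorted(enrollments), n_seeds)
    chosen_sections = {rng.choice(enrollments[s]) for s in seeds}
    # The sample is everyone enrolled in any of the chosen sections.
    sample = set()
    for section in chosen_sections:
        sample |= rosters[section]
    return seeds, chosen_sections, sample

# Made-up data for illustration only.
toy = {
    "s1": ["A", "B"], "s2": ["A"], "s3": ["B", "C"], "s4": ["C"],
    "s5": ["A", "C"], "s6": ["B"], "s7": ["C"], "s8": ["A", "B"],
}
seeds, sections, sample = cluster_sample_students(toy)
print(seeds, sections, sorted(sample))
```

Note that, unlike the pallet example, the size of the sample is not known in advance, since course-sections have different enrollments.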
The similarity of cluster samples and samples of convenience is worth noting.
In both of the cluster sample examples given here it would be easy to say that
the sample chosen was a convenient way to get items for the sample (i.e., it was convenient
to only break apart at most 3 pallets, or it was convenient to only go to
five (5) classes to get all of the sample students). However, we identify these
examples as cluster samples because the main intent is to randomly
select some elements of the
sample (the original 3 boxes or the original 5 students)
and then to obtain the rest of the sample from items that are "clustered" with
those items.
©Roger M. Palay
Saline, MI 48176 December, 2015