Sampling -- Cluster Sample

A "Cluster Sample" can appear similar to a "Sample of Convenience" in that both methodologies identify select groups within the population from which to draw items for the sample. Before we consider this similarity, let us consider the methodology involved in making a cluster sample. Assume that we have products that have been made, boxed, and stacked onto pallets. Each pallet is made up of 6 layers of boxes, with each layer holding 5 rows of 4 boxes. Thus, each layer holds 20 boxes and each pallet holds 120 boxes. In our warehouse we have 200 such pallets. We want a sample of 60 boxes. Were we to take a simple random sample, we might have to take our 60 boxes from 60 different pallets. That means opening each of those pallets and disturbing its contents. This is a problem because these pallets are ready for shipment, and re-forming a disturbed pallet and getting it ready again for shipment involves significant expense.
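To get a feel for how many pallets a simple random sample would actually disturb, we can simulate it. The following is only a sketch: it assumes the 24,000 boxes are numbered consecutively, pallet by pallet (a numbering scheme the text does not specify), and the function name and the seed are purely illustrative.

```python
import random

BOXES_PER_PALLET = 6 * 5 * 4   # 6 layers, each of 5 rows of 4 boxes
PALLETS = 200
SAMPLE_SIZE = 60

def pallets_touched_by_srs(rng):
    """Draw a simple random sample of boxes; return the set of pallets it touches.

    Assumption: boxes are indexed 0..23999, pallet by pallet.
    """
    total_boxes = PALLETS * BOXES_PER_PALLET
    sample = rng.sample(range(total_boxes), SAMPLE_SIZE)   # without replacement
    return {box // BOXES_PER_PALLET for box in sample}

rng = random.Random(2015)      # fixed seed so the run is repeatable
counts = [len(pallets_touched_by_srs(rng)) for _ in range(1000)]
print("fewest pallets disturbed:", min(counts))
print("average pallets disturbed:", sum(counts) / len(counts))
print("most pallets disturbed:", max(counts))
```

Because 60 boxes are spread over 200 pallets, most of the boxes in a typical simple random sample come from different pallets, which is exactly the expense we want to avoid.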

Instead of doing this we might randomly choose 3 boxes, determine the pallet and the layer of that pallet that contains each of the 3 randomly selected boxes, and then use the 20 boxes in each of those selected pallet-layers to make up our sample of 60 boxes. This technique limits the number of pallets that we will disturb. (In the unlikely event that the same pallet-layer is randomly chosen twice, we would just go back and make another random choice of a box and from that get a new pallet-layer to consider.) Since the 3 pallet-layers to be used are randomly chosen, it would seem that we have given every box a chance to be chosen. However, unlike the simple random sample, in this new methodology, knowing that a box from pallet number 34 has been chosen dramatically changes the likelihood that the other boxes in pallet 34 will be chosen. In fact, once we know that a box in pallet 34, layer 2 has been chosen, we know that all of the other 19 boxes in that layer have been chosen. Similarly, and maybe in an even more obvious way, if we know that the first 3 boxes selected for our sample are from pallets 34, 97, and 143, then under the system described here we know that none of the remaining 57 boxes in our sample will be from any other pallet. That is, we know for certain that we will not have sample items from 197 of the 200 pallets in the warehouse. Thus, a feature of simple random samples, namely that each box appears to keep an equal chance of being selected even after some boxes have been selected, clearly does not hold for such a cluster sample.
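The selection procedure just described can be sketched in a few lines. This is an illustration, not a prescribed implementation: it assumes boxes are numbered consecutively, pallet by pallet and layer by layer within each pallet, and the names `locate` and `cluster_sample` are made up for this example.

```python
import random

LAYERS_PER_PALLET = 6
BOXES_PER_LAYER = 5 * 4
BOXES_PER_PALLET = LAYERS_PER_PALLET * BOXES_PER_LAYER
PALLETS = 200

def locate(box):
    """Map a 0-based box index to its (pallet, layer), both numbered from 1.

    Assumption: boxes are indexed pallet by pallet, layer by layer.
    """
    pallet, offset = divmod(box, BOXES_PER_PALLET)
    return pallet + 1, offset // BOXES_PER_LAYER + 1

def cluster_sample(rng, clusters=3):
    """Pick random boxes, then take every box in each box's pallet-layer."""
    total_boxes = PALLETS * BOXES_PER_PALLET
    chosen = []
    while len(chosen) < clusters:
        pallet_layer = locate(rng.randrange(total_boxes))
        if pallet_layer not in chosen:      # same layer drawn twice: draw again
            chosen.append(pallet_layer)
    # Each sampled box is recorded as (pallet, layer, position within layer).
    boxes = [(pallet, layer, position)
             for pallet, layer in chosen
             for position in range(1, BOXES_PER_LAYER + 1)]
    return chosen, boxes

layers, boxes = cluster_sample(random.Random())
print("pallet-layers used:", layers)
print("boxes in sample:", len(boxes))
print("pallets disturbed:", len({pallet for pallet, _, _ in boxes}))
```

However the random draws fall, the sample always holds 60 boxes and never disturbs more than 3 pallets, while the other pallets contribute nothing to the sample.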

As another example, consider the task of getting a sample of the 11,413 credit students at the community college. A cluster sample would entail randomly selecting five students and, for each student, randomly selecting one of the course-sections in which that student is enrolled. We would then use all of the students in each of those course-sections as our sample.
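The student example can be sketched the same way. Everything about the data here is made up for illustration: the enrollment table is synthetic, the figure of 600 course-sections is hypothetical, and redrawing when the same course-section comes up twice is an assumption carried over from the pallet example.

```python
import random

def cluster_sample_students(enrollment, rng, seeds=5):
    """enrollment maps each student id to the list of sections the student is in.

    Returns the chosen sections and the set of all students in those sections.
    """
    # Build section -> students so we can pull whole class rosters.
    roster = {}
    for student, sections in enrollment.items():
        for section in sections:
            roster.setdefault(section, set()).add(student)

    students = sorted(enrollment)
    chosen_sections = []
    while len(chosen_sections) < seeds:
        student = rng.choice(students)              # random student
        section = rng.choice(enrollment[student])   # one of that student's sections
        if section not in chosen_sections:          # assumed rule: redraw on repeats
            chosen_sections.append(section)

    sample = set().union(*(roster[s] for s in chosen_sections))
    return chosen_sections, sample

# Synthetic data: 11,413 students, each in 1 to 5 of 600 hypothetical sections.
rng = random.Random(0)
enrollment = {sid: rng.sample(range(600), rng.randint(1, 5))
              for sid in range(11413)}

sections, sample = cluster_sample_students(enrollment, rng)
print("sections chosen:", sections)
print("students in sample:", len(sample))
```

Unlike the pallet-layers, course-sections are not all the same size, so the number of students in the sample is not known until the sections have been chosen.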

The similarity of cluster samples and samples of convenience is worth noting. In both of the cluster sample examples given here it would be easy to say that the sample chosen was a convenient way to get items for the sample (i.e., it was convenient to break apart at most 3 pallets, or it was convenient to go to only five (5) classes to get all of the sample students). However, we identify these examples as cluster samples because the main intent is to randomly select some elements of the sample (the original 3 boxes or the original 5 students) and then to obtain the rest of the sample from items that are "clustered" with those elements.

©Roger M. Palay     Saline, MI 48176     December, 2015