Sampling -- Stratified Sample

Return to the Sampling page
A "Stratified Sample" is a sample that is purposefully constructed to ensure the inclusion of items from identified "slices" or "partitions" (strata) of the population. For example, consider a manufacturing plant that produces a particular part. Over the years the demand for this part has increased, and to meet that demand we have acquired new machines to make it. At the moment we have five (5) different machines all making the same part. We want to get a sample, of size 60, of the products produced in one week. All of the output from the week is stored in our warehouse. We could get a "simple random sample" (SRS) by numbering all of those products and then selecting 60 distinct random numbers from 1 to the number of products. However, if we did this there is no certainty that our sample would contain products from each of our five machines. We could instead generate a "stratified sample" by choosing 12 random items from the products produced by each of the five machines. By doing this we are certain that our sample includes products from each of the machines that we are using.
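The equal-allocation scheme just described can be sketched in Python. This is only an illustration: the warehouse contents, the part labels, and the figure of 1000 parts per machine are made up for the example, and the function name stratified_sample is hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw per_stratum items at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(items, per_stratum) for name, items in strata.items()}

# Hypothetical warehouse: 5 machines, each assumed to have made 1000 labeled parts.
warehouse = {
    f"machine_{m}": [f"M{m}-{i:04d}" for i in range(1, 1001)]
    for m in range(1, 6)
}

sample = stratified_sample(warehouse, per_stratum=12, seed=2015)
total = sum(len(items) for items in sample.values())
print(total)  # 60
```

Every machine contributes exactly 12 parts, so the sample of 60 is guaranteed to include output from all five machines.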

This approach is not the same as a "simple random sample" because in an SRS there is indeed the possibility that our sample will not have any items that come from one particular machine. In fact, it would be possible, though not likely, for the SRS to have all 60 items in the sample come from one machine.
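To get a feel for how likely these outcomes are, here is a small calculation under an assumption that is not part of the example above: that each of the five machines produced exactly 1000 parts during the week. With different production counts the numbers will change.

```python
from math import comb

N_PER_MACHINE = 1000   # assumed output per machine (not given in the text)
MACHINES = 5
SAMPLE_SIZE = 60

population = N_PER_MACHINE * MACHINES
all_samples = comb(population, SAMPLE_SIZE)  # number of possible simple random samples

# Chance that an SRS contains no parts at all from one particular machine.
p_miss_one = comb(population - N_PER_MACHINE, SAMPLE_SIZE) / all_samples

# Chance that all 60 parts come from a single machine (any one of the five).
p_single_machine = MACHINES * comb(N_PER_MACHINE, SAMPLE_SIZE) / all_samples

print(f"P(a particular machine is missing) = {p_miss_one:.3g}")
print(f"P(all 60 from one machine)         = {p_single_machine:.3g}")
```

Under this assumption both probabilities are small, and the second is far smaller than the first, but neither is zero. That is exactly the difference between "unlikely" in an SRS and "impossible" in a stratified sample.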

Our "stratified sample" ensures that the sample will contain items from each machine. This has a great appeal to us. As long as our selection of items within those produced by each machine is truly random, this is not a terrible approach. It is, however, open to some new problems. For example, in our situation, we chose to take 12 items from the products produced by each of the five machines. Taking equal sample sizes from each machine treats the five machines as if each produced an equal share of the output. If, on the other hand, one of our machines made half the products while another made a fifth of the products, and the last three machines each made a tenth of the products, then we should sample in the same proportion, taking a sample of size 30 from the first machine's output, a sample of size 12 from the output of the second machine, and samples of size 6 from the output of each of the remaining machines.
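The proportional split can be checked with a few lines of Python. This is a minimal sketch: the machine names and the function proportional_allocation are hypothetical, and the function simply refuses shares that do not divide the total evenly, since the text does not say how to round in that case.

```python
from fractions import Fraction

def proportional_allocation(shares, total):
    """Split a total sample size across strata in proportion to their shares."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must add up to 1")
    sizes = {}
    for name, share in shares.items():
        size = share * total
        if size.denominator != 1:
            # A rounding rule would be needed here; none is given in the text.
            raise ValueError(f"{name}: {size} is not a whole number")
        sizes[name] = int(size)
    return sizes

# Output shares from the example: one half, one fifth, and three tenths.
shares = {
    "machine_1": Fraction(1, 2),
    "machine_2": Fraction(1, 5),
    "machine_3": Fraction(1, 10),
    "machine_4": Fraction(1, 10),
    "machine_5": Fraction(1, 10),
}

sizes = proportional_allocation(shares, 60)
print(sizes)
# {'machine_1': 30, 'machine_2': 12, 'machine_3': 6, 'machine_4': 6, 'machine_5': 6}
```

Exact fractions are used instead of decimals so that a share such as one tenth does not pick up floating-point rounding error.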

As a second example, consider getting a sample of the 11,413 credit students at the community college in the fall term. We could select the students at random, but we are concerned because we want to be sure that we have some older students, age 45 and above, in our sample. With a simple random sample we may have older students in the sample, but we cannot be sure that we will. We could adopt a stratified sample methodology to be sure that we have the distribution that we want in the sample. We happen to know that 90% of the credit students at the college are younger than 45. Therefore, to get a stratified sample of total size 60, we force our choice to have 54 students (90% of 60) randomly selected from the students who are younger than 45, and 6 students (the remaining 10%) randomly selected from the students who are 45 years old or older.
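A sketch of the student example follows. The text gives only the total (11,413) and the 90% figure, so the exact split into 10,272 younger and 1,141 older students is an assumption made for illustration, as are the ID numbers used to stand in for students.

```python
import random

TOTAL_STUDENTS = 11413
YOUNGER = 10272                    # assumed count, roughly 90% of 11,413
OLDER = TOTAL_STUDENTS - YOUNGER   # 1141 under that assumption
SAMPLE_SIZE = 60

# Allocate the sample 90% / 10%, using integer arithmetic.
n_younger = SAMPLE_SIZE * 90 // 100
n_older = SAMPLE_SIZE - n_younger
print(n_younger, n_older)  # 54 6

# Hypothetical student IDs: 1..10272 are younger than 45, the rest are 45 or older.
rng = random.Random(45)
younger_sample = rng.sample(range(1, YOUNGER + 1), n_younger)
older_sample = rng.sample(range(YOUNGER + 1, TOTAL_STUDENTS + 1), n_older)

# Chance that any one student is chosen, within each group.
chance_younger = n_younger / YOUNGER
chance_older = n_older / OLDER
print(f"{chance_younger:.5f} {chance_older:.5f}")
```

Because the sample sizes follow the population proportions, a student's chance of being selected is nearly the same in both age groups, while the sample is still guaranteed to contain 6 older students.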


©Roger M. Palay Saline, MI 48176 December, 2015