Descriptive Statistics: Population vs. Sample
Return to Topics page
In the arrangement of web pages for this course,
this page presents no new material.
It is provided here
because after we start seriously talking about
samples versus populations,
it make sense to review how we name and apply
certain descriptive measures.
We distinguish between population and sample characteristics
by referring to population characteristics as population parameters
and sample characteristics as sample statistics.
Thus, the mean of a population is a parameter of that
population, but the mean of a sample is a statistic
of that sample.
Beyond that distinction
there is no difference in the naming or computing
for the
measures mode, the median,
the range, and the quartiles.
The mean of a population, μ, has
a different symbol from the one used for the mean of a sample,
,
however the computation of each is the same.
Standard deviation has both
different symbols, σ for a population
and sx for a sample, and
slightly a slightly different formula.
These differences are presented in the following table.
As an example, we start with the data in Table 2:
Figure 1 holds the console image of running the commands:
gnrnd4( key1=740587104, key2=0002300357 )
L1
mean(L1)
summary(L1)
Figure 1
The values in Table 1, and now in L1 in
our R session, could be a population or they could be
a sample. In either case, the command mean(L1)
displays the value of the mean of those
values. If Table 1 represents a
population then we would say
μ = 356.375 but if those
value represent a sample then we would say
= 356.375.
Figure 1 continues with the summary(L1)command.
The result also displays the mean
but with fewer significant digits. Of course,
the other values in that display have the
same meaning independent of the values
being a population or a sample.
To find the standard deviation of the values in Table 1
we use the sd(L1) command as shown in Figure 2.
Figure 2
The result, 23.0067, assumes that the values in Table 1
are from a sample. There is no way to
tell the sd() command that you want the
values to be considered a population.
If we do want the standard deviation for a population
we can just multiply the sd() result by sqrt((N-1)/N)
where N is the number of values in the table.
Table 1 has 72 values in it so we use the
command sd(L1)*sqrt(71/72)
to compute
the standard deviation of the values assuming they
are from a population.
The result is shown in Figure 3
Figure 3
You may recall that in our earlier page discussing this issue we
developed a function to do our work for us.
The code of that function was
pop_sd<-function( input_list )
{ n <- length( input_list)
sd( input_list )*sqrt((n-1)/n)
}
Figure 4 shows the definition of the function in our R
session.
Figure 4
Once the function is defined we can use it to compute the
population standard deviation by just calling the function as in
pop_sd(L1). Figure 5 shows the use of that command.
Figure 5
Return to Topics page
©Roger M. Palay
Saline, MI 48176 November, 2015