When we refer to some collection of things as a population we are
stating that such a collection spans the entirety of our interest.
For example, we might look at the students enrolled in this section of this class as a population.
If we do that, then we are stating that our interest is specific
to the students enrolled in this section of this class. On the other hand,
we might be interested in all of the students registered for this class across all sections for this term.
In that case, the population would be all students registered for this class this term. In
particular,
in that case, the students enrolled in just this section would no longer be a
population.
As you can see, the things in a population depend upon how we define that population.
At one time a particular collection of things can be a population and at another
time that same collection of things will not be the population.
Populations can be relatively small as was the case of all the students enrolled in
Section 05 of
Math 160 at Washtenaw Community College at the end of the Fall 2015 semester.
That would be a population of fewer than 30 people.
Populations can be somewhat larger as in the case of all students enrolled in
any section of Math 160 at WCC at the end of the Fall 2015 semester. That would be a population
of just over 400 students.
Populations can be considerably larger as in registered voters in Washtenaw County
as of September 1, 2015. And populations can be huge, as in the timestamped list of
different clicks on the Amazon.com website from all users during the calendar year 2014.
The size of a population is not the thing that makes it a population.
Rather, something is a population if we say it is the entire collection of
things in which we are interested in our analysis.
A sample is a portion of a population. If our population is the collection
of all students currently registered in our section of this class, then the
first five students on the class roster are a sample of that population.
Of course, the
last seven students on the class list are a different sample of that same population.
In fact, the collection of all of the female students in the class is a sample of that
same population. Any portion of the students in the class (other than the empty set and the entire class)
is a sample of the class.
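The idea that one population gives rise to many different samples can be sketched in a few lines of Python. The roster below is invented for illustration (the ten names and the class size are assumptions, not data from this course), and the helper name is_sample is hypothetical.

```python
from math import comb

# Hypothetical roster: the population is everyone in this section.
population = ["Ana", "Ben", "Cal", "Dee", "Eli",
              "Fay", "Gus", "Hal", "Ida", "Jon"]

first_five = population[:5]    # one sample
last_seven = population[-7:]   # a different sample of the same population

def is_sample(candidate, population):
    """True for a non-empty portion that is not the entire population."""
    members = set(candidate)
    return len(members) > 0 and members < set(population)

# Every subset except the empty set and the full class is a sample.
n = len(population)
sample_count = sum(comb(n, k) for k in range(1, n))

print(is_sample(first_five, population))   # True
print(is_sample(last_seven, population))   # True
print(is_sample(population, population))   # False
print(sample_count)                        # 2**10 - 2 = 1022
```

Even a class of ten students has over a thousand possible samples, which is why we need to be clear about which sample we are describing.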
Samples can be small or large in relation to the size of the population.
A sample of the first ten students on the class list for our class
represents a good portion of the size of the class.
Those same ten students would be a sample of all students registered for Math 160 this semester at WCC.
However, in that case the sample would be a small portion of the size of that population.
Those same ten students would be a sample of all registered credit students at WCC this term,
but now that sample is a minuscule portion of the entire population.
We will have to be concerned about the size of a sample, and to some extent,
about the relative size of the sample to the population.
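To make the idea of relative size concrete, here is a small sketch that computes what fraction of each population the same ten students represent. The population sizes are assumptions chosen only to match the rough descriptions given earlier (fewer than 30, just over 400); the figure for all credit students is a made-up round number.

```python
# Illustrative population sizes; the exact figures are assumptions.
populations = {
    "this section": 28,              # "fewer than 30"
    "all Math 160 sections": 410,    # "just over 400"
    "all credit students": 12_000,   # hypothetical round number
}
sample_size = 10

# Fraction of each population covered by the same ten students.
fractions = {name: sample_size / size for name, size in populations.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of the population")
```

The sample itself never changes; only the population it is compared against does.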
Once we have identified a population or a sample we want
to look at some measures (or counts) of that data.
We might look at a number of different measures, as in age, height, weight, hair color,
number of years of schooling, and so forth.
Although we may look at many different measures, we make a distinction between cases
where we look at a measure without looking at other measures and those cases where
we look at the interaction of two or more measures.
In the former case we talk about one-variable statistics, or as it
has been popularized via the calculators, 1-var stats.
Thus, in our examples, we might look at age and just consider
various ways to describe the age of "things" in the population
or sample. We could do the same thing with the variable height, and it would
still be a case of one-variable statistics.
However, if we look at height and weight together, that is, if we consider the
interaction of the two variables, then we are looking at two-variable statistics.
Our raw material in this chapter is the measurements that are
taken to create our set of data values. We might measure the height of people,
the weight of cars, the temperature on a roof, the age of pets, the number of
cars in different aisles in the parking lot, the number of vehicles from each
manufacturer in each aisle in the parking lot, or the opinions of students to questions
on a course evaluation survey.
All of these are measures, but they are not all the same kind
of measure.
There are a number of different ways to divide our view of data.
In some texts and courses the
data is classified as continuous or discrete.
In other texts and courses data is classified as categorical (sometimes called qualitative)
or quantitative.
We will look at four kinds of measurements: nominal, ordinal,
interval, and ratio.
After looking at these four we will
return to the other conventions and see how they fit with our four
classifications.
You may want to look at this "random" data for a class of students.
Nominal measurements assign numbers merely as names.
We do this all the time.
For example, we could open a bag of M&M's® and
inspect each piece of candy in it.
For each piece we could assign a number depending upon the color of the piece.
We could assign 1 for a red, 2 for a green, 3 for a yellow,
4 for blue, 5 for a light brown,
6 for a dark brown, and 7 for orange.
By doing this we would get a data set that might look like {1, 1, 2, 1, 6, 6, 6, 7, 4, 1, 4, 3, 2, 3, 6, 2, 1, 6, 6, 6, 6, 7,
6, 6, 7, 2, 1, 6, 6, 7, 6, 4, 4, 2, 1, 6, 2, 4, 7, 6, 7, 3, 6, 6, 6, 6,
3, 6, 3, 7, 1, 6, 6, 2, 6}.
In fact, that is exactly the data set derived by examining the pieces taken from a real 47.9 g bag of
M&M's® purchased on August 27, 1999.
The nominal measurements (i.e., the number
names) for the pieces were recorded as pieces were poured out of the bag,
one at a time.
Note that the numerals are being used simply as names for the different colors
of the candy pieces.
There is no particular importance attached to the numeric names.
A red candy, given
the name 1, is not better or worse than is a green candy, given the name 2.
The numeric names are not measuring anything. They are just numeric names, assigned to represent the different
colors of candy.
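About the only thing we can sensibly do with nominal values is count how often each name occurs. The sketch below tallies the candy data given above using Python's collections.Counter; the variable names are our own choices for illustration.

```python
from collections import Counter

# Numeric names assigned to the colors in the text.
color_names = {1: "red", 2: "green", 3: "yellow", 4: "blue",
               5: "light brown", 6: "dark brown", 7: "orange"}

# The data set from the text, in the order the pieces left the bag.
pieces = [1, 1, 2, 1, 6, 6, 6, 7, 4, 1, 4, 3, 2, 3, 6, 2, 1, 6, 6, 6, 6, 7,
          6, 6, 7, 2, 1, 6, 6, 7, 6, 4, 4, 2, 1, 6, 2, 4, 7, 6, 7, 3, 6, 6, 6, 6,
          3, 6, 3, 7, 1, 6, 6, 2, 6]

counts = Counter(pieces)   # a missing name (such as 5) counts as 0

for code, color in color_names.items():
    print(f"{code} ({color}): {counts[code]}")

most_common_code, most_common_count = counts.most_common(1)[0]
print(len(pieces), "pieces; most common:", color_names[most_common_code])
```

The tally shows 55 pieces, 23 of them dark brown and none light brown. Adding or averaging the numeric names, by contrast, would tell us nothing about the candy.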
Looking at the manufacturer of vehicles in an aisle of the
parking lot would give us another example of
a nominal measurement.
We could use 1 for any GM vehicle, 2 for any Ford, 3 for any Chrysler,
4 for any Toyota, 5 for any Honda, and so on.
We would need to have a list of manufacturers and we
would need to assign a numeric name to each item on the list.
Then we could move down an aisle
of the parking lot and simply record the appropriate numeric
name for any particular vehicle that
we find.
As before, having a numeric value higher or lower
than the numeric value of a different vehicle
says nothing about the vehicles other than that they are from different manufacturers.
We fill out school applications, registration forms,
credit applications, and other
questionnaires that ask us to indicate our gender, male or female.
It is common to code our
responses using a number, perhaps 1 for female and 2 for male.
There is no inherent meaning to the numbers that we use.
The makers of the survey or form
could just as easily have
assigned 75 to female and 28 to male,
or 1 for male and 2 for female.
There is no implied order
to a nominal measurement, no implied value.
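A short sketch can show why the choice of numbers does not matter. The responses below are invented for illustration; the two codings are the ones mentioned above (1 and 2, or 75 and 28).

```python
from collections import Counter
from statistics import mean

# Hypothetical form responses (made up for illustration).
responses = ["female", "male", "female", "female", "male", "female"]

coding_a = {"female": 1, "male": 2}
coding_b = {"female": 75, "male": 28}

coded_a = [coding_a[r] for r in responses]
coded_b = [coding_b[r] for r in responses]

# Decoding either version recovers exactly the same category counts.
decode_a = {code: label for label, code in coding_a.items()}
decode_b = {code: label for label, code in coding_b.items()}
counts_a = Counter(decode_a[c] for c in coded_a)
counts_b = Counter(decode_b[c] for c in coded_b)
print(counts_a == counts_b)   # True

# An average of the codes depends entirely on the arbitrary numbers chosen,
# so it carries no information about the people who responded.
print(mean(coded_a), mean(coded_b))
```

Both codings preserve who answered what, while any arithmetic on the codes changes with the coding.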
As a final example of a nominal measurement,
consider your social security number. It
is recorded on any number of different computer systems.
It is used as a means to identify
different people.
It is a government assigned name for those people.
Ordinal measurement uses the order
of our numbers to reflect the natural order of the
items we are measuring.
Opinion scales are among the most common ordinal measurements.
Surveys measure our opinions by giving us statements
and then asking us to rate our
answer as
Strongly Disagree
Disagree
Neutral or no opinion
Agree
Strongly Agree
After we have filled out the survey, our responses may be coded with numbers, for example
1 = Strongly Disagree
2 = Disagree
3 = Neutral or no opinion
4 = Agree
5 = Strongly Agree
In doing this we have created an ordinal measurement. The higher
numbers correspond to higher levels of agreement with the original statement.
As another example of an ordinal measurement,
we could ask people to rate
the quality of motion pictures on a scale from 1 to 10
where 1 is the worst and 10 is the best.
The order of various rankings would reflect the order
of opinions on the quality of the movies.
We understand that a ranking of 4 indicates a higher
quality than does a ranking of 3.
Note, however, that there is no implication that the
change from 3 to 4 is the same as is the change
from 4 to 5. All we know is that for the system we are
using, a movie rated as 4 is believed to
be of a lesser quality than is a movie that is rated as 5.
Furthermore, there is no
"yardstick" that is being used to determine these rankings.
A movie rated as an 8 by one
person does not necessarily mean the same as does
an 8 from another person. All that we do know
is that if both people rate one movie lower than
another then the people agree on the relative order
of the quality of the movies.
We need to emphasize that point. Let us consider an
example where George rates one movie as an 8 and another as a
7, while Betty rates the same movies as 8 and 3, respectively.
Assuming that these ratings are all that we know about George and Betty, then all
that we can really say is that George and
Betty agree on the relative order of the qualities of
those two movies. The fact that they both rate
the first movie as an 8 tells us nothing.
George's 8 does not indicate in any way an agreement with Betty's 8.
The fact that George rates the two movies close together (8 and 7) while Betty rates them farther apart (8 and 3)
tells us nothing. The spread in George's ratings (from 8 to 7) does not in any way indicate that George
really considers the two movies to be of a more similar quality than does Betty, even though Betty has a wider range
in her ratings. These are two different people and we have no reason to believe that they
are using the same internal measurement system to determine their opinion on the quality of the two movies.
Nonetheless, we do know that both of them view the first movie as being of higher quality than the second movie.
It is tempting to read more into such ordinal measures than we should.
It is tempting to say that George and Betty agree on the quality of the first movie and that they
disagree on the quality of the second.
It is tempting to say that George thinks that
the movies are close in quality while Betty sees the two movies as being far different in quality.
Those temptations are not justified.
Unfortunately, in far too many real-life cases, the temptation
wins out and people jump to such unfounded conclusions.
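The George and Betty example can be written out as a short sketch. It uses the ratings from the text; the helper name preferred is our own.

```python
# Ratings from the example in the text.
ratings = {
    "George": {"first movie": 8, "second movie": 7},
    "Betty": {"first movie": 8, "second movie": 3},
}

def preferred(person_ratings):
    """Return the movies ordered from highest rating to lowest."""
    return sorted(person_ratings, key=person_ratings.get, reverse=True)

# The one justified comparison: do the two raters put the movies in the same order?
orders = {person: preferred(r) for person, r in ratings.items()}
same_order = orders["George"] == orders["Betty"]
print(same_order)   # True: both place the first movie above the second

# The gaps can be computed (1 for George, 5 for Betty), but with ordinal
# data they are not comparable between raters and support no conclusion.
gaps = {p: r["first movie"] - r["second movie"] for p, r in ratings.items()}
print(gaps)
```

The code will happily compute the gaps; it is up to us to remember that they do not mean anything.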
Interval and ratio measurements
are similar to each other. They both refer to a
measurement for which there is a consistent "yardstick",
where two people can derive the same measurement of an item by using that
"yardstick".
Temperature measurement in Fahrenheit is a good example of an interval measurement.
Not only is 86°F warmer than is 43°F, but it is also the case that
86°F is always the same temperature, no matter who reads the thermometer
(assuming that they read it correctly).
The thermometer acts as the standard "yardstick"
for this measurement.
Weight in pounds is an example of a ratio measurement.
Something that weighs 8 pounds is heavier than is something that weighs 4 pounds.
8 pounds of sugar may be sweeter than is 8 pounds of salt,
but 8 pounds of sugar is no more or less heavy
than is 8 pounds of salt, or 8 pounds of iron, or 8 pounds of anything.
A scale provides the standard "yardstick" for measuring weight.
We have seen an example of an interval measurement (temperature in Fahrenheit)
and of a ratio measurement (weight in pounds). The two examples seem
quite similar. What then is the difference between an interval and a ratio
measurement?
Note that it makes sense to say that something that
weighs 8 pounds is twice as heavy as is
something that weighs 4 pounds.
In fact, 8 pounds of sugar weighs just as much as do
two 4-pound packages of sand. However, we would not make the same kind of statement
about temperature. 86°F is not twice as warm as is 43°F. Two 43°F days are
not the same as one 86°F day.
Furthermore, a weight of 0 pounds means no weight,
that something is weightless,
whereas 0°F does not mean no temperature, or no heat energy
(which is what temperature measures).
Ratio measurements have a true zero value; interval
measurements do not.
0°F is not true zero. It does not signify the absence of heat energy.
Temperature
measured in Celsius is also an interval measurement.
0°C is not true zero, and 20°C
is not half as warm as is 40°C.
It is important to note that there is a ratio measurement
of temperature, namely the Kelvin scale.
0 K represents absolute zero, the total lack of heat energy.
And something that is at 100 K has half the heat
energy of something at 200 K.
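We can check the earlier claim about 86°F and 43°F by converting both to the Kelvin scale, where ratios are meaningful. The sketch below uses the standard conversion K = (F − 32) × 5/9 + 273.15; the function name is our own.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins."""
    return (deg_f - 32) * 5 / 9 + 273.15

warm, cool = 86, 43   # the two temperatures from the text, in °F

ratio_f = warm / cool
ratio_k = fahrenheit_to_kelvin(warm) / fahrenheit_to_kelvin(cool)

print(ratio_f)             # 2.0, an artifact of where Fahrenheit puts its zero
print(round(ratio_k, 3))   # about 1.086 on the Kelvin scale

# Differences, unlike ratios, carry over between the scales: a 43-degree gap
# in Fahrenheit is the same gap expressed in kelvins (43 * 5/9).
diff_f = warm - cool
diff_k = fahrenheit_to_kelvin(warm) - fahrenheit_to_kelvin(cool)
print(diff_f, round(diff_k, 2))
```

On the ratio scale, 86°F turns out to be less than 9% "warmer" than 43°F, nowhere near twice as warm.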
Interval and ratio measurements have a degree of accuracy
that needs to be examined.
Your body temperature is probably quite close to 98.6°F.
However, it is most certainly not
exactly 98.6°F.
Even if a digital thermometer were to give a 98.6°F reading, we
need to recognize that every true temperature value
from 98.55°F through 98.64999°F
will show up as 98.6°F on a thermometer
that is only accurate to the nearest tenth of a degree.
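The effect of reading to the nearest tenth can be sketched as follows. This assumes the thermometer rounds halves upward, which is an assumption about the device rather than something stated above. Python's decimal module is used so that values such as 98.55 are handled exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

def displayed(true_temp):
    """Round a temperature (given as a string) to the nearest tenth of a degree."""
    return Decimal(true_temp).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# True temperatures just outside, at the edges of, and inside the 98.6 band.
for t in ["98.54999", "98.55", "98.6", "98.64999", "98.65"]:
    print(t, "->", displayed(t))
```

Every value from 98.55 up to, but not including, 98.65 is displayed as 98.6, so the reading identifies a small range of temperatures rather than a single one.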
Height is a ratio measurement. How tall are you?
Whatever answer you gave, it is certainly
wrong. Your answer may well be true to the nearest inch,
or even to the nearest quarter inch,
but your answer is certainly not true to the nearest thousandth of an inch.
Interval and ratio measurements tend to be taken on
things that have a continuous nature.
Generally, we can always get more accurate interval
and ratio measures if we just use a
more accurate "yardstick".
However, in a practical sense, we always settle on some approximation
to the true value, and we round off our answers to some degree of accuracy. It is
interesting to note that this does not happen with nominal and ordinal measurements.
Nominal and ordinal measures tend to be discrete, meaning that they
correspond to our "counting numbers".
The distinction between discrete and continuous
data is clouded by reality. Whereas we would consider height measurements of people
given to the nearest foot (i.e., 3, 4, 5, 6, or 7 feet tall) to be discrete,
we would probably consider height measurements of people to the nearest millimeter
to be continuous. However, both are just rounded values. In reality,
when we are processing measurements, we are almost certainly processing
either discrete measures or rounded off measures of continuous values.
We will see this distinction again in our later discussions.
The other classification of data was
categorical (sometimes called qualitative)
versus quantitative.
Categorical corresponds to what we have identified as nominal and ordinal
values. The alternative name, qualitative, implies the non-measurable nature of these values.
Of course, that leaves quantitative values to correspond to things that we can measure, namely,
interval and ratio measures.
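The correspondence between the two classification schemes can be summarized as a small lookup table. This is only a restatement of the paragraph above in code form; the example measures are taken from earlier in this chapter, and the dictionary names are our own.

```python
# Each of the four levels of measurement and the broader class it falls under.
broader_class = {
    "nominal": "categorical (qualitative)",
    "ordinal": "categorical (qualitative)",
    "interval": "quantitative",
    "ratio": "quantitative",
}

# Example measures from this chapter, tagged with their level.
examples = {
    "candy color code": "nominal",
    "survey agreement rating": "ordinal",
    "temperature in °F": "interval",
    "weight in pounds": "ratio",
}

for measure, level in examples.items():
    print(f"{measure}: {level} -> {broader_class[level]}")
```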