Data

Population

When we refer to some collection of things as a population we are stating that such a collection spans the entirety of our interest. For example, we might look at the students enrolled in this section of this class as a population. If we do that, then we are stating that our interest is specific to the students enrolled in this section of this class. On the other hand, we might be interested in all of the students registered for this class across all sections for this term. In that case, the population would be all students registered for this class this term. In particular, in that case, the students enrolled in just this section would no longer be a population.

As you can see, the things in a population depend upon how we define that population. At one time a particular collection of things can be a population and at another time that same collection of things will not be the population.

Populations can be relatively small, as in the case of all the students enrolled in Section 05 of Math 160 at Washtenaw Community College at the end of the Fall 2015 semester. That would be a population of fewer than 30 people. Populations can be somewhat larger, as in the case of all students enrolled in any section of Math 160 at WCC at the end of the Fall 2015 semester. That would be a population of just over 400 students. Populations can be considerably larger, as in registered voters in Washtenaw County as of September 1, 2015. And populations can be huge, as in the timestamped list of different clicks on the Amazon.com website from all users during the calendar year 2014. The size of a population is not the thing that makes it a population. Rather, something is a population if we say it is the entire collection of things in which we are interested in our analysis.

Sample

A sample is a portion of a population. If our population is the collection of all students currently registered in our section of this class, then the first five students on the class roster are a sample of that population. Of course, the last seven students on the class list are a different sample of that same population. In fact, the collection of all of the female students in the class is a sample of that same population. Any portion of the students in the class (other than the empty set and the entire class) is a sample of the class.
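The definition above can be sketched in Python. This is only an illustration: the roster names are hypothetical placeholders, and `is_sample` is a helper invented here, not a standard function.

```python
# Hypothetical roster standing in for the population (one section of the class).
population = ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay", "Gus",
              "Hana", "Ian", "Jo", "Kim", "Lee"]

def is_sample(candidate, population):
    """True if candidate is a non-empty, proper portion of the population."""
    members = set(candidate)
    whole = set(population)
    return bool(members) and members < whole  # '<' tests for a proper subset

first_five = population[:5]    # first five students on the roster
last_seven = population[-7:]   # last seven students on the roster

print(is_sample(first_five, population))   # True
print(is_sample(last_seven, population))   # True
print(is_sample([], population))           # False: the empty set
print(is_sample(population, population))   # False: the entire class
```

Sets are used because a sample, as described here, is about which members are included, not the order in which they are listed.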

Samples can be small or large in relation to the size of the population. A sample of the first ten students on the class list for our class represents a good portion of the size of the class. Those same ten students would be a sample of all students registered for Math 160 this semester at WCC. However, in that case the sample would be a small portion of the size of that population. Those same ten students would be a sample of all registered credit students at WCC this term, but now that sample is a minuscule portion of the entire population. We will have to be concerned about the size of a sample, and to some extent, about the relative size of the sample to the population.
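A rough sketch of how the same ten students shrink as a share of larger and larger populations. The population sizes are assumptions: 28 and 410 are chosen only to be consistent with "fewer than 30" and "just over 400" for the Fall 2015 example, and 12000 for all credit students is purely a placeholder.

```python
sample_size = 10

# Assumed population sizes (see the note above; none are official figures).
population_sizes = {
    "our section": 28,
    "all Math 160 sections": 410,
    "all credit students (assumed)": 12000,
}

def sampling_fraction(n_sample, n_population):
    """Share of the population that the sample represents."""
    return n_sample / n_population

fractions = {name: sampling_fraction(sample_size, size)
             for name, size in population_sizes.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.2%} of the population")
```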

Variables – 1, 2, many

Once we have identified a population or a sample we want to look at some measures (or counts) of that data. We might look at a number of different measures, such as age, height, weight, hair color, number of years of schooling, and so forth. Although we may look at many different measures, we make a distinction between cases where we look at a measure without looking at other measures and those cases where we look at the interaction of two or more measures. In the former case we talk about one-variable statistics, or as it has been popularized via the calculators, 1-var stats. Thus, in our examples, we might look at age and just consider various ways to describe the age of the "things" in the population or sample. We could do the same thing with the variable height, and it would still be a case of one-variable statistics. However, if we look at height and weight together, that is, if we consider the interaction of the two variables, then we are looking at two-variable statistics.
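The distinction can be illustrated with a small Python sketch. The heights and weights below are hypothetical, and the Pearson correlation is used only as one familiar example of a two-variable summary; the text does not single out any particular statistic.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical measurements for five people (not real data).
heights_in = [62, 65, 68, 70, 73]        # height in inches
weights_lb = [120, 140, 155, 170, 190]   # weight in pounds

# One-variable statistics: each variable is described on its own.
print("mean height:", mean(heights_in), "median height:", median(heights_in))
print("mean weight:", mean(weights_lb), "median weight:", median(weights_lb))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two-variable statistics: the two variables are examined together.
r = pearson_r(heights_in, weights_lb)
print("correlation of height and weight:", round(r, 3))
```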

Kinds of Data

Our raw material in this chapter is the measurements that are taken to create our set of data values. We might measure the height of people, the weight of cars, the temperature on a roof, the age of pets, the number of cars in different aisles in the parking lot, the number of vehicles from each manufacturer in each aisle in the parking lot, or the opinions of students to questions on a course evaluation survey. All of these are measures, but they are not all the same kind of measure.

There are a number of different ways to divide our view of data. In some texts and courses the data is classified as continuous or discrete. In other texts and courses data is classified as categorical, sometimes called qualitative, or quantitative. For us we will look at four kinds of measurements: nominal, ordinal, interval, and ratio. After looking at these four we will return to the other conventions and see how they fit with our four classifications.

You may want to look at this "random" data for a class of students.

Nominal

Nominal measurements assign numbers merely as names. We do this all the time. For example, we could open a bag of M&M's® and inspect each piece of candy in it. For each piece we could assign a number depending upon the color of the piece. We could assign 1 for a red, 2 for a green, 3 for a yellow, 4 for blue, 5 for a light brown, 6 for a dark brown, and 7 for orange. By doing this we would get a data set that might look like
{1, 1, 2, 1, 6, 6, 6, 7, 4, 1, 4, 3, 2, 3, 6, 2, 1, 6, 6, 6, 6, 7, 6, 6, 7, 2, 1, 6, 6, 7, 6, 4, 4, 2, 1, 6, 2, 4, 7, 6, 7, 3, 6, 6, 6, 6, 3, 6, 3, 7, 1, 6, 6, 2, 6}
In fact, that is exactly the data set derived by examining the pieces taken from a real 47.9g bag of M&M's® purchased on August 27, 1999. The nominal measurements (i.e., the number names) for the pieces were recorded as pieces were poured out of the bag, one at a time. Note that the numerals are being used simply as names for the different colors of the candy pieces. There is no particular importance attached to the numeric names. A red candy, given the name 1, is not better or worse than is a green candy, given the name 2. The numeric names are not measuring anything. They are just numeric names, assigned to represent the different colors of candy.
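Because the numbers are only names, about the only sensible thing to do with them is count how often each one appears. A minimal Python sketch of such a tally, using the data set and color codes given above (the variable names are chosen here for illustration):

```python
from collections import Counter

# Numeric names for the colors, as assigned in the text.
color_names = {1: "red", 2: "green", 3: "yellow", 4: "blue",
               5: "light brown", 6: "dark brown", 7: "orange"}

# The recorded data set, in the order the pieces were poured out of the bag.
candy_codes = [1, 1, 2, 1, 6, 6, 6, 7, 4, 1, 4, 3, 2, 3, 6, 2, 1, 6, 6,
               6, 6, 7, 6, 6, 7, 2, 1, 6, 6, 7, 6, 4, 4, 2, 1, 6, 2, 4,
               7, 6, 7, 3, 6, 6, 6, 6, 3, 6, 3, 7, 1, 6, 6, 2, 6]

counts = Counter(candy_codes)

for code, name in color_names.items():
    # A Counter reports 0 for a code that never appears in the data.
    print(f"{code} ({name}): {counts[code]}")

most_common_code, _ = counts.most_common(1)[0]
print("most frequent color:", color_names[most_common_code])
```

Note that the tally reports the most frequent name, not an average: averaging the codes would produce a number that corresponds to no color at all.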

Looking at the manufacturer of vehicles in an aisle of the parking lot would give us another example of a nominal measurement. We could use 1 for any GM vehicle, 2 for any Ford, 3 for any Chrysler, 4 for any Toyota, 5 for any Honda, and so on. We would need to have a list of manufacturers and we would need to assign a numeric name to each item on the list. Then we could move down an aisle of the parking lot and simply record the appropriate numeric name for any particular vehicle that we find. As before, having a numeric value higher or lower than the numeric value of a different vehicle says nothing about the vehicles other than that they are from different manufacturers.

We fill out school applications, registration forms, credit applications, and other questionnaires that ask us to indicate our gender, male or female. It is common to code our responses using a number, perhaps 1 for female and 2 for male. There is no inherent meaning to the numbers that we use. The makers of the survey or form could just as easily have assigned 75 to female and 28 to male, or 1 for male and 2 for female. There is no implied order to a nominal measurement, no implied value.
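A short Python sketch of why the choice of numeric names does not matter for counting but ruins any arithmetic. The list of responses is hypothetical; the two codings (1 and 2, or 75 and 28) are the ones mentioned above.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, coded 1 for female and 2 for male.
responses = [1, 2, 2, 1, 1, 2, 1, 1]

# The alternative coding mentioned in the text: 75 for female, 28 for male.
recode = {1: 75, 2: 28}
recoded = [recode[r] for r in responses]

# The number of responses in each category survives the recoding...
counts_original = sorted(Counter(responses).values())
counts_recoded = sorted(Counter(recoded).values())
print(counts_original == counts_recoded)

# ...but an arithmetic summary such as the mean depends entirely on the
# arbitrary labels, which is why it carries no meaning for nominal data.
print(mean(responses), mean(recoded))
```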

As a final example of a nominal measurement, consider your social security number. It is recorded on any number of different computer systems. It is used as a means to identify different people. It is a government assigned name for those people.

Ordinal

Ordinal measurement uses the order of our numbers to reflect the natural order of the items we are measuring. Opinion scales are among the most common ordinal measurements. Surveys measure our opinions by giving us statements and then asking us to rate our answer as

Strongly Disagree
Disagree
Neutral or no opinion
Agree
Strongly Agree

After we have filled out the survey, our responses may be coded as

1 = Strongly Disagree
2 = Disagree
3 = Neutral or no opinion
4 = Agree
5 = Strongly Agree

In doing this we have created an ordinal measurement. The higher numbers correspond to higher levels of agreement with the original statement.

As another example of an ordinal measurement, we could ask people to rate the quality of motion pictures on a scale from 1 to 10 where 1 is the worst and 10 is the best. The order of various ratings would reflect the order of opinions on the quality of the movies. We understand that a rating of 4 indicates a higher quality than does a rating of 3. Note, however, that there is no implication that the change from 3 to 4 is the same as is the change from 4 to 5. All we know is that for the system we are using, a movie rated as 4 is believed to be of a lesser quality than is a movie that is rated as 5. Furthermore, there is no "yardstick" that is being used to determine these ratings. A movie rated as an 8 by one person does not necessarily mean the same as does an 8 from another person. All that we do know is that if both people rate one movie lower than another then the people agree on the relative order of the quality of the movies.

We need to emphasize that point. Let us consider an example where George rates one movie as an 8 and another as a 7, while Betty rates the same movies as 8 and 3, respectively. Assuming that these ratings are all that we know about George and Betty, then all that we can really say is that George and Betty agree on the relative order of the qualities of those two movies. The fact that they both rate the first movie as an 8 tells us nothing. George's 8 does not indicate in any way an agreement with Betty's 8. The fact that George rates the two movies close together (8 and 7) while Betty rates them further apart (8 and 3) tells us nothing. The spread in George's ratings (from 8 to 7) does not in any way indicate that George really considers the two movies to be of a more similar quality than does Betty, even though Betty has a wider range in her ratings. These are two different people and we have no reason to believe that they are using the same internal measurement system to determine their opinion on the quality of the two movies. Nonetheless, we do know that they agree that they view the first movie as being of higher quality than is the second movie.

It is tempting to read more into such ordinal measures than we should. It is tempting to say that George and Betty agree on the quality of the first movie and that they disagree on the quality of the second. It is tempting to say that George thinks that the movies are close in quality while Betty sees the two movies as being far different in quality. Those temptations are not justified. Unfortunately, in far too many real-life cases, the temptation wins out and people jump to such unfounded conclusions.
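The George and Betty example can be sketched in Python. The ratings are the ones given above; the helper `preference` is an illustrative name introduced here. The sketch keeps only the comparison that ordinal data supports and flags the one that it does not.

```python
# Ratings from the text: (first movie, second movie).
george = (8, 7)
betty = (8, 3)

def preference(ratings):
    """Return 1 if the first movie is rated higher, -1 if lower, 0 if tied."""
    first, second = ratings
    return (first > second) - (first < second)

# Justified: do the two raters put the movies in the same order?
same_order = preference(george) == preference(betty)
print("agree on relative order:", same_order)   # True

# Not justified: these differences can be computed, but with ordinal data
# they do not measure how far apart each person thinks the movies are.
george_gap = george[0] - george[1]
betty_gap = betty[0] - betty[1]
print("gaps (not meaningful):", george_gap, betty_gap)
```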

Interval and Ratio

Interval and Ratio measurements are similar to each other. They both refer to a measurement for which there is a consistent "yardstick", where two people can derive the same measurement of an item by using that "yardstick". Temperature measurement in Fahrenheit is a good example of an interval measurement. Not only is 86°F warmer than is 43°F, but it is also the case that 86°F is always the same temperature, no matter who reads the thermometer (assuming that they read it correctly). The thermometer acts as the standard "yardstick" for this measurement.

Weight in pounds is an example of a ratio measurement. Something that weighs 8 pounds is heavier than is something that weighs 4 pounds. 8 pounds of sugar may be sweeter than is 8 pounds of salt, but 8 pounds of sugar is no more or less heavy than is 8 pounds of salt, or 8 pounds of iron, or 8 pounds of anything. A scale provides the standard "yardstick" for measuring weight.

We have seen an example of an interval measurement (temperature in Fahrenheit) and of a ratio measurement (weight in pounds). The two examples seem quite similar. What then is the difference between an interval and a ratio measurement? Note that it makes sense to say that something that weighs 8 pounds is twice as heavy as is something that weighs 4 pounds. In fact, 8 pounds of sugar weighs just as much as do two 4-pound packages of sand. However, we would not make the same kind of statement about temperature. 86°F is not twice as warm as is 43°F. Two 43°F days are not the same as one 86°F day. Furthermore, a weight of 0 pounds means no weight, that something is weightless, whereas 0°F does not mean no temperature, or no heat energy (which is what temperature measures). Ratio measurements have a true zero value; interval measurements do not.

0°F is not true zero. It does not signify the absence of heat energy. Temperature measured in Celsius is also an interval measurement. 0°C is not true zero, and 20°C is not half as warm as is 40°C. It is important to note that there is a ratio measurement of temperature, namely the Kelvin scale. 0 K represents absolute zero, the total lack of heat energy. And something that is at 100 K has half the heat energy as does something at 200 K.
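One way to see that ratios of Fahrenheit or Celsius readings are not meaningful is to convert the same temperatures to the Kelvin scale and compare. The sketch below uses the standard conversion formulas; the function names are chosen here for illustration.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to kelvins."""
    return (deg_f - 32) * 5 / 9 + 273.15

def celsius_to_kelvin(deg_c):
    """Convert a Celsius reading to kelvins."""
    return deg_c + 273.15

warm_k = fahrenheit_to_kelvin(86)
cool_k = fahrenheit_to_kelvin(43)

# The ratio of the Fahrenheit readings is 2, but that 2 reflects only
# where the scale happens to place its zero.
print("ratio of Fahrenheit readings:", 86 / 43)
print("ratio on the Kelvin scale:", round(warm_k / cool_k, 3))

# The same holds for Celsius: 40 is twice 20 only as a number on the scale.
celsius_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)
print("40°C vs 20°C on the Kelvin scale:", round(celsius_ratio, 3))
```

On the Kelvin scale both ratios come out only slightly above 1, nowhere near 2.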

Discrete vs. Continuous

Interval and ratio measurements have a degree of accuracy that needs to be examined. Your body temperature is probably quite close to 98.6°F. However, it is most certainly not exactly 98.6°F. Even if a digital thermometer were to give a 98.6°F reading, we need to recognize that every true temperature value from 98.55°F through 98.64999°F will show up as 98.6°F on a thermometer that is only accurate to the nearest tenth of a degree. Height is a ratio measurement. How tall are you? Whatever answer you gave, it is certainly wrong. Your answer may well be true to the nearest inch, or even to the nearest quarter inch, but your answer is certainly not true to the nearest thousandth of an inch. Interval and ratio measurements tend to be taken on things that have a continuous nature. Generally, we can always get more accurate interval and ratio measures if we just use a more accurate "yardstick". However, in a practical sense, we always settle on some approximation to the true value, and we round off our answers to some degree of accuracy. It is interesting to note that this does not happen with nominal and ordinal measurements. Nominal and ordinal measures tend to be discrete, meaning that they correspond to our "counting numbers".
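The thermometer example can be sketched in Python. This assumes the thermometer rounds halves upward to the nearest tenth, which is an assumption about the device and not something stated above; `displayed_reading` is a hypothetical helper. Decimal values are used so that the boundary cases are not blurred by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

def displayed_reading(true_temp):
    """Round a temperature (given as a string) to the nearest tenth of a degree."""
    return Decimal(true_temp).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Several different "true" temperatures around 98.6°F.
for true_temp in ["98.54999", "98.55", "98.6", "98.64999", "98.65"]:
    print(true_temp, "->", displayed_reading(true_temp))
```

Every value from 98.55 up to, but not including, 98.65 is displayed as 98.6, so the display alone cannot tell those true temperatures apart.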

The distinction between discrete and continuous data is clouded by reality. Whereas we would consider height measurements of people given to the nearest foot (i.e., 3, 4, 5, 6, or 7 feet tall) to be discrete, we would probably consider height measurements of people to the nearest millimeter to be continuous. However, both are just rounded values. In reality, when we are processing measurements, we are almost certainly processing either discrete measures or rounded-off measures of continuous values. We will see this distinction again in our later discussions.

Categorical/Qualitative vs. Quantitative

The other classification of data was categorical (sometimes called qualitative) or quantitative. Categorical corresponds to what we have identified as nominal and ordinal values. The alternative name, qualitative, implies the non-measurable nature of these values. Of course, that leaves quantitative values to correspond to things that we can measure, namely, interval and ratio measures.
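The correspondence between the two classification schemes can be written down as a small lookup table. This Python sketch simply restates the mapping described above, using example variables drawn from earlier in this page; the dictionary names are illustrative.

```python
# Level of measurement -> broader classification, as described in the text.
classification = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "quantitative",
    "ratio": "quantitative",
}

# Example variables from this page, tagged with the level assigned to them.
examples = {
    "M&M color code": "nominal",
    "survey agreement rating": "ordinal",
    "temperature in Fahrenheit": "interval",
    "weight in pounds": "ratio",
}

for variable, level in examples.items():
    print(f"{variable}: {level} ({classification[level]})")
```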

©Roger M. Palay Saline, MI 48176 January, 2021