This is a basic statistics course that introduces the essential concepts of statistics. The course has two main parts: Descriptive Statistics and Inferential Statistics. However, before we can discuss Inferential Statistics, we need a common understanding of some basic Probability concepts. Therefore, between the Descriptive and Inferential parts of the course, we will cover Probability.

Following on with our example: after some work on my part, I can tell you that for one particular semester there were 35,856 registrations for credit courses, generated by 13,856 different students. That alone does not tell you much about the data. However, I can also tell you that, although 71 of these students had no recorded birthdate, the remaining 13,785 students had a mean (average) age of 28.67 and a median age of 24. The mean and the median are measures of central tendency: they give us a feel for the middle of the data. In particular, we now know that half the students are 24 or younger and half are 24 or older. Also, since the mean age is higher than the median age, we get the sense that there must be some "really old" students pulling the average up.
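To see why a few older students can pull the mean above the median, here is a minimal sketch using Python's standard `statistics` module. The ages below are a small, made-up list chosen only for illustration; they are not the WCC data.

```python
from statistics import mean, median

# Hypothetical ages (NOT the WCC data): most students are young,
# but a few much older students form a long right tail.
ages = [18, 19, 19, 20, 21, 22, 24, 26, 30, 45, 62, 71]

avg = mean(ages)   # sum of the values divided by how many there are
mid = median(ages) # middle value (average of the two middle values here)

print(f"mean = {avg:.2f}, median = {mid}")
# The few large values pull the mean above the median.
print("mean > median:", avg > mid)
```

Changing 71 to 91 raises the mean but leaves the median untouched, which is exactly why the median is often preferred for skewed data such as ages.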

Now we could "learn" more about our data if we are told that the youngest student
was 13 years old and the oldest was 91 years old. We get a better feel if we are told that
1/4 of the students are 20 or younger (and 3/4 are 20 or older) and that 3/4 of the
students are 35 or younger (1/4 are 35 or older).
Knowing that 1,333 of the students are 19 years old, more than for any
other age, tells us even more.
We get an even better "feel" for the data if we have a picture of it, as
in the following bar chart:

All of that information gave us a better "feel" for the data. That is the goal of **Descriptive**
statistics.
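The summary values described above (youngest, oldest, the quartile cut points, and the most common age) can all be computed with Python's standard library. This is a sketch on a small, hypothetical list of ages, not the WCC data, and it ends with a crude text version of a bar chart. Note that statistical packages use several slightly different conventions for computing quartiles, so other tools may report somewhat different Q1 and Q3 values for the same data.

```python
from collections import Counter
from statistics import median, mode, quantiles

# Hypothetical ages for a tiny class; values are illustrative only.
ages = [17, 18, 19, 19, 19, 20, 21, 23, 25, 28, 34, 40, 52, 67]

youngest, oldest = min(ages), max(ages)
# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = quantiles(ages, n=4)
most_common = mode(ages)  # the age that occurs most often

print(f"youngest={youngest}, oldest={oldest}")
print(f"Q1={q1}, median={q2}, Q3={q3}")
print(f"most common age: {most_common}")

# A rough text "bar chart": one '#' per student at each age.
counts = Counter(ages)
for age in sorted(counts):
    print(f"{age:>3} | {'#' * counts[age]}")
```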

On the other hand, let us say that we have
created an interview process designed to get some measure of
a person's willingness to support public transportation.
That process, which takes about 20 minutes for each person interviewed,
costs about $42 for each such interview.
We would like to determine the extent to which WCC credit students
are willing to support public transportation.
We could put all 13,856 credit students through the process, but that would
take in excess of 4,600 hours of interviewing and cost well over half a million dollars.
That is too much time and too much money!
However, we could ask a relatively small number of WCC credit students to
participate in such interviews and then, based on
the results from those students,
"infer" that the general credit student body has
the same view of those public transportation issues as
we found in the smaller group. That is the essence of **inferential** statistics.
We look at the results obtained from a smaller group and, from
those results, we "infer" things about the larger group.
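The time and cost figures quoted above follow from simple arithmetic on the numbers in the text. The sketch below checks them and, for contrast, prices a sample of 200 students; that sample size is a hypothetical choice for illustration, not a recommendation from the course.

```python
students = 13_856     # credit students in the semester (from the text)
minutes_each = 20     # approximate interview length (from the text)
cost_each = 42        # approximate cost per interview in dollars (from the text)

total_hours = students * minutes_each / 60
total_cost = students * cost_each
print(f"Everyone: {total_hours:,.1f} hours, ${total_cost:,}")

sample_size = 200     # hypothetical sample size, for comparison only
sample_hours = sample_size * minutes_each / 60
sample_cost = sample_size * cost_each
print(f"Sample:   {sample_hours:,.1f} hours, ${sample_cost:,}")
```

Interviewing everyone works out to about 4,619 hours and $581,952, which matches the "in excess of 4,600 hours" and "well over half a million dollars" in the text.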

The large group is called the **population** and
the small group is called the **sample**.
As much as possible, we want the **sample** to be
representative of the **population**.
There are many ways to select the **sample**,
i.e., to choose the credit students who will be in the sample.
We will need to learn about, and weigh, the benefits and
pitfalls of different selection
methods.
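One common selection method is a simple random sample, in which every group of the chosen size is equally likely to be picked. The sketch below is only one of the many possible methods mentioned above; it assumes, hypothetically, that the students are identified by the numbers 1 through 13,856 and that we want 200 of them.

```python
import random

# Hypothetical student IDs 1..13,856 standing in for the real roster.
population = range(1, 13_857)

rng = random.Random(2015)               # fixed seed so the draw is reproducible
sample = rng.sample(population, k=200)  # 200 distinct IDs, drawn without replacement

print(sorted(sample)[:10])              # peek at the first few selected IDs
```

Fixing the seed is a design choice for teaching: it lets a reader rerun the code and get the same sample. For an actual study, you would typically let the generator seed itself.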

Clearly, if we have access to all of the data, as we did in the discussion of descriptive statistics, then there is no need to look at samples and infer anything from them. However, there are many cases where we simply cannot get, or do not want to get, data from the entire population.

Consider a second example. Let us say that we are manufacturing some
item that we will call a widget.
Because of the nature of our widgets, the strength of individual
units varies somewhat.
We want to be sure that a high percentage of our widgets
can withstand, with only minor damage, being dropped 8 feet onto a
concrete floor. Clearly, we cannot test every widget that we make;
they would all be damaged, to some extent,
by the fall. Instead, we can take a sample of the widgets,
drop them, and see how many of the sampled widgets withstand the fall,
possibly with some scratches but still working.
From the proportion of "successes" in our sample, using **inferential** statistics,
we can make a pretty good estimate of the proportion of
the rest of our widgets that would "survive" such a fall.
Again, we look at the results obtained from a smaller group and, from
those results, we "infer" things about the larger group.
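To make the idea concrete, the sketch below simulates the widget example. In practice the true survival rate is unknown; here we invent one (0.93), a production run of 50,000 widgets, and a test sample of 400, all hypothetical numbers, purely so we can compare the sample proportion with the "true" population proportion.

```python
import random

rng = random.Random(42)  # fixed seed for a reproducible simulation

# Hypothetical production run: True means the widget would survive the drop.
TRUE_SURVIVAL_RATE = 0.93  # unknown in real life; assumed here for the demo
production = [rng.random() < TRUE_SURVIVAL_RATE for _ in range(50_000)]

# Destructively test only a random sample of the widgets.
tested = rng.sample(production, k=400)
successes = sum(tested)          # True counts as 1, False as 0
p_hat = successes / len(tested)  # sample proportion of "successes"

population_rate = sum(production) / len(production)
print(f"{successes} of {len(tested)} survived; sample proportion = {p_hat:.3f}")
print(f"actual proportion in the whole run = {population_rate:.3f}")
```

The sample proportion will usually land close to, but not exactly on, the population proportion; how close we can expect it to be is one of the questions inferential statistics answers.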

©Roger M. Palay
Saline, MI 48176 September, 2015