## Overview

This is a basic statistics course. We will be introducing you to the essential concepts of statistics. The course really has two parts: Descriptive Statistics, and Inferential Statistics. However, in order for us to talk about Inferential Statistics we will need to have a common understanding of some basic Probability concepts. Therefore, between the Descriptive and Inferential parts of the course we will cover Probability.

### Descriptive Statistics

In this part of the course we develop various ways to "characterize" a set of values. For example, we might have a list of all of the ages of all of the credit students at WCC at the start of the semester. That would be somewhere in excess of 10,000 values. Just looking at the data it is hard to get a "feeling" for those values. We have a web page (click here to see it) that just gives the data for 1000 of these 13,856 students. Feel free to look at it but I am sure you will quickly realize the futility in such a review. Instead of looking at the data, I want to find certain ways to "characterize" those values so that I can convey a "feeling" for those values to you. On the basis of that characterization you would then have some overall appreciation for the values without having to look at each one of them.

Following on with our example, after some work on my part, I can tell you that for a particular semester we had 35,856 registrations for credit courses generated by 13,856 different students. That does not tell you much about the data. However, I can also tell you that, although 71 of these students did not have a recorded birthdate, the remaining 13,785 students had an average age of 28.67 and a median age of 24. Those values, the mean and the median, are measures of central tendency. They give us a feel for the middle of the data. In particular, we now know that half the students are 24 or younger and half are 24 or older. Also, since the average age is higher than is the median age, we get the sense that there must be some "really old" students to pull up that average age.

Now we could "learn" more about our data if we are told that the youngest student was 13 years old and the oldest is 91 years old. We get a better feel if we are told that 1/4 of the students are 20 or younger (and 3/4 are 20 or older) and that 3/4 of the students are 35 or younger (1/4 are 35 or older). Knowing that 1333 of the students are 19 years old, and that that is the highest number for any age, tells us even more. We get even a better "feel" for the data if we have a picture of it as in the following bar chart:

All of that information gave us a better "feel" for the data. That is the goal of Descriptive statistics.

### Inferential Statistics

In the discussion above we were looking at the entirety of the credit students at WCC in one particular semester. As it turns out, it was quite easy to get all of that data from the student records system at WCC even though, as noted above, that amounted to looking at 35,729 registrations for credit courses generated by 13,856 different students. This is not something that we would want to compile by hand, and it is way too much to fit onto a calculator. However, it was no problem at all to "look" at that data using a computer.

On the other hand, let us say that we have created an interview process designed to get some measure of a person's willingness to support public transportation. That process, which takes about 20 minutes for each person interviewed, costs about \$42 for each such interview. We would like to determine the extent to which WCC credit students are willing to support public transportation. We could put all 13,856 credit students through the process but that would take in excess of 4,600 hours of interviewing and cost well over half a million dollars. That is too much time and too much money! However, we could ask some relatively small number of WCC credit students to participate in such interviews, and then, based upon the results derived from those students we could "infer" that the general credit student body has the same view of those public transportation issues as we found from the smaller group. That is the essence of inferential statistics. We look at the results obtained from a smaller group and, from those results, we "infer" things about the larger group.

The large group is called the population and the small group is called the sample. As much as possible, we want the sample to be representative of the population. There are many ways to select the sample, i.e., choose the credit students who will be in the sample. We will need to learn about and consider the benefits and pitfalls of different selection criteria.

Clearly if we have access to all of the data, as we did in the discussion of descriptive data, then there is no need to look at samples and infer anything from them. However, there are many cases where we just cannot get or do not want to get data from the entire population.

Consider a second example. Let us say that we are manufacturing some item that we will call a widget. Given the nature of our widgets, there is a certain amount of variability in terms of the strength of each unit. We want to be sure that a high percentage of our widgets can withstand, with only minor damage, being dropped 8 feet onto a concrete floor. Clearly, we cannot test each widget that we make; they would all be damaged, to some extent, by the fall. Rather, we can take a sample of the widgets, drop them, and see how many of the samples withstand the fall, possibly with some scratches but still working. From the proportion of "successes" in our sample, using inferential statistics, we can make some pretty good guesses at the proportion of the rest of our widgets that would "survive" a fall if they had one. Again, we look at the results obtained from a smaller group and, from those results, we "infer" things about the larger group.

### Probability

In order to perform and interpret inferential statistics we will have to understand some basic components of probability. For that reason, introductory statistics courses always allocate time for students to review and/or learn some basic probability concepts.