## Frequency Tables -- grouped values

This page presents issues related to grouping values. There are many cases where our measurements are a bit finer than we need. If we are looking at the weights of some collection of people, do we really care if a single person weighs 145.6 pounds or 145.7 pounds? Remember that your weight varies by more than a pound during a day. Those people who exercise significantly, and those people who eat significantly, can see an even wider fluctuation in their weight during a day. Let us consider the values given in Table 1.

Looking at the data in Table 1, it is pretty clear that it would not make any sense to try to find the frequency with which each value appears. A few values may repeat two or three times, but for the most part the table is filled with different values. However, it is also pretty clear that these values are bunched together in some way. Furthermore, we know that we could get a histogram of the values such as the one shown in Figure 1.

Figure 1

From Figure 1 we can say that there is one value in Table 1 that is between 80 and 90, there are two values between 90 and 100, there are 8 values between 100 and 110, and so on. The histogram accumulates values into bins, or buckets, and simply reports the number of values in each bin.

Seeing that we can get a frequency for the values in each bin leads us to follow steps similar to those we used when we had a relatively small number of discrete values. Namely, we want to produce a frequency table that gives such things as the relative frequency and the cumulative frequency. The simple version of such a table, the version that just gives us the intervals (i.e., the endpoints of each bin) and the frequency for each bin, appears here as Table 2.

It is a bit comforting to note that the frequency numbers in Table 2 are exactly the same as the frequencies shown in the histogram of Figure 1. You may recall that in R, the language used to produce Figure 1, the intervals are "closed on the right" by default. That is, in Figure 1, the interval from 90 to 100 includes the right end value, 100. The interval from 100 to 110 includes the 110, but not the 100, since that value is in the interval from 90 to 100. The intervals in Table 2 conform to that same pattern. In fact, the first interval is written as (80,90] to indicate that it is "closed on the right".
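The agreement between a right-closed table and the default histogram can be checked with a short sketch. The `weights` vector below is hypothetical, a small stand-in for Table 1 and not the data on this page; `cut()`, `table()`, and `hist()` are standard base R functions.

```r
# Hypothetical weights in pounds; a stand-in for Table 1, not the page's data.
weights <- c(85.2, 93.4, 100.0, 104.7, 108.3, 112.9, 118.5, 121.6,
             127.0, 133.8, 140.0, 146.1)

# Bin edges 80, 90, ..., 150.
breaks <- seq(80, 150, by = 10)

# cut() assigns each value to an interval; right = TRUE (the default)
# makes every interval closed on the right, e.g. (90,100].
right_closed <- cut(weights, breaks = breaks, right = TRUE)
freq_table <- table(right_closed)
print(freq_table)

# hist() uses the same right-closed convention by default, so its counts
# should agree with the table. plot = FALSE returns the counts without drawing.
h <- hist(weights, breaks = breaks, plot = FALSE)
print(h$counts)
```

Note that the value 100.0 in this made-up data is counted in (90,100], not in (100,110], under both functions.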

Just as the histogram could have been made with intervals "closed on the left" so too could we create a frequency table that follows the same rule. Table 3 does that.

Two things should be obvious in comparing Table 2 to Table 3. First, in Table 3 the intervals are indeed "closed on the left". Second, the frequencies changed slightly. The reason for this change can be seen by inspecting the values in Table 1. When we do that we note that there is a value of 140.0 in the original data. For Table 2 that value ends up in the (130,140] interval. However, for Table 3 that same value is found in the [140,150) interval.
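The effect of a boundary value can be reproduced with a small sketch. Again, `weights` is a hypothetical stand-in for Table 1; it was chosen to contain the boundary values 100.0 and 140.0 so that the two conventions give different counts.

```r
# Hypothetical weights in pounds; a stand-in for Table 1, not the page's data.
weights <- c(85.2, 93.4, 100.0, 104.7, 108.3, 112.9, 118.5, 121.6,
             127.0, 133.8, 140.0, 146.1)
breaks <- seq(80, 150, by = 10)

# Same data, same bin edges, two conventions for the endpoints.
right_closed <- table(cut(weights, breaks = breaks, right = TRUE))
left_closed  <- table(cut(weights, breaks = breaks, right = FALSE))
print(right_closed)
print(left_closed)

# The boundary value 140.0 lands in different intervals under the two rules.
on_right <- as.character(cut(140.0, breaks = breaks, right = TRUE))
on_left  <- as.character(cut(140.0, breaks = breaks, right = FALSE))
cat("closed on the right:", on_right, "\n")
cat("closed on the left: ", on_left, "\n")
```

Only values that sit exactly on a bin edge move between intervals; every other value is counted in the same place under both rules.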

Just for completeness we create a histogram, shown in Figure 2, that uses this same "closed on the left" approach.

Figure 2

If we come across a table such as Table 2 where we do not have access to the original data, the best we can do to "characterize" the now unknown original data is to use the midpoint of each interval as the representative value for that interval. Therefore, it would be nice if we could expand our table to not only give the individual intervals but also to give the midpoint of each interval. Table 4 has such an expanded structure.
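As a rough illustration of using midpoints as representative values, the sketch below adds a midpoint column and then estimates the mean from the grouped table alone. The `weights` vector is hypothetical (a stand-in for Table 1), and the grouped-mean comparison is one assumed example of "characterizing" the data, not a step taken on this page.

```r
# Hypothetical weights in pounds; a stand-in for Table 1, not the page's data.
weights <- c(85.2, 93.4, 100.0, 104.7, 108.3, 112.9, 118.5, 121.6,
             127.0, 133.8, 140.0, 146.1)
breaks <- seq(80, 150, by = 10)

# Frequencies for right-closed intervals.
freq <- as.vector(table(cut(weights, breaks = breaks, right = TRUE)))

# Lower and upper endpoint of each interval, and the midpoint between them.
lower <- head(breaks, -1)
upper <- tail(breaks, -1)
midpoint <- (lower + upper) / 2

grouped <- data.frame(
  interval  = paste0("(", lower, ",", upper, "]"),
  midpoint  = midpoint,
  frequency = freq
)
print(grouped)

# Treat every value in an interval as if it sat at the midpoint.
grouped_mean <- sum(midpoint * freq) / sum(freq)
cat("mean estimated from midpoints:", grouped_mean, "\n")
cat("mean of the raw values:       ", mean(weights), "\n")
```

The two means will generally differ a little, which is the price of no longer having the original values.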

Then too, just as we had expanded our frequency tables in the discussion of discrete values, we should expand our frequency table here to include the relative frequency, the cumulative frequency, and the relative cumulative frequency. We have done this in Table 5.
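The additional columns follow directly from the frequencies. A minimal sketch, assuming the same hypothetical `weights` stand-in for Table 1 and column names chosen here only for illustration:

```r
# Hypothetical weights in pounds; a stand-in for Table 1, not the page's data.
weights <- c(85.2, 93.4, 100.0, 104.7, 108.3, 112.9, 118.5, 121.6,
             127.0, 133.8, 140.0, 146.1)
breaks <- seq(80, 150, by = 10)

binned <- cut(weights, breaks = breaks, right = TRUE)
freq <- as.vector(table(binned))

freq_table <- data.frame(
  interval     = levels(binned),
  freq         = freq,                       # count in each interval
  rel_freq     = freq / sum(freq),           # share of all values
  cum_freq     = cumsum(freq),               # running total of counts
  rel_cum_freq = cumsum(freq) / sum(freq)    # running share of all values
)
print(freq_table, digits = 3)
```

The relative frequencies should sum to 1, and the last cumulative frequency should equal the number of values.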

That only leaves the addition of a column that contains the number of degrees that should be allocated in a pie chart to each interval. Remember that a pie chart is a poor way to represent the distribution of values, and that, despite that issue, it is commonly requested and produced.
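Since a full circle has 360 degrees, the degrees for each interval are 360 times its relative frequency. A minimal sketch, once more using a hypothetical `weights` vector in place of Table 1:

```r
# Hypothetical weights in pounds; a stand-in for Table 1, not the page's data.
weights <- c(85.2, 93.4, 100.0, 104.7, 108.3, 112.9, 118.5, 121.6,
             127.0, 133.8, 140.0, 146.1)
breaks <- seq(80, 150, by = 10)

binned <- cut(weights, breaks = breaks, right = TRUE)
freq <- as.vector(table(binned))

# Each interval gets a slice proportional to its share of the values.
pie_degrees <- 360 * freq / sum(freq)

pie_table <- data.frame(
  interval = levels(binned),
  freq     = freq,
  degrees  = round(pie_degrees, 1)
)
print(pie_table)
```

The degrees column should total 360, apart from any rounding in the displayed values.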

The interpretation of each of these columns is identical to that given in the earlier discussion of discrete values.

Here is another set of data; this time the data changes each time the web page is reloaded.

Having seen these frequency tables, the next challenge is to see how to produce them in R. To do that we look at Computing in R: Frequency Tables -- Grouped Values.