Worksheet 03.3: Descriptive Measures Grouped Data

The task here is to come up with descriptive measures for the data in Table 1.

This page assumes that you have read through the earlier pages and that you have mastered the steps that we use to set up our work. To that end, we will assume that we have
1. inserted our USB drive,
2. created a directory called `worksheet033` on that drive,
3. copied `model.R` from our root folder into our new folder,
4. renamed that new copy of the file to `ws33.R`, and
5. double clicked on that file to open RStudio.
The result should be a window pretty much identical to the one shown in Figure 1. {Recall that the images shown here may have been reduced to make a printed version of this page a bit shorter than it would otherwise appear. In most cases your browser should allow you to right click on an image and then select the option to View Image in order to see the image in its original form.}

Figure 1

Much of the discussion that would be given here has been included in the comments in our `ws33.R` file. Therefore, it is important to read those comments. We start by generating all of the lower limits for all of the intervals.

Figure 2

The console view of those commands:

Figure 3
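
As a minimal sketch of this step, the lines below rebuild the lower limits with named arguments, using the same start, stop, and step values that appear in `ws33.R`. The `length` and `diff` checks are our own additions, included only to confirm that there is one lower limit per interval and that the spacing is constant.

```r
# Lower class limits: start at 175, step by the class width 29, stop at 378.
low_vals <- seq(from = 175, to = 378, by = 29)
print(low_vals)

# One lower limit per interval in the table.
print(length(low_vals))

# Every gap between consecutive limits should be the class width.
print(diff(low_vals))
```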

Then we add the commands to generate and display the midpoints.

Figure 4

In Figure 5 we can see all of those midpoint values.

Figure 5
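
The midpoint step can be sketched as follows. The name `class_width` is our own, introduced only for illustration; the worksheet file writes the width as the literal 29. The second computation, which averages each lower limit with the next one, is an assumption-free cross-check that should agree with the first as long as the intervals have equal width.

```r
class_width <- 29                              # hypothetical helper name
low_vals <- seq(175, 378, by = class_width)

# Midpoint = lower limit plus half the class width.
mid_pnts <- low_vals + class_width / 2
print(mid_pnts)

# Cross-check: average each lower limit with the start of the next interval.
next_low <- low_vals + class_width
mid_check <- (low_vals + next_low) / 2
print(all.equal(mid_pnts, mid_check))
```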

Then we enter the values of all of the frequencies.

Figure 6

In Figure 7 we can visually verify that we have them correctly entered.

Figure 7

Because the problem was kind enough to give us a way to check that we have the correct values, we might as well compute the sum of the products of the frequencies and the midpoints. Figure 8 shows the required command.

Figure 8

Running that command gives us the value in Figure 9. We see that it matches and we can move on with some confidence.

Figure 9
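
If we would rather have R make the comparison than do it by eye, a sketch along the following lines would work. The frequencies and midpoints are the ones entered above, and the target 61297.50 is the value quoted in the problem; the `all.equal` comparison is our own addition.

```r
low_vals <- seq(175, 378, by = 29)
mid_pnts <- low_vals + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)

# Element-by-element products, then their sum.
total <- sum(freqs * mid_pnts)
print(total)

# Compare against the value stated in the problem.
matches <- isTRUE(all.equal(total, 61297.50))
print(matches)
```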

Now we want to use the midpoints and the frequencies to generate data that will approximate the data represented in Table 1. Figure 10 shows the command to do this.

Figure 10

It turns out that we need 209 values to represent that data. Displaying all 209 values means that the start of the list scrolls off the top of the Console pane, as shown in Figure 11.

Figure 11
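
To see what `rep` does when its second argument is a vector of counts, here is a small sketch. The toy vectors in the first part are made up purely for illustration; the second part repeats the worksheet's own midpoints and frequencies and then looks at the result without printing all 209 values.

```r
# Toy illustration (made-up values): 10 three times, then 20 once.
toy <- rep(c(10, 20), times = c(3, 1))
print(toy)

# The worksheet data: each midpoint repeated as often as its frequency.
mid_pnts <- seq(175, 378, by = 29) + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)
data_raw <- rep(mid_pnts, times = freqs)

print(length(data_raw))   # how many values were generated
print(head(data_raw))     # just the first few values
print(table(data_raw))    # counts per midpoint, which should match freqs
```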

A quick look at the Environment pane shows the data that we have created so far. It is worth noting that the Environment pane display may show some of the values in a "rounded" fashion. In particular, note that the `mid_pnts` values shown in Figure 12 are rounded versions of the `mid_pnts` values that we saw in Figure 5.

Figure 12

Just for the fun of it, Figure 13 shows the command to actually compute the sum of the `freqs` values.

Figure 13

And, when we run that command we see in Figure 14 that there are indeed 209 such values.

Figure 14

We can now dive into finding the various descriptive measures. Remember that we have a new Environment in our working directory, `worksheet033`, so we need to reload the functions that are not built-in R functions.

Figure 15

Figure 16 shows the various values.

Figure 16
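
The contents of `pop_sd.R` are not reproduced on this page, so the sketch below uses a stand-in that we have named `pop_sd_sketch`. It assumes that `pop_sd` computes the usual population standard deviation (dividing by n rather than n - 1); that is an assumption on our part, not a copy of the real function. The sketch also confirms that the mean of the generated data is the same as the sum of frequency times midpoint divided by the total frequency.

```r
mid_pnts <- seq(175, 378, by = 29) + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)
data_raw <- rep(mid_pnts, times = freqs)

# Hypothetical stand-in for pop_sd(): divide by n instead of n - 1.
pop_sd_sketch <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

n <- length(data_raw)
grouped_mean <- sum(freqs * mid_pnts) / sum(freqs)

print(mean(data_raw))            # mean of the generated values
print(grouped_mean)              # same value, straight from the table
print(sd(data_raw))              # sample standard deviation (n - 1)
print(pop_sd_sketch(data_raw))   # population standard deviation (n)

# The two standard deviations differ only by the factor sqrt((n - 1) / n).
print(sd(data_raw) * sqrt((n - 1) / n))
```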

As we have seen before, the `summary` function does not display many digits to the right of the decimal point. Figure 17 shows a method for increasing the number of displayed digits.

Figure 17

This gives rise, in Figure 18, to a restatement of the results of the `summary` function.

Figure 18
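
One thing to keep in mind is that `options(digits=9)` stays in effect for the rest of the session. If that is not wanted, a sketch like the one below saves the old setting and puts it back afterwards. The name `old_opts` is our own; the approach relies only on the fact that `options()` returns the previous values of whatever it changes.

```r
mid_pnts <- seq(175, 378, by = 29) + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)
data_raw <- rep(mid_pnts, times = freqs)

digits_before <- getOption("digits")

# Ask for more digits, remembering the previous setting.
old_opts <- options(digits = 9)
print(summary(data_raw))

# Restore the earlier setting so later output looks as it did before.
options(old_opts)
digits_after <- getOption("digits")
```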

Then we create the command for a quick and dirty bar plot.

Figure 19

The Console reflects the command but there is no output there.

Figure 20

Instead, the plot shows up in its own pane, displayed in Figure 21.

Figure 21

That is a pretty ugly plot, but it does get across the graph of the distribution of values in our data.
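
If we wanted something a little less ugly, one possible sketch is to hand `barplot` the frequencies directly and add a title and axis labels. The title, label text, and color here are our own choices, not part of the worksheet. The bar heights are the same as in the quick plot, since both come from the same frequencies.

```r
mid_pnts <- seq(175, 378, by = 29) + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)

# barplot() returns the horizontal positions of the bars, one per interval.
bar_pos <- barplot(freqs,
                   names.arg = as.character(mid_pnts),  # label bars by midpoint
                   main = "Frequency by class midpoint",
                   xlab = "Class midpoint",
                   ylab = "Frequency",
                   col = "gray80")
print(length(bar_pos))
```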

Even though we started with a frequency table, Table 1, we know that we could get a more expanded table, one that included things like the relative frequency, by using our `make_freq_table` function. Figure 22 shows the command to do that as entered into our file.

Figure 22

We get the Console view of that table in Figure 23.

Figure 23
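
The source of `make_freq_table` is not listed on this page, so the following is only a rough base-R sketch of what an expanded frequency table might contain. The data frame `ft_sketch` and its column names are hypothetical and need not match the columns that `make_freq_table` actually produces.

```r
mid_pnts <- seq(175, 378, by = 29) + 29 / 2
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27)
data_raw <- rep(mid_pnts, times = freqs)

# Count how often each distinct value occurs.
counts <- table(data_raw)
n <- length(data_raw)

# Hypothetical expanded table: frequency, relative and cumulative columns.
ft_sketch <- data.frame(
  value        = as.numeric(names(counts)),
  freq         = as.vector(counts),
  rel_freq     = as.vector(counts) / n,
  cum_freq     = cumsum(as.vector(counts)),
  cum_rel_freq = cumsum(as.vector(counts)) / n
)
print(ft_sketch)
```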

The `View(ft)` command causes the nicely formatted table to appear in the Editor pane shown in Figure 24. That is really all that we need to do. We do note, in Figure 24, that we have yet to save all of our work. This is clear because the name of the file we are using to hold our work, `ws33.R`, appears in red letters. We just click on the icon to save the file.

Figure 24

That will change the color of the file name, as we see in Figure 25.

Figure 25

Finally, in the Console pane, we give the command `q()` and then we respond to the prompt with a `y`, as shown in Figure 26.

Figure 26

Once we press ENTER, RStudio saves our two hidden files and the session closes.
Here is a listing of the complete contents of the `ws33.R` file:
```
# Worksheet 3.3 asks that we find descriptive measures
# for the data given as the frequency of grouped data.
# The frequencies are given but we need to find the midpoint
# of each of the intervals.  Fortunately, in this example,
# all of the intervals have equal width. We will enter
# all of the low values then find the midpoints.
low_vals <- seq(175, 378, 29)  # start, stop, step size
low_vals

# Then the midpoints are just 29/2 higher than the low values
mid_pnts <- low_vals + (29/2)
mid_pnts

# now put the frequencies into a variable
freqs <- c(33, 15, 29, 25, 21, 27, 32, 27 )
freqs

# Of course we can visually inspect the values so far
# but the problem gave us a little help when it told
# us that "the sum of the frequencies times the
# midpoint of the respective span is 61297.50"
# We can find that value
sum(freqs*mid_pnts)

# Now that we feel that we have the values correctly
# established we can use the frequencies and midpoints
# to create our "raw" data.
data_raw <- rep( mid_pnts, freqs)
data_raw

# this looks good but, just as a quick check, we will
# find the sum of all the frequencies.
sum(freqs)

# Now we can just find all of the descriptive values
# using the raw data

summary( data_raw )
mean(data_raw)
sd( data_raw )
source("../pop_sd.R")
source("../make_freq_table.R")
pop_sd( data_raw )

# just as a small aside, we will encourage R to show more
# digits and then do the summary command again

options(digits=9)
summary( data_raw )

# The appropriate graph for this is the bar plot

barplot( table(data_raw ) )

# Note that the height of the bars is just the frequency
# values.  Then we can go on to make a full frequency
# table
ft <- make_freq_table( data_raw )
ft
View(ft)
```