Worksheet 03.2: Descriptive Measures

Return to Topics page
The task here is to come up with descriptive measures for the data in Table 1.

Table 1
Item number	1	2	3	4	5	6	6	8	9	10
Data value	34	37	38	41	42	43	46	47	51	53
Frequency of data	8	11	2	7	13	17	5	11	15	8

This page assumes that you have read through earlier pages and that you have mastered the steps that we use to set up our work. To that end we will assume that we have

inserted our USB drive,
created a directory called worksheet032 on that drive
have copied model.R from our root folder into our new folder,
have renamed that new copy of the file to the name ws32.R, and
have double clicked on that file to open RStudio.

The result should be a window pretty much identical to the one shown in Figure 1. {Recall that the images shown here may have been reduced to make a printed version of this page a bit shorter than it woud otherwise appear. In most cases your browser should allow you to right click on an image and then select the option to View Image in order to see the image in its original form.}

Figure 1

The data that we have been given in Table 1 shows us the frequency of each of the different data values. Rather than try to work with that consolidated set of values we would prefer to work with the "raw" data. That is, we would like the data to hold 8 of the 34's, 11 of the 37's, 2 of the 38's, and so on. To do this we start by getting the different values and the different frequencies into R. The commands

data_val <- c(34, 37, 38, 41, 42, 43, 46, 47, 51, 53)
data_freq <- c(8, 11,  2,  7, 13, 17,  5, 11, 15,  8)

will do this.

Figure 2

When we run those commands the Console pane merely shows the the commands (and the comments).

Figure 3

However, if we look in the Environment pane we can see that our two new variables have been created and we can see that they hold the correct values (well, actually all we see is that data_freq holds the correct 10 values, but the pane shown in Figure 4 is only wide enough to show the first 8 values in data_val.

Figure 4

Then, we use those two variables in the rep function to produce all of the desired values and store them in the variable data_raw.

Figure 5

The Console display shown in Figure 6 shows our 97 values comprised of eight 34's, eleven 37's, two 38's, and so on.

Figure 6

Figure 7 shows the new variable in the Environment pane.

Figure 7

Of course, now that we have the 97 raw values we can just process them sas we usually do. Figure 8 holds the commands to get a summary of the values, then to compute and display the mean separately, then to find the standard deviation assuming Table 1 represents a sample, then loading the pop_sd function into our environment, and finally, using the pop_sd function to find the standard deviation assuming the data in Table 1 represents a population.

Figure 8

Figure 9 shows the Console output from those commands. Again, we notice the brevity of the display of the value of the mean by the summary function and the increase in the number of digits displayed by computing the mean separately and displaying it.

Figure 9

Examining the Environment pane we see the display of the value there has even more significant digits.

Figure 10

The appropriate graph for this kind of data is a bar chart. Figure 11 shows the command that we would use to get the default graphic.

Figure 11

Running that command will produce the graph shown in Figure 12.

Figure 12

There is nothing wrong about the graph shown in Figure 12, but we have seen, on other pages, some of the commands that we can use to improve the appearance of that chart. Figure 13 shows the more elaborate commands.

Figure 13

Running the commands of Figure 13 gives the graph shown in Figure 14. This is more informative and it is easier to read.

Figure 14

We could also produce a frequency table for the data. It is interesting to note that Table 1 really stated the first part of such a frequency table. However, because we have created the raw data and because we have the function make_freq_table (which we have on the USB drive but which we need to load into our environment, the command that we use

make_freq_table( data_raw )

looks at data_raw and processes the values in it to get back to Table 1 and then to expand it with values for the relative frequency, the cumulative frequency, the relative cumulative frequency, and the number of degrees required for ach value in a pie chart.

Figure 15

Figure 16 shows the Console display of that frequency table.

Figure 16

The Console display version is OK, but we recall that we could get a much prettier display. To do that we will have R compute the table again but this time we will store the result, the frequency table, in a variable, in this case data_ft, and then we will View that variable. Note the capital V in the command View(data_ft).

Figure 17

Of course, running the new commands does little in the Console pane, shown in Figure 18.

Figure 18

However, it does open a new tab in the Editor pane shown in Figure 19.

Figure 19

Finally, before we leave all of this work, we note that our file, ws32.R has had its contents changed causing its name to appear in red in the tab.

Figure 20

We will click on the

to save that file. This will change the name to black letters.

Figure 21

Once that is done we move to the Console pane and directly enter the command q() k followed by the Enter key. Then we respond to the question with y as in Figure 22.

Figure 22

Once we press Enter at that point RStudio will save our hidden files and terminate.

Here is a listing of the complete contents of the ws32.R file:

# These are  the commands to do workshet 03.2
# First, we need to get the data into the computer

data_val <- c(34, 37, 38, 41, 42, 43, 46, 47, 51, 53)
data_freq <- c(8, 11,  2,  7, 13, 17,  5, 11, 15,  8)
# I used extra space to get the values to line up,
# this is not needed but it helps me to check them.

# now we will generate the "raw" data values.  We do this
# because with R it is easier to just have those raw
# values
data_raw <- rep(data_val, data_freq)
data_raw

# now that we have the "raw" data we can go ahead and 
# get our usual measures

summary( data_raw )
xbar <- mean(data_raw )
xbar
sd( data_raw )
source("../pop_sd.R")
pop_sd( data_raw )

barplot( table( data_raw ))

# not that there was anything wrong with that plot
# but we really could fix it up a bit

barplot( table( data_raw ), ylim=c(0,20),
         ylab="Frequency", yaxp=c(0,20,5),
         main="Bar Plot for Worksheet 3.2")
abline(h=seq(4,20,4), lty="dotted", col="darkgrey")
abline(h=0,lty="solid", col="black")

# Then, too, becasue we have just a small number of
# distinct values, we can build a frequency table 
# for those values.
source("../make_freq_table.R")
make_freq_table( data_raw )

# that was nice, though we want to see the really 
# pretty version of the table.  To do that we need
# store the result of the function call in a variable

data_ft <- make_freq_table( data_raw )
View( data_ft)