Please note that at the end of this page there is a listing of the R commands that were used to generate the output shown in the various figures on this page.

This page presents R commands related to building and interpreting frequency tables for grouped values. To do this we need some example data. We will use the values given in Table 1.

Because this data has so many different values, it would not make sense to look at it as discrete values. Rather, we want to group these values into intervals of equal width and then count how many of the values fall into each interval.

To start, we need to generate the data in R and then find the low and high values in that data. Figure 1 shows the `head()` and `tail()` functions as a shorter way to verify the data. We follow that with the `summary()` command to find the minimum and maximum values. We need those values so that we can make a decision about the places where we want to break the range of values, from a minimum of 35.8 to a maximum of 102.3. In order to have some nice "endpoints" to our intervals we can start them at 30, end them at 110, and use an interval width of 10. There is no real need to put those values into variables, but we can do just that. The first 3 statements in Figure 2 make such assignments.
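The data on the author's page comes from his `gnrnd4()` generator, which we may not have loaded. As a self-contained sketch, the same verification steps can be tried on a small made-up vector; the values below are assumptions, apart from the stated minimum of 35.8 and maximum of 102.3:

```r
# A made-up stand-in for L1; only the minimum (35.8) and the
# maximum (102.3) match the values discussed in the text.
L1 <- c(35.8, 47.2, 51.9, 63.4, 68.0, 71.5, 77.7, 84.1, 90.6, 102.3)
head(L1, 3)   # quick look at the first three values
tail(L1, 3)   # quick look at the last three values
summary(L1)   # the Min. and Max. entries give us the range to cover
```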

The seventh line in Figure 2,

`x_breaks <- seq( low_val, high_val, step_val)`

creates a new variable and stores in that variable the sequence of values 30, 40, 50, 60, 70, 80, 90, 100, and 110. We see those values displayed as a result of the command `x_breaks`.
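Those three assignments and the `seq()` call can be sketched on their own:

```r
low_val  <- 30   # left edge of the first interval
high_val <- 110  # right edge of the last interval
step_val <- 10   # common width of every interval
x_breaks <- seq(low_val, high_val, step_val)
x_breaks         # 30 40 50 60 70 80 90 100 110
```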
As long as we are setting up these values, we might as well set up the midpoint values for each of the intervals. The statement

`x_mid <- seq( low_val+step_val/2, high_val-step_val/2, step_val)`

creates such a sequence, and that same figure shows the values stored in `x_mid`. R has a command, `cut()`, that determines the interval into which each of our original data values falls. We use it here as

`x <- cut(L1, breaks=x_breaks)`

as shown in Figure 4.
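A minimal sketch of `cut()` in isolation, using a few assumed values rather than the generated `L1`:

```r
x_breaks <- seq(30, 110, 10)
vals <- c(35.8, 42.0, 67.3, 102.3)   # assumed sample values
x <- cut(vals, breaks = x_breaks)    # each value mapped to its interval
as.character(x)                      # "(30,40]" "(40,50]" "(60,70]" "(100,110]"
```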
The result of the `cut()` command, stored in `x`, holds the name of the interval into which each of the original data values falls. The first value in `x` gives the interval for the first value in `L1`, the second value gives the interval for the second value, and so on. The interesting thing here is that we start with numeric data values and end up with a list of interval names. The command

`y <- table( x )`

has R computing the frequency of each interval name and storing the result in the variable `y`. Then, just as we did for the discrete case, we form a data frame from that table via `df <- data.frame( y )`.
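Using an assumed sample of five values, `table()` counts how many values landed in each interval and `data.frame()` turns that table into the start of our frequency table:

```r
x_breaks <- seq(30, 110, 10)
vals <- c(35.8, 42.0, 42.5, 67.3, 102.3)   # assumed sample values
x  <- cut(vals, breaks = x_breaks)
y  <- table(x)          # frequency of each interval name
df <- data.frame(y)     # columns: x (interval) and Freq (count)
df[df$Freq > 0, ]       # show only the occupied intervals
```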

The display of our new data frame, `df`, shows one row for each interval along with that interval's frequency. To build the expanded frequency table we append additional columns. We start that process by adding the midpoint values: the statement

`df$midpnt <- x_mid`

appends those values as a new column in the data frame. We continue the process by computing the relative frequencies. The statement

`rf <- df$Freq/sum(df$Freq)`

finds the number of values in the table by finding the sum of all the frequencies, and then divides each frequency by that total. Comparing the values displayed in Figure 8 with those in the fourth column of the data frame confirms that the appended column holds the relative frequencies.
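The relative-frequency arithmetic can be sketched on its own, with assumed frequencies:

```r
freq <- c(2, 5, 3)        # assumed frequencies for three intervals
rf <- freq / sum(freq)    # each frequency divided by the total count
rf                        # 0.2 0.5 0.3
sum(rf)                   # relative frequencies always total 1
```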

We can use the `View()` command, note the capital V, to get a nicer presentation of the data frame. And the pretty display is shown in Figure 10.

We continue the process by computing the cumulative frequencies, via `cumsum( df$Freq )`, then the relative cumulative frequencies, and finally the number of degrees to allocate to each interval in a pie chart, appending each as a new column of the data frame.
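The three remaining columns rest on the same total-count idea; a sketch with assumed frequencies:

```r
freq <- c(2, 5, 3)                        # assumed frequencies
cs   <- cumsum(freq)                      # running totals: 2 7 10
rcf  <- cs / sum(freq)                    # cumulative relative: 0.2 0.7 1.0
pie  <- round(360 * freq / sum(freq), 1)  # pie-chart degrees: 72 180 108
```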

Also recall that as we change the data frame by appending columns, we need to issue the `View( df )` command again if we want the pretty display to reflect those new columns.

To this point in the web page we have seen a sequence of steps in R that we can use to go from an initial problem statement to our desired solution. The process of those steps is to take a collection of data, determine how we can break the range of those values into equal width partitions, count the frequency of the original data values in each of the partitions, and form an expanded frequency table based on those frequencies. The steps shown above from Figure 1 through Figure 12 walk us through that process. We could study, memorize, and transcribe those steps and then follow them whenever we have this kind of a problem.

Alternatively, as we did for discrete values, we could write a function that captures those steps so that we can perform them by just calling the function. The corresponding function for grouped values is called `collate3()`; its listing follows:

collate3 <- function( lcl_list, use_low=NULL, use_width=NULL, ...)
{ ## This is a function that will mimic, to some extent, a program
  ## that we had on the TI-83/84 to put a list of values into
  ## bins and then compute the frequency, midpoint, relative frequency,
  ## cumulative frequency, cumulative relative frequency, and the
  ## number of degrees to allocate in a pie chart for each bin.
  ## One problem here is that getting interactive user input in R
  ## is a pain. Therefore, if the use_low and/or use_width
  ## parameters are not specified, the function returns summary
  ## information and asks to be run again with the proper values
  ## specified.
  lcl_real_low <- min( lcl_list )
  lcl_real_high <- max( lcl_list )
  lcl_size <- length( lcl_list )
  if( is.null(use_low) | is.null(use_width) )
  { cat(c("The lowest value is ", lcl_real_low, "\n"))
    cat(c("The highest value is ", lcl_real_high, "\n"))
    suggested_width <- (lcl_real_high - lcl_real_low) / 10
    cat(c("Suggested interval width is ", suggested_width, "\n"))
    cat(c("Repeat command giving collate3( list, use_low=value, use_width=value)", "\n"))
    cat("waiting...\n")
    return( "waiting..." )
  }
  ## to get here we seem to have the right values
  use_num_bins <- floor( (lcl_real_high - use_low)/use_width ) + 1
  lcl_max <- use_low + use_width*use_num_bins
  lcl_breaks <- seq( use_low, lcl_max, use_width )
  lcl_mid <- seq( use_low + use_width/2, lcl_max - use_width/2, use_width )
  lcl_cuts <- cut( lcl_list, breaks=lcl_breaks, ... )
  lcl_freq <- table( lcl_cuts )
  lcl_df <- data.frame( lcl_freq )
  lcl_df$midpnt <- lcl_mid
  lcl_df$relfreq <- lcl_df$Freq/lcl_size
  lcl_df$cumulfreq <- cumsum( lcl_df$Freq )
  lcl_df$cumulrelfreq <- lcl_df$cumulfreq / lcl_size
  lcl_df$pie <- round( 360*lcl_df$relfreq, 1 )
  lcl_df
}

If so desired, you could highlight that listing, copy it, and then paste it into your own text editor, or even directly into an R or an RStudio session.

To illustrate using the function, we first issue the command `collate3( L1 )`, giving it just the list of values.

It turns out that if we do not supply the `use_low` and `use_width` parameters, `collate3()` assumes that we just do not know them, probably because we do not know the minimum and maximum values in the data. [Now, in this case we did know them. We found them back in Figure 1. However, we are trying to demonstrate using the function.] In that case `collate3()` displays the lowest and highest values, along with a suggested interval width, and asks to be run again with those values specified. We take that information and we reissue the command, this time as

`collate3( L1, 30, 10 )`

The result is the display of the data frame created by the function.

It is worth doing this again but for different data values. The whole process, other than loading the functions, is shown in Figure 15.

We use the command

`gnrnd4( key1=1573429104, key2=19302340 )`

to generate all new data, the command `L1` to display that data, the command `collate3(L1)` to find the minimum and maximum values, the command `df <- collate3( L1, 190, 10)` to create a data frame holding the complete frequency table, and the command `View( df )` to produce the pretty display of that data frame.

One feature that we have glossed over in this discussion is the decision to use intervals that are "closed on the right" as in a partition like (200,210]. In that partition, the value 210 would be part of the partition, but the value 200 would not. Instead, the value 200 is part of the partition (190,200].

What if we want to use partitions that are "closed on the left"? To do this the change actually has to go back to the `cut()` command, which accepts the setting `right=FALSE`. We reissue our command as `df <- collate3( L1, 190, 10, right=FALSE)`.

The result is reflected in the "pretty" display now shown in Figure 18.

As you can see in Figure 18 we are now using partitions that are "closed on the left" and, as a result, the frequencies change in some of the partitions.

What has really happened here is that the setting "right=FALSE" was accepted by `collate3()` and passed along, via its `...` parameter, to the `cut()` function inside `collate3()`.
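The effect of `right=FALSE` is easiest to see on a value that sits exactly on a break point; a sketch:

```r
cut_pnts <- seq(100, 120, 5)
# Default: intervals closed on the right, so 110 belongs to (105,110]
as.character(cut(110, cut_pnts))                  # "(105,110]"
# right=FALSE: intervals closed on the left, so 110 moves to [110,115)
as.character(cut(110, cut_pnts, right = FALSE))   # "[110,115)"
```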

In Figure 19 we generate and display a new, small collection of values so that we can examine the difference in the cuts. We create the cut points with `cut_pnts <- seq( 100,180, 5)` and then partition the values with the default `cut( L1, cut_pnts )`. The resulting partitions are closed on the right, with names such as

`(170,175]`

and

`(130,135]`

After that we generate a similar partition, but this time overriding that rule by setting `right=FALSE`. Now the partitions are closed on the left, with names such as

`[175,180)`

and

`[135,140)`

Here is the list of the commands used to generate the R output on this page:

# the commands used on Frequency Tables -- Grouped Values
#
# First, we need to load the gnrnd4() function
# into our environment
source("../gnrnd4.R")
# Then generate the values for table 1
gnrnd4( key1=1682089104, key2=0014000650 )
#
L1 # verify the data
head(L1, 8)
tail(L1, 8)
# Get a summary of the data
summary( L1 )
# create some new variables just to hold
# some of the values we will use
low_val <- 30
high_val <- 110
step_val <- 10
# then create the "break" points
x_breaks <- seq( low_val, high_val, step_val )
# Now look at the break values
x_breaks
# While we are doing this we might as well set up
# the midpoint values for each of the intervals
# that we create
x_mid <- seq( low_val + step_val/2,
high_val - step_val/2, step_val )
x_mid # look at those values
#
# Now we are ready to find out into which interval each
# of the values in our original data, L1, gets put
x <- cut(L1, breaks=x_breaks)
x
#
# Now we are ready to start building our table. The
# variable x holds all those interval names, let us
# find out the frequency for each interval name
y <- table( x )
# let us see what is y
y
# We will create a data frame to hold all of our
# frequency table as we build it
df <- data.frame( y )
df # look at it as it has been created
#
# now append the mid points to the data frame
df$midpnt <- x_mid
df # see what the data frame looks like now
#
# Now construct the relative frequencies
rf <- df$Freq/length( L1 )
rf
# append that to the data frame
df$relFreq <- rf
df # see what the data frame looks like now
# Let us look at the pretty version of the data frame
View( df ) # note the capital V
# The next columns to add are the cumulative frequency,
# the relative cumulative frequency, and the number of
# degrees to use in a pie chart for each interval
cs <- cumsum( df$Freq )
cs
df$cumul <- cs # append cumulative sum
n <- length( L1 )
rcf <- cs/n
rcf
df$rel_cumul <- rcf # append rel cumul sum
df$pie <- round( 360*rf, 1 ) # append degrees in pie chart
df
#
# Then, rather than do all of that, we can load and
# run the collate3() function
source("../collate3.R")
# If we just give collate3 the list of values, L1,
# it just gives us help in finding the values that
# we need to give it in addition to the list
collate3( L1 )
# So we now take those suggestions and we will give
# collate3 the list, and the lowest value in the
# first interval, and the width of the interval
collate3( L1, 30, 10 )
#
##############################
# get new values and build frequency table for them
gnrnd4( key1=1573429104, key2=19302340 )
L1
collate3(L1)
df <- collate3( L1, 190, 10 )
View( df )
# The default setting is to have the intervals closed
# on the right. Get intervals closed on the left.
df <- collate3( L1, 190, 10, right=FALSE)
##############################
# Just to examine the difference in the cuts
# Generate some new values
gnrnd4( 1778231304, 15901453)
L1 # to look at the values
# create the cut points
cut_pnts <- seq( 100,180, 5)
cut_pnts
# use the default cuts (close on right)
cut( L1, cut_pnts )
# do it again but this time closed on the left
cut( L1, cut_pnts, right=FALSE)
#

©Roger M. Palay Saline, MI 48176 September, 2019