Computing in R: Frequency Tables -- Discrete Values


This page presents R commands related to building and interpreting frequency tables for discrete values. To do this we need some example data. We will use the values given in Table 1.
From the Discussion page we know that we can construct a simple frequency table for the values in Table 1 as
The question is, how do we do this in R?

Consider the following commands
source( file="http://courses.wccnet.edu/~palay/math160r/gnrnd4.R")
gnrnd4( key1=859459203, key2=800065 )
L1
tabulate(L1)
which we use to generate the data values, verify that we have the same values, and then attempt to use the R command tabulate() to see if that produces the desired result. Figure 1 holds the Console image from an RStudio session where we performed those commands.

Figure 1

From Figure 1 we see that we have the correct values in the variable L1. And, looking at the results of the tabulate(L1) command, we do see the desired values of 4, 5, 10, 14, 17, 16, 13, and 14. But what are all of those 0's? And although we know that something happened 4 times, which value was it that appeared 4 times? The leading 0's correspond to the number of times that we found 1 in the data (namely 0), 2 in the data (namely 0), 3 in the data (namely 0), and so on. The final line of the output tells us that neither 64 nor 65 appeared in the data, but that 66 appeared 4 times, 67 appeared 5 times, 68 appeared 10 times, and so on. All of this seems quite messy.
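
To see on a small scale why tabulate() pads its output with zeros, here is a minimal sketch. The vector small is made up for illustration; it is not the data in Table 1.

```r
# Hypothetical data, not the values from Table 1
small <- c(3, 5, 5)

# tabulate() counts occurrences of 1, 2, 3, ... up to the largest value,
# so values that never occur show up as zeros
counts <- tabulate(small)
counts

# which() recovers the values that actually occurred
present <- which(counts > 0)
present
counts[present]
```

The position of each count in the output is the value being counted, which is why the real output for L1 starts with a long run of zeros.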

Let us try a different R command, namely table(L1). Figure 2 shows the result of that command.

Figure 2

Now that is more like it! The image in Figure 2 is almost identical to Table 2. We are just missing some identifying text and the lines to make it look like a table. To move further with this, we want to be able to look more closely at the results of the table(L1) command. Therefore, we perform the command again, but this time we save the result in a variable called freq. To do this we perform the command freq <- table( L1 ).

Figure 3
As you can see, in Figure 3, the line that we executed produced no output. (By the way, getting no output also means getting no error or warning messages, an indication that everything is OK.)

If, however, we now just give the name of the variable, freq, R displays the contents of the variable, as in Figure 4.

Figure 4
We could have just looked at the Environment pane of our RStudio to see that freq is now defined. That is shown in Figure 5.

Figure 5
Notice, in the Environment pane, that freq is defined as a 'table' of integer values, indexed from 1 to 8 as a 1 dimensional structure with values 4, 5, 10, 14, 17, 16, 13, and 14. There is nothing in the Environment pane to indicate that there are labels attached to those values. However, looking back at Figure 4, we see that there clearly are such labels.

To demonstrate this, and to prepare for the next steps, we separate those values and labels in freq. The function as.vector(freq) produces just the list of values in freq. The function names(freq) produces just the list of labels in freq. Therefore, the commands
freq_vals<-as.vector(freq)
freq_names<-names(freq)
not only extract those lists but also assign them to freq_vals and freq_names, respectively. If we follow these two commands by just giving the names of the two new variables, freq_vals and freq_names, we see what is now assigned to those two variables. All of this is shown in Figure 6.

Figure 6
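
The same separation can be tried on a tiny example. This is only a sketch; the vector small and the names small_freq, small_vals, and small_names are invented here and do not appear elsewhere on this page.

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_freq <- table(small)

small_vals  <- as.vector(small_freq)  # the counts, as a plain integer vector
small_names <- names(small_freq)      # the labels, as character strings

small_vals
small_names

# The labels are text; convert them if numbers are needed
as.numeric(small_names)
```

Note that the labels come back as character strings, not numbers, which matters if we later want to do arithmetic with them.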

Of course, now that we just have the values stored in freq_vals, we could find the relative frequency if we divide that variable by the number of values in L1, i.e., in Table 1. We could count those values, but it is much safer to let R figure this out. The commands
freq_size <- length(L1)
freq_rel <- freq_vals/freq_size
freq_rel
will compute the size of L1, store that value in freq_size, divide each value in freq_vals by that size, store the results as values in freq_rel, and finally, display the values in freq_rel. All of this is shown in Figure 7.

Figure 7
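
One quick way to check a set of relative frequencies is to confirm that they add up to 1. The sketch below does this for a small made-up vector (small is hypothetical, not the Table 1 data).

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_vals <- as.vector(table(small))

# Divide each count by the number of data values
small_rel <- small_vals / length(small)
small_rel

# Relative frequencies should total 1, up to floating-point rounding
sum(small_rel)
all.equal(sum(small_rel), 1)
```

We use all.equal() rather than == for the final comparison because sums of fractions are not always represented exactly in floating-point arithmetic.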

Unfortunately, Table 2 did not include the relative frequency. We correct that oversight, and prepare for the rest of the work here by including Table 3.

Our R computed values for the relative frequency shown in Figure 7 conform to those shown in Table 3.

What about the cumulative frequency? R has a function, cumsum(), that will create this from the values stored in freq_vals. Thus the commands
freq_cumul <- cumsum( freq_vals )
freq_cumul
compute the cumulative frequency, store those values in freq_cumul, and then display those values. All of this is shown in Figure 8.

Figure 8
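
As a small illustration of what cumsum() does, consider the following sketch on made-up data (the vector small is hypothetical).

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_vals <- as.vector(table(small))   # counts for 66, 67, 68

# Each entry is the running total of the counts so far
small_cumul <- cumsum(small_vals)
small_cumul

# The last cumulative frequency equals the number of data values
small_cumul[length(small_cumul)] == length(small)
```

That final comparison is a handy check: if the last cumulative frequency is not the size of the data, something went wrong earlier.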

The values shown in Figure 8 match the cumulative frequency values in Table 3.

The next step is to compute the relative cumulative frequency. But this is just the cumulative frequency values divided by the freq_size. The commands to do this and display the results are
freq_rel_cumul <- freq_cumul/freq_size
freq_rel_cumul
and the use of those commands in R is shown in Figure 9.

Figure 9

Again, the values shown in Figure 9 correspond to the values in Table 3 for relative cumulative frequency.

That leaves us with the task of finding the pie chart degree values shown in Table 3. To do this we need to multiply the values in freq_rel by 360, and we would like to round the result to 1 decimal place. The R commands to do this and display the results are
freq_pie <- round(360*freq_rel,1)
freq_pie
and the use of those commands in R is shown in Figure 10.

Figure 10
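
Because each slice is rounded separately, the rounded degrees need not total exactly 360. The sketch below shows this with a made-up data set of seven values that each appear once (small is hypothetical).

```r
# Hypothetical data: seven distinct values, each appearing once
small <- 1:7
small_rel <- as.vector(table(small)) / length(small)   # each is 1/7

# Degrees for each slice, rounded to 1 decimal place
small_pie <- round(360 * small_rel, 1)
small_pie

# Rounding each slice means the total need not be exactly 360
sum(small_pie)
```

Here each slice rounds to 51.4 degrees, so the seven slices total 359.8 rather than 360. A small discrepancy of this kind is a consequence of rounding, not an error in the computation.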

At this point, in Figures 4, 7, 8, 9, and 10, we have seen how we can get R to compute all of the values that we have in Table 3. It would be nice if we could also get R to produce a chart such as Table 3 giving all of the values in one place. However, rather than mimic the horizontal version of the frequency table shown in Table 3, we will try to get a vertical version of the frequency table. Such a vertical version is given as Table 4.

Before we start constructing such a table in R we observe the values displayed in the Environment pane of our RStudio session. Figure 11 shows a part of that pane.

Figure 11

All of the variables shown in Figure 11 are separate entities. We want one entity that holds many columns of values where each "row" of the entity has related values in it. Such a structure in R is called a data frame. We will build that structure from the existing variables.

We start with the commands
df_freq <- data.frame( freq)
df_freq
to create a data frame called df_freq from the table freq. Recall that we know freq has both labels and values in it. (We saw that back in Figure 6.) When we perform the commands just noted R takes the table freq and puts it into the data frame structure called df_freq. In doing so, R has created df_freq with two columns, one for the labels and one for the values. All of this is shown in Figure 12.

Figure 12
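
To look a little more closely at what data.frame() builds from a table, here is a sketch on made-up data (small, small_freq, and small_df are hypothetical names). It shows the column names R chooses and the fact that the label column is stored as a factor.

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_freq <- table(small)
small_df <- data.frame(small_freq)

names(small_df)            # label column is named after the variable given to table()
str(small_df)              # label column is a factor, Freq is integer
is.factor(small_df$small)

# To get the labels back as numbers, go through character first
as.numeric(as.character(small_df$small))
```

Going through as.character() matters: applying as.numeric() directly to a factor returns its internal level codes (1, 2, 3, ...) rather than the original values.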

The display of df_freq given in Figure 12 is organized just as we want in order to mimic Table 4, or at least to start to do this. The values are arranged in columns.

If we go back to the Environment pane, we see that the variable df_freq is now defined in the Data area, and that it has 8 observations of each of two variables. The image of this appears in Figure 13.

Figure 13

R has an additional command View() that improves upon the display of df_freq. [Note that View() starts with a capital letter V.] Performing View(df_freq) in the Console pane of our RStudio session produces no output there, as is shown in Figure 14.

Figure 14

However, performing that command opens a new window in the upper left corner of our RStudio session, and it places a nice table view of df_freq in that pane. Figure 15 shows that table view.

Figure 15

This is an even better view of the values that we want.

A small aside.

The work in preparation for this web page was done in an RStudio session. The View(df_freq) command in just a straightforward R session behaves in a slightly different fashion. In that case, the View(df_freq) command opens a new window with the values in it. A display of such a window is given in Figure 15a.

Figure 15a

This window is not nearly as powerful as is the window in RStudio. [We will see some of that power later on this page.] However, it does look nice.

We want to add the values stored in freq_rel to our data frame called df_freq. We can do this by using the command
df_freq$rel<-freq_rel
This will create a new column in df_freq, called rel, and assign the values found in freq_rel to that new column. Please note that although we kept the names pretty similar, there is no requirement to do so. Figure 16 shows the command from the Console pane in our RStudio session.

Figure 16

After performing the command as shown in Figure 16, we can look again at the Environment pane in our RStudio session. A portion of that pane is shown in Figure 17.

Figure 17

We can see, in Figure 17, that df_freq is now a structure of 3 variables.

Looking back at the top left pane of the session, shown in Figure 18, we see that without even asking for a redisplay, the nice looking table that we had created before has been augmented to show the new third column. [Note that this automatic updating of the View result is another difference between doing this in RStudio versus doing it in straight R. In straight R we would have to perform another View(df_freq) command.] Furthermore, the title of that column is now rel, the name we used when we created it.

Figure 18

We continue the process by adding the other three columns with the commands
df_freq$cumul<-freq_cumul
df_freq$rel_cumul<-freq_rel_cumul
df_freq$pie<-freq_pie
as shown in Figure 19.

Figure 19

And now, in our RStudio session, the View display is updated to appear as in Figure 20.

Figure 20

Back in the Console pane, if we just give the variable name, df_freq, R displays, as shown in Figure 21, all of the values right there. While not as pretty as the other display, this may be adequate for your needs. One advantage of this display is that you can highlight and copy it so that you can paste it into another document, possibly as input to some other program.

Figure 21
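
As an aside, the same kind of table can also be assembled in a single data.frame() call instead of one column at a time. The following is only a sketch of that alternative on made-up data; the names small, small_table, and item are invented here, and this is not the approach used in the figures above.

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_freq <- table(small)
small_vals <- as.vector(small_freq)
n <- length(small)

# Build every column in one call; each argument becomes a column
small_table <- data.frame(
  item      = names(small_freq),
  Freq      = small_vals,
  rel       = small_vals / n,
  cumul     = cumsum(small_vals),
  rel_cumul = cumsum(small_vals) / n,
  pie       = round(360 * small_vals / n, 1)
)
small_table
```

Either style produces a data frame with one row per distinct value. Adding columns one at a time, as we did above, has the advantage that each step can be inspected before moving on.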

If all you want to do is to compute and display the values that we have found for our frequency table, then there is no need to read further on this page. All of the required steps are presented above and you can simply follow those same steps for your next problem.

On the other hand, there is much more to see, both in terms of the View() output in RStudio and in terms of codifying the numerous steps that we took to generate the data frame. The discussion below starts with two figures that illustrate some of the extra power in the RStudio version of the View() display. Following that there is a sequence of figures and the related text to walk through a process to save and then re-use the steps that we went through in creating the data frame.

Taking a close look at Figure 20, you might notice that there is something at the right end of each header cell in the table. Figure 21a repeats that header row and circles, in red, that special area in each header cell.

Figure 21a

If you point to the header cell, as shown in Figure 22, a small box opens to give you information about that column.

Figure 22

If you are pointing at the header cell and click on it, then RStudio sorts the entire data frame on the basis of that column. In Figure 22 we were pointing at the header cell for the Freq column. We click on that header cell and the image becomes that of Figure 23.

Figure 23

Notice in Figure 23 that the items in the Freq column are now in ascending order. Furthermore, the rest of the cells in the table have been rearranged so that the individual rows of the table in Figure 23 are identical to the rows in Figure 22; the rows are just in a different order. In fact, the first column which gives the position of the "rows" of data in the original data frame still gives us that same information. Thus, the value 72 which had been the 7th value in Figure 22, is now the fourth value in Figure 23. However, 72 is matched with a Freq value of 13 in both figures and Figure 23 still tells us, via that first column, that 72 was the seventh item in the original structure.

Clicking on that same header cell again will reverse the sort as seen in Figure 23a.

Figure 23a
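
The sorting done by clicking on a header only changes the display. If we want a sorted copy of the data frame itself, one possibility is the order() function. The sketch below uses made-up data (small, small_df, ascending, and descending are hypothetical names) and is offered as a rough equivalent of the clicks described above.

```r
# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
small_df <- data.frame(table(small))

# order() gives the row positions that put Freq in increasing order
ascending  <- small_df[order(small_df$Freq), ]
descending <- small_df[order(small_df$Freq, decreasing = TRUE), ]

ascending
descending

# The row names still record each row's position in the original data frame
rownames(ascending)
```

Just as in the View() display, the row names travel with the rows, so we can still tell where each row sat before sorting.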

As you might expect, particularly in a course such as this one, there are many times when you might be asked to create a frequency table from some data. The process outlined in the various images above is not too complex and not too long, but it is still a pain to both remember and to perform. It would be nice if we had a way to record that process and, essentially, play it back when we need it. One way to do this is to create our own function and to put the process into that function. The rest of this page walks us through doing just that.

We really could create the new function in any text editor, but since we already have an RStudio session open, we will do it right in this session. First, we need to create a new workspace. We start by pointing to and then clicking on the File menu option. This opens the option window on the left of Figure 24, the window starting with New File. Then we point to that New File option and just the action of pointing at it opens the secondary window to its right, the one starting with R Script. That is the option we want. Therefore, click on that R Script selection. Figure 24 shows us pointing to that option.

Figure 24

Clicking on that option opens a new workspace in the upper left pane of the RStudio window. The blank, new workspace is shown in Figure 25.

Figure 25

You might notice that the new workspace starts with the name Untitled1. That will change later when we finally save the workspace as a file.

The function that we will create will follow the steps that we took at the start of this page, although there are points where two actions have been combined into one. The function is given by the following lines of code:
 make_freq_table <- function( lcl_list )
{
  ## This function will create a frequency table for 
  ## the one variable sent to it where that
  ## table gives the items, the frequency, the relative 
  ## frequeny, the cumulative frequency, the relative
  ## cumulative frequency, and the number of degrees to 
  ## allocate in a pie chart.
  ##
  ## The actual result of this function is a data frame 
  ## holding that table.
  lcl_freq <- table( lcl_list )
  lcl_size <- length( lcl_list )
  lcl_df <- data.frame( lcl_freq )
  names( lcl_df ) <- c("Items","Freq")
  lcl_values <- as.numeric( lcl_freq )
  lcl_df$rel_freq <- lcl_values /  lcl_size
  lcl_df$cumul_freq <- cumsum( lcl_values )
  lcl_df$rel_cumul_freq <- cumsum( lcl_values ) / lcl_size
  lcl_df$pie <- round( 360*lcl_df$rel_freq, 1 )
  lcl_df
}
 
The lines are provided above so that you can, if desired, just copy them from this web page and paste them into your new, blank workspace. Alternatively, you could just type them into the workspace.

Discussing the meaning of the lines follows Figure 26 because in that image of the lines we have line numbers provided by the RStudio editor.

Figure 26

Here is a discussion of the lines in the workspace:
  1. make_freq_table <- function( lcl_list ) Assigns to the name make_freq_table a function that will be defined by the rest of this line and all of the lines enclosed by the { and } pair of characters. Furthermore, this function will have a single argument which we will call lcl_list for the duration of the function definition. Our intent is to be able to call this function and send to it a list of values. Most likely that list will be in the variable L1 but it could be in any variable. If the values are in L1 then we will call the function by using the command make_freq_table(L1) in which case lcl_list will be assigned a copy of L1.
  2. { The squiggly brace on line 2 marks the start of the body of the function definition. It will have to be matched by a closing squiggly brace at the end of the definition.
  3. As soon as we encounter a "pound sign", the # character, the rest of the line is just a comment. It does nothing other than to explain to a human reader what is going on here.
  4. More of the comment, but note that it is a matter of style to start with the double ##; a single # is sufficient.
  5. More of the comment.
  6. More of the comment, but note that even incorrectly spelled words may appear in a comment.
  7. More of the comment.
  8. More of the comment.
  9. More of the comment, though in this case it is just a blank comment used to put some spacing into our overall comment.
  10. More of the comment.
  11. More of the comment.
  12. lcl_freq <- table( lcl_list ) Use the table() function to get a count of the different values that are stored in lcl_list. Put that result in lcl_freq.
  13. lcl_size <- length( lcl_list ) Use the length() function to determine the number of values in the lcl_list. Put that result in lcl_size.
  14. lcl_df <- data.frame( lcl_freq ) Use the data.frame() function to convert the 'table' that we created in lcl_freq into a data frame.
  15. names( lcl_df ) <- c("Items","Freq") This is a command that we did not use originally, but it was included here to force the names of the two columns in lcl_df to be Items and Freq, respectively.
  16. lcl_values <- as.numeric( lcl_freq ) Use the function as.numeric() to pull out the values that make up the table that we had created. We do this because it will make the next statement more clear.
  17. lcl_df$rel_freq <- lcl_values / lcl_size Compute the relative frequency by dividing the frequency values by the number of values in the original list. Store this group of values in a new column of lcl_df called rel_freq.
  18. lcl_df$cumul_freq <- cumsum( lcl_values ) Use the cumsum() function to get the cumulative sums and store those in a new column of lcl_df called cumul_freq.
  19. lcl_df$rel_cumul_freq <- cumsum( lcl_values ) / lcl_size Use the cumsum() function to find the cumulative sum of values (this is a bit wasteful since we had made this computation before, but it is just a wasted bit of machine time) and then divide those values by the number of values in the original list. Then store the results in a new column of lcl_df called rel_cumul_freq.
  20. lcl_df$pie <- round( 360*lcl_df$rel_freq, 1 ) Compute 360 times the relative frequency values, round the answers to 1 decimal place, and store the results in a new column of lcl_df called pie.
  21. lcl_df Make the value of the function be the data frame that we have created. This is important in that if, later, we just call the function make_freq_table() then the result will be the data frame and R will display the values in that data frame. However, if we call the function make_freq_table() and assign it to a variable, then that variable will be assigned the value of the data frame that we created in the function.
  22. } Finally, the closing brace marking the end of our function definition.
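
The behavior described for line 21 can be seen in a much smaller function. The sketch below uses a made-up function, count_items(), that exists only for this illustration.

```r
# Hypothetical two-line function, used only to illustrate how a value is returned
count_items <- function(lcl_list) {
  lcl_df <- data.frame(table(lcl_list))
  lcl_df            # the last expression evaluated is the value of the function
}

# Called on its own: R prints the data frame
count_items(c("a", "b", "a"))

# Assigned to a variable: nothing is printed, the data frame is stored
result <- count_items(c("a", "b", "a"))
nrow(result)
names(result)
```

Notice that the first column of result is named lcl_list, after the argument name inside the function. That is the reason line 15 of make_freq_table() forces the column names to be Items and Freq.
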
Having entered the code lines into the workspace, our next task is to save this workspace as a file on the computer. To do this we click on the File menu option. RStudio opens the window shown in Figure 27. Then we move the cursor down to the Save As... option.

Figure 27

When we click on the Save As... option, we get a new window, an example of which is shown in Figure 28, to help us name the new file and to locate that file in whatever directory we desire. Assuming that the window shown in Figure 28 has correctly identified the desired directory, note that we have given the system a file name, in this case make_freq_table.R. It is helpful, but not at all required, to have the name of the file be similar to if not identical to the name of the function.

To actually save the file we click on the Save button.

Figure 28

Once the file has been saved, we note that the tab for the workspace has changed from Untitled1 to make_freq_table.R as shown in Figure 29.

Figure 29

It is important to note that at this point we have created and saved the file, but we have not told R anything about the function we have designed. There are two ways to tell R about this. The first, illustrated here, is to highlight the entire file (Ctrl-A is a good way to do this), and then point to and click on the Run option at the top of the editor window. Figure 30 shows everything highlighted and the cursor pointing to the Run option.

Figure 30

When we click on that Run option RStudio submits the highlighted lines to R in the Console window. Figure 31 shows that submission in our Console window.

Figure 31

Once the function has been submitted, it is available for use. The commands
dd<-make_freq_table( L1 )
dd
cause R to run our newly defined function make_freq_table() using L1 to give values to lcl_list in the function. The result of the computations within the function, namely the data frame constructed within the function, is then assigned to the variable dd. The second line, dd, causes R to display the values now in dd. All of this is shown in Figure 32.

Figure 32
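
Once the function is available, a few quick checks can confirm that a table it produces is consistent. In the sketch below the function body is copied from this page with its comments removed; the vector small and the variable check are made up for illustration.

```r
# The function as defined on this page (comments removed)
make_freq_table <- function( lcl_list )
{
  lcl_freq <- table( lcl_list )
  lcl_size <- length( lcl_list )
  lcl_df <- data.frame( lcl_freq )
  names( lcl_df ) <- c("Items","Freq")
  lcl_values <- as.numeric( lcl_freq )
  lcl_df$rel_freq <- lcl_values /  lcl_size
  lcl_df$cumul_freq <- cumsum( lcl_values )
  lcl_df$rel_cumul_freq <- cumsum( lcl_values ) / lcl_size
  lcl_df$pie <- round( 360*lcl_df$rel_freq, 1 )
  lcl_df
}

# Hypothetical data, not the values from Table 1
small <- c(66, 66, 67, 68, 68, 68)
check <- make_freq_table( small )
check

# Checks that should hold for a complete frequency table
sum( check$Freq ) == length( small )             # frequencies add up to the sample size
tail( check$cumul_freq, 1 ) == length( small )   # last cumulative frequency is the sample size
all.equal( tail( check$rel_cumul_freq, 1 ), 1 )  # last relative cumulative frequency is 1
```

One caveat worth knowing: table() ignores NA values by default while length() counts them, so if the data contain missing values the first check would come back FALSE.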

Then we give the command View(dd) as shown in Figure 33.

Figure 33

This creates a new tab in the top left pane of our RStudio session, as shown in Figure 34.

Figure 34

In order to demonstrate a different method for loading the function into R, we first close this session. That is shown in Figure 35.

Figure 35

We will start from the beginning. In Figure 36 we have started a new RStudio session, which, in turn, started a new R session in the Console window. Then, because in our earlier session we saved the entire function in a file called make_freq_table.R, we can use the command
source("make_freq_table.R")
to tell R to read the contents of that file as if we had typed them into our R session. This is done in Figure 36.
Note that the command here tells R to load the function from the current working directory. This works because we saved that function to this directory earlier. If we had wanted to load the function from the parent directory which contains the functions I have provided, then we would use the command source("../make_freq_table.R") instead.

Figure 36
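
If a source() command fails, the usual cause is that the file is not where R is looking. The sketch below shows a few commands that help to check this. It writes a tiny made-up function, double_it(), to a temporary file so that the example does not depend on any file from this page; script_path is likewise a hypothetical name.

```r
# Write a tiny, hypothetical function definition to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", script_path)

getwd()                    # the directory that relative paths such as "../" start from
file.exists(script_path)   # TRUE if the file is where we think it is

source(script_path)        # read and run the file; this defines double_it()
double_it(21)
```

For a file such as make_freq_table.R, the same idea applies: file.exists("make_freq_table.R") tells us whether the file is in the current working directory before we try to source it.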

Note that there is no error or warning message as a result of our source() command. Furthermore, unlike our example in Figures 30 and 31, we did not have to highlight the code for the function and the code of the function does not show up in the Console area.

We can verify that make_freq_table() is again defined as a function by looking in the Environment window. There we will find make_freq_table identified as a current function. We see this in Figure 37.

Figure 37

Because we are assuming this is a new session, we need to load the gnrnd4() function, and then run it again. This time we will generate a different table of values, those in Table 5.
We construct the full, vertical frequency table for the values in Table 5:
To make the same table in R we will use the following code lines
source("make_freq_table.R")
source( file="http://courses.wccnet.edu/~palay/math160r/gnrnd4.R")
gnrnd4( key1=546789202, key2=1200034 )
L1
new_df <- make_freq_table( L1 )
new_df
View( new_df )
to load the required functions, generate the data, run the make_freq_table( L1 ) function and store the result in new_df, display the contents of new_df in the Console area, and finally, via the View( new_df ) command, open a new window in our RStudio session to display the table.

The first code line, source("make_freq_table.R"), was discussed above. Figure 38 shows executing the second and third lines of code in the Console window.

Figure 38

The fourth line of code, L1, just displays the data values that we have generated. This is shown in Figure 39 and we can verify those values against the values in Table 5.

Figure 39

The next two lines of code,
new_df <- make_freq_table( L1 )
new_df
just call our function, passing the values in L1 to that function, assign the result of the function to the variable new_df, and finally display the contents of that new variable. This is shown in Figure 40.

Figure 40

Finally, we use the code View( new_df ). There is no result of this in the Console window, as seen in Figure 41.

Figure 41

However, there is now a new display window giving the table in a very nice form, as in Figure 42.

Figure 42


Here are lines of a script that, for the most part, duplicate the lines of R used on this web page. Please note that where the script diverges from the commands used above there are significant notes in the script to guide you to appreciate and understand the changes.

# Frequency tables in R
#
#  For this script, rather than look for our files
#  in our "parent" folder, we will load them from
#  Palay's website.

source( file="http://courses.wccnet.edu/~palay/math160r/gnrnd4.R")
# generate the list of values shown on the web page
gnrnd4( key1=859459203, key2=800065 )
L1  #verify that we have the right values

# now try to use the built-in tabulate() function
tabulate(L1)
# That gave us more than we want

# shift over to use the built-in  table()  function
table( L1 )

# That gives us just the values that we want.
# However, let us store those values in a new variable
freq <- table( L1 )
freq  # and then look at what we have stored

# We notice that our variable  freq  holds both
# the names of the items and the values of the
# frequencies.  Let us pull those out, separately,
# and store them in their own variables

freq_vals<-as.vector(freq)
freq_names<-names(freq)
#  then look at what we have stored
freq_vals
freq_names

#  now we want to move on to finding the relative 
#  frequencies.  To do that we need to divide each
#  frequency by the total number of items.

#  first get the total number of items
freq_size <- length(L1)
#  then compute the relative frequencies and save 
#    those computed values in a new variable
freq_rel <- freq_vals/freq_size
#  now look at those values
freq_rel

#  Now we are ready to find the cumulative frequencies.
#  To do this we can use the built-in function cumsum().
#  And, we will store those values before we look
#     at them.
freq_cumul <- cumsum( freq_vals )
freq_cumul

#  Now it is an easy step to generate and then
#  look at the relative cumulative frequencies.
#  We just divide the cumulative frequencies by
#  the number of items, which we computed and 
#  saved earlier.

freq_rel_cumul <- freq_cumul/freq_size
freq_rel_cumul

#  And, even though we know it is a bad idea to
#  make and use a pie chart, and even though R
#  would do that for us, it is an easy step to 
#  compute the number of degrees to allocate
#  in a pie chart for each of the different 
#  values in our data.  We just multiply the 
#  relative frequencies by 360.  In this case we 
#  take a further step and round that to one
#  decimal place.
freq_pie <- round(360*freq_rel,1)
freq_pie

#  So far we have computed all of the values that 
#  we would include in a frequency table.  What we
#  have not done is to put all of those values 
#  into a construct that will display the 
#  completed frequency table in R.  

#  We will do that now by putting copies of 
#  the desired variables into a dataframe.

df_freq <- data.frame( freq)
df_freq

#  With that simple start we have the beginning
#  of our "vertical" frequency table.

# We will take a small step sideways here to
#  look at another way that R can display that
#  table.  We can use the View() function to 
#  do that.  Be sure to note the capital V.
View(df_freq)

# Now we can return to our task of building 
# our complete frequency table.  We can add the 
# relative frequencies to our dataframe
df_freq$rel <- freq_rel

#  and we could get a new view of that dataframe
View(df_freq)

# Now complete our build by adding the other 
#  three columns.
df_freq$cumul <- freq_cumul
df_freq$rel_cumul <- freq_rel_cumul
df_freq$pie <- freq_pie

# again, we can use View() to see this table
View(df_freq)

# or we could go back to our old method and
#  just look at it in the console display
df_freq

# the web page on this topic goes through the steps
# to create a new function that captures 
# all of the steps that we have taken to
# make a frequency table.

# Here we will just load that function, again this
# time from Palay's web page rather than from
# the parent directory.
source( file="http://courses.wccnet.edu/~palay/math160r/make_freq_table.R")

# now that the function make_freq_table() is loaded
# into our environment we can use it to duplicate
# all of the painful work that we did above in lines
# 21 through 108.

dd <- make_freq_table( L1)
dd

# Or we could use View() to get the nicer 
# looking table.
View( dd )

# The web page goes on, building on the fact that
# there were instructions on the web page for 
# actually creating and saving the function in
# our current directory, to show an alternative
# way to load the function.  We did not create 
# that local version of the function, but we 
# do have a version in our parent folder.  So,
# we will demonstrate here how to load functions
# from our parent folder.

#  First, we will use the dangerous but effective
#  command rm() to wipe out our entire environment.
rm( list=ls() )

#  Notice the environment is now empty.
#  First we want to use gnrnd4() to generate some
#  values.  To do this we need to load gnrnd4()
#  into our environment.

source("../gnrnd4.R")
# now generate the values in Table 5 of the 
# web page.
gnrnd4( key1=546789202, key2=1200034 )
L1 # just to verify the values

# now we want a full frequency table for those 
# values.  We can use make_freq_table() to do
# this, but first we need to load the function.
source("../make_freq_table.R")
new_df <- make_freq_table( L1 )
new_df
View( new_df )

# Having seen this, with just a few commands we
# now have a way to generate full frequency 
# tables.

# in fact, given that we have make_freq_table()
# in our environment, we can generate a large 
# data set and then apply our function to that
# to get a new frequency table.
source("../gnrnd5.R")
gnrnd5(78034095603, 13000045)
head( L1,20)
tail(L1, 20)
make_freq_table( L1 )


©Roger M. Palay     Saline, MI 48176     November, 2015