Explanation of collate3.R
Return to Computing in R: Frequency Tables -- Grouped Values
We will start this page with a straightforward listing of the file
collate3.R almost as it exists at the web site. You can download that file
to your computer by right-clicking on the link collate3.R.
The only difference between the real file and the listing
here is that this listing has a trailing comment on every line giving the
line number of that line.
collate3 <- function( lcl_list, use_low=NULL, use_width=NULL, ...) # 1
{ # 2
## This is a function that will mimic, to some extent, a program # 3
## that we had on the TI-83/84 to put a list of values into # 4
## bins and then compute the frequency, midpoint, relative frequency, # 5
## cumulative frequency, cumulative relative frequency, and the # 6
## number of degrees to allocate in a pie chart for each bin. # 7
# 8
## One problem here is that getting interactive user input in R # 9
## is a pain. Therefore, if the use_low, and or use_width # 10
## parameters are not specified, the function returns summary # 11
## information and asks to be run again with the proper values # 12
## specified. # 13
# 14
lcl_real_low <- min( lcl_list ) # 15
lcl_real_high <- max( lcl_list ) # 16
lcl_size <- length(lcl_list) # 17
# 18
if( is.null(use_low) | is.null(use_width) ) # 19
{ # 20
# 21
cat(c("The lowest value is ",lcl_real_low ,"\n")) # 22
cat(c("The highest value is ", lcl_real_high,"\n" )) # 23
suggested_width <- (lcl_real_high-lcl_real_low) / 10 # 24
cat(c("Suggested interval width is ", suggested_width,"\n" )) # 25
cat(c("Repeat command giving collate3( list, use_low=value, use_width=value)","\n")) # 26
cat("waiting...\n") # 27
return( "waiting..." ) # 28
} # 29
## to get here we seem to have the right values # 30
use_num_bins <- floor( (lcl_real_high - use_low)/use_width)+1 # 31
lcl_max <- use_low+use_width*use_num_bins # 32
lcl_breaks <- seq(use_low, lcl_max, use_width) # 33
lcl_mid<-seq(use_low+use_width/2, lcl_max-use_width/2, use_width) # 34
# 35
lcl_cuts<-cut(lcl_list, breaks=lcl_breaks, ...) # 36
lcl_freq <- table( lcl_cuts ) # 37
lcl_df <- data.frame( lcl_freq ) # 38
lcl_df$midpnt <- lcl_mid # 39
lcl_df$relfreq <- lcl_df$Freq/lcl_size # 40
lcl_df$cumulfreq <- cumsum( lcl_df$Freq ) # 41
lcl_df$cumulrelfreq <- lcl_df$cumulfreq / lcl_size # 42
lcl_df$pie <- round( 360*lcl_df$relfreq, 1 ) # 43
# 44
lcl_df # 45
} # 46
We can "walk through" the program to see what it does.
Lines 1 and 2
The first line assigns to the variable collate3 a
value. That value is defined to be a function.
The function is defined to have parameters
and in this instance the function will have
at least one parameter called lcl_list.
This means that when someone actually invokes
the function collate3() they will have to give it
at least one argument which will be assigned to the
parameter lcl_list. Thus, we could call the function
with a statement such as collate3(L1) and that
simple statement will start the function with L1
assigned to lcl_list.
The function heading continues, in line 1, by specifying
two more parameters, namely, use_low and use_width.
However, unlike the first parameter, lcl_list, these two
parameters need not be specified when we call the
collate3() function. That is why we did not
need to have them when we used the command collate3(L1).
Here is an example use of the collate3() function that
uses these two additional parameters by specifying two more arguments:
collate3( L1, 190, 10)
Such a statement will cause 190 to be assigned to use_low
and 10 to be assigned to use_width.
R even lets us change the order of the assignment as long as
the statement we use includes the names of the parameters.
Thus, the statement collate3( L1, use_width=10, use_low=190)
will cause 190 to be assigned to use_low
and 10 to be assigned to use_width, even though the
arguments are given in the opposite order.
However, if we do not specify arguments for use_low
and/or use_width then they will have the default
value specified in line 1 as NULL.
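The matching rules just described can be tried on a small stand-in function. This is only a sketch: show_args() is a hypothetical helper that is not part of collate3.R, and sample_values is made-up data standing in for L1.

```r
# show_args() is a hypothetical stand-in with the same heading as collate3();
# it simply reports which values its parameters received.
show_args <- function( lcl_list, use_low=NULL, use_width=NULL, ...)
{
  list( low = use_low, width = use_width )
}

sample_values <- c(203, 215, 198)    # made-up data, standing in for L1

by_position <- show_args( sample_values, 190, 10 )
by_name     <- show_args( sample_values, use_width=10, use_low=190 )
defaults    <- show_args( sample_values )

identical( by_position, by_name )    # TRUE: the same assignment either way
is.null( defaults$low )              # TRUE: use_low kept its default, NULL
```

The named call gives the arguments in the opposite order, yet both calls end up with 190 in use_low and 10 in use_width.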
Finally, on line 1, the function heading includes an ellipsis, the three dots ...,
and they indicate that we could send even more arguments to this
function, but that at most we will be passing those
arguments on to some function that we may call from
within the body of this function collate3().
Thus, when we make the statement
collate3( L1, 190, 10, right=FALSE)
R finds no fault with the statement even though there is not a specific
parameter in the heading to match the argument
given as right=FALSE. As we will see below, in discussing
line 36, the extra argument is passed on to the cut() function.
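To see the ellipsis at work on its own, here is a minimal sketch. The wrapper bin_values() is a hypothetical function, not part of collate3.R, and the break points are made up for the illustration. The only thing it does is hand its extra arguments on to cut().

```r
# bin_values() is a hypothetical wrapper: it has no parameter named right,
# but anything it receives through ... is handed on to cut().
bin_values <- function( values, breaks, ...)
{
  cut( values, breaks=breaks, ...)
}

edge_value <- 200                  # sits exactly on a break point
lcl_breaks <- c(190, 200, 210)     # made-up break points

closed_right <- as.character( bin_values( edge_value, lcl_breaks ) )
closed_left  <- as.character( bin_values( edge_value, lcl_breaks, right=FALSE ) )

closed_right     # "(190,200]"
closed_left      # "[200,210)"
```

The value 200 moves from one interval to the next because right=FALSE reached cut(), even though bin_values() never mentions right.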
Once the function heading is complete, with the closing right
parenthesis at the end of line 1, we need to specify the body of the
function. The body of the function will be enclosed in braces,
between { and a matching }. Line 2 holds our opening
left brace.
Please note that the location of that brace on a separate line
is just a matter of style. Many people
prefer to put that opening brace at the end of the heading, that is, at the end of line 1.
R treats the two placements the same way.
Lines 3 through 14
These lines are either blank or they are comments.
Blank lines are helpful to a person reading the function because they
provide vertical spacing between logical portions of the function.
Comments start with the # symbol and extend from there to
the end of the line. The use of ## in this particular
function is again a matter of style. It helps to really set off
the lines as comments. However, the second # is not needed at all.
As noted at the start of this page, this particular listing actually has a comment on
every line because every line ends with the # followed by the line number.
This was done to make it possible to reference specific lines in the function.
Because the line numbers are in comments, the function even with the
additional line numbers will work just as well as it does without the line numbers.
The comments in these lines are meant to explain
what this function does. It is always nice if the comments accurately
describe the actions of the function, but, obviously, since these are
just comments, we could say anything here.
Lines 15 through 17
Understanding that any time collate3() is used it must be used
with an argument that is matched to the parameter called
lcl_list, these three statements set some local variables
to values representing the minimum value, the maximum
value, and the number of values in the calling argument.
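As a quick illustration of those three statements, here they are applied to a made-up list of values. The name sample_values and its contents are assumptions for this sketch; they are not the L1 data used elsewhere.

```r
sample_values <- c(203, 215, 198, 262, 241)   # made-up data

lcl_real_low  <- min( sample_values )      # 198
lcl_real_high <- max( sample_values )      # 262
lcl_size      <- length( sample_values )   # 5
```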
Lines 19 through 29
These lines are all part of one statement. The statement starts
with the if in line 19.
That if is followed by a logical expression, enclosed
in parentheses, also on line 19.
If the value of the logical expression is true then
the single statement following the closing right parenthesis
will be performed.
In this case that single statement is the
block indicated by the opening left brace, {, on line 20,
and extending through the closing right brace, }, on line 29.
That block of statements is considered to be the one statement
following the if() even though there are multiple statements
inside the block.
Again, note that it is a matter of style to locate the
opening left brace, {, on a separate line. A
different, popular style is to place that brace immediately after
the closing parenthesis of the logical condition.
In the logical structure of the collate3() function, these are
the statements that will take care of the situation where we do not
specify arguments that match use_low and use_width.
That is, when we call the function via the statement
collate3(L1)
these are the statements that provide
a response that tells us, the user, the lowest value and the highest value, and
makes a suggestion as to the width to use.
On line 19, the structure
is.null(use_low) | is.null(use_width)
uses the vertical bar, |,
to represent the idea of or. Therefore, the meaning of that structure
is to produce a true value if we are missing either one or both of the
values use_low and use_width.
If the calling statement had provided both of the values, as in
collate3(L1,190,10)
then that structure would evaluate as
false or false which is the value false.
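The test on line 19 can be tried by itself. In this sketch the two variables are set by hand to imitate what the function would see. The names missing_one and missing_none are introduced only for the illustration.

```r
use_low   <- 190
use_width <- NULL        # pretend the caller left this one out

missing_one <- is.null(use_low) | is.null(use_width)
missing_one              # TRUE (false or true), so the block would run

use_width <- 10          # now both values are present
missing_none <- is.null(use_low) | is.null(use_width)
missing_none             # FALSE (false or false), so the block is skipped
```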
All of this means that if we are missing one or both of those values
then the function will display (that is what cat() does)
the lowest value, the highest value, and then a
suggested width that is just one tenth of the difference between the
high value and the low value.
All of that is followed by a text message that the command should be given again
specifying more values, and a message that the system is waiting.
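Here is a small sketch of the display statements from lines 22 through 25, run outside the function. The low and high values are made up for the illustration.

```r
lcl_real_low  <- 198     # made-up lowest value
lcl_real_high <- 262     # made-up highest value

cat(c("The lowest value is ", lcl_real_low, "\n"))
cat(c("The highest value is ", lcl_real_high, "\n"))
suggested_width <- (lcl_real_high - lcl_real_low) / 10    # (262 - 198)/10 = 6.4
cat(c("Suggested interval width is ", suggested_width, "\n"))
```

By default cat() puts a space between the items it displays, so each number appears after two spaces: the one inside the quoted text and the one that cat() adds.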
Finally, on line 28, we find the
return( "waiting..." )
command.
The return causes R to leave the function,
ending any further computation within the function, and,
in this case, passing the character string waiting... back as
the final value of the function. The \n that appears in the cat() statements
on lines 22 through 27 is the character
that ends a line of output and starts the next output text on
a new line. It is not part of the returned value.
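The difference between displaying text with cat() and handing back a value with return() can be seen in a miniature version of this block. The function early_exit() is hypothetical and exists only for this sketch.

```r
# early_exit() is a hypothetical miniature of lines 19 through 29.
early_exit <- function( use_low=NULL )
{
  if( is.null(use_low) )
  {
    cat("waiting...\n")       # displayed in the Console; \n ends the line
    return( "waiting..." )    # leaves the function with this value
  }
  use_low * 2                 # only reached when use_low was given
}

result  <- early_exit()       # displays waiting... and stores the string
doubled <- early_exit( 190 )  # skips the block, so doubled holds 380
```

After these statements result holds the text waiting... and the statement use_low * 2 was never reached in the first call.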
Line 30
Line 30 is another comment line. In this case the comment serves
to let us know that the only way that we should be this far
into the function is if we actually have values
for use_low and use_width.
Lines 31 through 34
Our first task is to set up the partitions.
We know the low end of the partitions; it is in use_low.
We also know the width of the partitions; it is in use_width.
In order to use the seq() function to set up the
break points we still need to know the high value.
Line 31 computes the number of partitions we will need
to cover the range of values by finding the range between the
high data value, stored in lcl_real_high, and the
starting value for our partitions, stored in use_low,
dividing that range by the width of the partitions, using
the function floor() to drop any decimal part of that answer,
and then adding 1 to replace that dropped decimal part
with a whole interval.
Line 32 can then compute the maximum value that we will use in
our partitions. It does this by starting at the low value and
then adding a width value for every partition that we will use.
Line 33 uses the seq() function to actually create
the break points.
Line 34 uses a slight modification of that same seq() function
to find the midpoints of each of the partitions.
It does this by starting the sequence half an interval width
above the low end and ending the sequence half an interval below
the top end.
Of course, the width between midpoint values is the same as the
width of the intervals.
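A worked example may help here. The statements below are lines 31 through 34 run with use_low=190 and use_width=10, as in the earlier examples. The highest data value of 262 is made up for this sketch.

```r
lcl_real_high <- 262      # made-up highest data value
use_low       <- 190
use_width     <- 10

# (262 - 190)/10 is 7.2; floor() drops that to 7, and adding 1 gives 8
use_num_bins <- floor( (lcl_real_high - use_low)/use_width)+1
lcl_max      <- use_low+use_width*use_num_bins       # 190 + 10*8 = 270
lcl_breaks   <- seq(use_low, lcl_max, use_width)     # 190, 200, ..., 270
lcl_mid      <- seq(use_low+use_width/2, lcl_max-use_width/2, use_width)
                                                     # 195, 205, ..., 265
```

There are nine break points but only eight midpoints, one for each of the eight intervals between consecutive break points.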
Line 35
Line 35 was left blank to add a little vertical spacing and
to separate the preparation work from the rest of the work of the function.
Again, this is a matter of style. It does not hurt R
to have such vertical spacing and it can be a big
help to a person reading the function.
Lines 36 through 43
This section is the real work of the function. Here the
function cuts up the data, finds the frequencies, creates
the data frame, and finds and appends more values to
that data frame.
Line 36 uses the cut() function to create the
partitions that correspond to each of the data values
in lcl_list based on the breaks that were
calculated back in line 33.
In addition, because the function call ends with the
ellipsis, if the function collate3()
was called with extra arguments, such as right=FALSE,
then any such extra arguments are passed on to this
call of the cut() function.
Once the partitions have been created, line 37 uses them
to create a table structure to hold the frequency of each
partition which is the same as holding the frequency
of the values in lcl_list that fall into each of the intervals.
Remember that the table structure also holds the
different values of the partitions, such as (210,220],
as the labels for the frequencies.
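The two steps in lines 36 and 37 can be sketched with a small, made-up list of values. The data in sample_values is an assumption for this illustration, and no extra arguments are sent to cut() here.

```r
sample_values <- c(203, 215, 198, 262, 241, 210, 219)   # made-up data
lcl_breaks    <- seq(190, 270, 10)

lcl_cuts <- cut( sample_values, breaks=lcl_breaks )
first_interval <- as.character( lcl_cuts[1] )   # "(200,210]" holds 203

lcl_freq <- table( lcl_cuts )
names( lcl_freq )[3]       # "(210,220]", the label of the third interval
as.vector( lcl_freq )      # 1 2 2 0 0 1 0 1
```

The third frequency is 2 because both 215 and 219 fall in (210,220]. Intervals that hold no values still appear in the table, with a frequency of 0.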
The next step is to convert the table structure that we
had stored in lcl_freq into a data frame.
Line 38 does this, creating the variable lcl_df.
Lines 39 through 43
merely add more columns to the data frame.
First we append the already computed midpoints.
Then we compute and append, in order, the relative frequency,
the cumulative frequency, the cumulative relative frequency,
and the number of degrees to allocate to each interval if we were to
draw a pie chart.
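Here is a small sketch of lines 38 through 43 applied to just four made-up values in three intervals, so that the arithmetic is easy to check by hand. The data and break points are assumptions for this illustration.

```r
sample_values <- c(203, 215, 198, 210)    # made-up data
lcl_size      <- length( sample_values )  # 4
lcl_cuts      <- cut( sample_values, breaks=c(190, 200, 210, 220) )
lcl_freq      <- table( lcl_cuts )        # frequencies 1, 2, 1

lcl_df <- data.frame( lcl_freq )          # columns lcl_cuts and Freq
lcl_df$midpnt       <- c(195, 205, 215)
lcl_df$relfreq      <- lcl_df$Freq/lcl_size              # 0.25 0.50 0.25
lcl_df$cumulfreq    <- cumsum( lcl_df$Freq )             # 1 3 4
lcl_df$cumulrelfreq <- lcl_df$cumulfreq / lcl_size       # 0.25 0.75 1.00
lcl_df$pie          <- round( 360*lcl_df$relfreq, 1 )    # 90 180 90
```

In this example the degrees in the pie column add up to 360. With other data the rounding to one decimal place could make the total differ slightly from 360.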
Line 44
Just another blank line that provides a little vertical spacing.
Line 45
Line 45 could have been written as return( lcl_df )
but because it is the last line to be performed in collate3()
writing it as just lcl_df
does the same thing.
This last expression becomes the value of the function,
the result of calling the function. Thus,
if we had called the function with a statement such as
collate3(L1,190,10)
the result
would be the value of the data frame that we created
in the function. As such, that value would be displayed in the Console.
On the other hand, if we had called the function with a statement such as
df <- collate3(L1,190,10)
the result
would still be the value of the data frame that we created
in the function but that value would be assigned to the variable df.
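The equivalence of the two ways of ending a function can be checked with a pair of tiny functions. Both with_return() and without_return() are hypothetical and are used only in this sketch.

```r
# Two hypothetical functions that differ only in how the last line is written.
with_return    <- function( x ) { return( x + 1 ) }
without_return <- function( x ) { x + 1 }

same_value <- identical( with_return(5), without_return(5) )   # TRUE

saved <- without_return(5)   # nothing is displayed; 6 is stored in saved
```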
Line 46
This is the closing brace, }, that matches the
opening brace found back on line 2. This marks the end of the
block statement that serves as the one statement
following the function heading.
©Roger M. Palay
Saline, MI 48176 November, 2015