Worksheet 03: Descriptive Measures
Here is the story!
I wanted us to have some real data to use for this example.
So, in my copious free time I went for a walk and looked for something that I could
bring back to the class.
Standing along Platt Road in Ann Arbor, just south of the intersection with
Washtenaw Avenue, I observed that the northbound cars on Platt
made one of five choices ahead of me, as shown in Image 1:
- turn into the Washtenaw County Rec Center,
- make a left turn onto Washtenaw (westbound),
- cross the street and continue onto Glenwood,
- make a right turn onto Washtenaw (eastbound), or
- turn right into the Arbor Hills shopping center.
I decided to make a short study of the number of cars making each choice.
I assigned numbers to each choice (1=Rec Center, 2=west on Washtenaw, 3=onto Glenwood,
4=east on Washtenaw, and 5=shopping center).
As each car came up the road I wrote down the choice for that car. Thus, I generated a
collection of 98 values.
Here is the record of the
first 98 cars that I saw coming up Platt.
I want to get a feel for this data. Just looking at the values in
Table 1 does not help. I need some descriptive measures of this data.
I decide to use R to do this work.
I create a new folder (directory) on my USB drive, rename it, copy the
model.R file to it, and then rename that file, in this case to
descriptivep2.R. That is shown in Figure 1.
I double click on the
descriptivep2.R file name to open
my RStudio session.
Then I need to get the data from Table 1 into my RStudio session.
Thankfully, there is a setting for the gnrnd4 function
that will produce exactly those values!
Therefore, I wrote the comments and commands shown in Figure 2.
[Disclaimer: I sure hope that you realize that
this whole story is bogus. I could have actually stood out there and found the real
values but then we would
have had to find a way to get the values to you so that you could read them into
your RStudio session. Instead, I generated the table using gnrnd4 on my
machine and then I reported the numbers as if I had actually observed them.]
Running the highlighted lines produced the
output shown in Figure 3 in the Console pane.
A quick check of the values shown in Figure 3 against the values in
Table 1 shows that we do indeed have the correct values.
A look at the Environment tab, shown in Figure 4,
further confirms that the commands did what we wanted done.
We can even see that there are indeed 98 values in L1.
I know that the data in
L1 is nominal data.
The values 1, 2, 3, 4, and 5 are merely names for the choices
that the drivers made. Those values are not measurements.
I could have assigned different values to each of the choices.
There is no underlying order to the choices.
Therefore it makes no sense to compute the mean,
median, quartiles, or even the range for these values.
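To see why the mean tells us nothing here, we can try a small made-up sample (not the values from Table 1) and then relabel the very same choices with different, equally arbitrary codes. The names `choices`, `new_codes`, and `relabeled` are just for this sketch.

```r
# A small made-up sample of nominal codes (NOT the Table 1 data)
choices <- c(2, 2, 4, 1, 2, 5, 3, 4)

# Relabel the same choices with different, equally arbitrary codes:
# 1 -> 50, 2 -> 10, 3 -> 40, 4 -> 20, 5 -> 30
new_codes <- c(50, 10, 40, 20, 30)
relabeled <- new_codes[choices]

# The mean depends entirely on the codes we happened to pick
mean(choices)
mean(relabeled)

# The counts, however, are the same under either labeling
sort(as.vector(table(choices)))
sort(as.vector(table(relabeled)))
```

The two means differ even though the cars did exactly the same things, while the counts do not change. That is why counts, and the mode, are the measures to use for nominal data.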
What would be helpful is to know the mode of the values.
We know that we have to load the Mode function because
it is not a built-in function. That gives rise to the new lines shown in Figure 5.
Running the highlighted lines of Figure 5 gives us the output in Figure 6.
Now we know that the value 2 happened more often than did any other value
in the collection, and in fact, there are 49 2's in the table.
A quick glance, Figure 7,
at the Environment area confirms that the Mode
function has been loaded.
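The source of the Mode function is not shown on this page. As a rough sketch of the idea only, and using nothing but base R, we could count each distinct value with table and keep whichever values have the largest count. The name `simple_mode` is hypothetical; it is not the course's Mode function and its output format is my own choice.

```r
# Hypothetical helper for illustration, NOT the course's Mode function
simple_mode <- function(x) {
  counts <- table(x)               # count each distinct value
  values <- names(counts)          # the distinct values, as text
  freqs  <- as.vector(counts)      # the matching counts
  keep   <- freqs == max(freqs)    # TRUE for every value tied for the top
  data.frame(value = values[keep],
             count = freqs[keep],
             stringsAsFactors = FALSE)
}

# Made-up sample (NOT the Table 1 data)
sample_choices <- c(2, 2, 4, 1, 2, 5, 3, 4)
mode_result <- simple_mode(sample_choices)
mode_result
```

Keeping every value tied for the largest count matters because a collection can have more than one mode.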
What would be really informative is to have a frequency table
for the data in the collection.
We recall that we have the make_freq_table function
that will make such a table. The commands in Figure 8 show
how to load and then run that function.
Running those lines does not produce much in the
Console pane, Figure 9.
That is because we assigned the output of the make_freq_table
function to the variable ft.
Looking at the Environment pane, Figure 10,
shows us that there is a new variable there.
To see the contents of that variable we just need to add a line
that gives the variable name,
ft, as shown in Figure 11.
When we run that line we get the result shown in Figure 12.
Now we have the count for each of the five alternatives,
and we even have the relative frequency of those choices.
[The cumulative values are meaningless because there
is not a real underlying order to the values.]
We even have, should we need it, the number of degrees
for the central angle for each piece of a pie
chart should we have to make such a chart.
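The source of make_freq_table is not shown here either, but the arithmetic behind such a table is short. The sketch below is an assumption about how one might compute the count, the relative frequency, and the pie chart angle with base R. The function name `sketch_freq_table` and the column names are hypothetical, and the real function's columns may differ.

```r
# Hypothetical sketch, NOT the course's make_freq_table function
sketch_freq_table <- function(x) {
  counts <- table(x)      # how many times each value appears
  n <- length(x)          # total number of observations
  data.frame(
    item     = names(counts),
    freq     = as.vector(counts),
    rel_freq = as.vector(counts) / n,         # share of the whole
    pie_deg  = 360 * as.vector(counts) / n,   # central angle in degrees
    stringsAsFactors = FALSE
  )
}

# Made-up sample (NOT the Table 1 data)
sample_choices <- c(2, 2, 4, 1, 2, 5, 3, 4)
sketch_ft <- sketch_freq_table(sample_choices)
sketch_ft
```

The relative frequencies add up to 1 and the angles add up to 360, which is a handy check on any frequency table.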
As we have seen before, we can get a slightly more elegant version of that
table by using the View function.
The command is recorded in Figure 13.
If we run it we get very little in the
Console pane, shown in Figure 14.
However, we do get a new tab in the Editor pane, shown in Figure 15.
Note that the image in Figure 15 has been augmented with a green oval.
Earlier web pages have talked about the added power of the
display generated by View, an example of which is in Figure 15.
If you point to and click on the area in the title bar shown in the
green oval of Figure 15 the system will sort the values in that
column and keep the values in corresponding rows together.
That is how we get to Figure 16: just click on the region highlighted in Figure 15.
Now, in Figure 16, all of the rows have been sorted according to the
increasing value of the items in the column that we selected.
If we click on that region again we get the values sorted into decreasing order,
as shown in Figure 17.
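Clicking on a column title only changes the display. If we wanted the sorted rows as a value that we could use in later commands, one option in base R is the order function. This is a sketch on a made-up data frame (not the values in Figure 12); the names `demo_ft`, `increasing`, and `decreasing` are illustrative.

```r
# Made-up frequency table (NOT the values shown in Figure 12)
demo_ft <- data.frame(item = c("1", "2", "3", "4", "5"),
                      freq = c(1, 3, 1, 2, 1),
                      stringsAsFactors = FALSE)

# order() returns the row positions that put freq in sorted order;
# indexing by those positions keeps each row's values together
increasing <- demo_ft[order(demo_ft$freq), ]
decreasing <- demo_ft[order(demo_ft$freq, decreasing = TRUE), ]

increasing
decreasing
```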
The appropriate graphic representation of the values in the data collection
is a bar plot. We recall that the barplot function
creates a bar for every value in the list that we send to it.
Thus, we would not want to use
L1 as an argument
for the function. If we did we would get a plot with 98 bars!
Our goal is to have the height of each bar represent the number of
occurrences of each different value in L1.
We can use the
table function to get those counts
and then pass the results of that function to the barplot function.
This is shown in the highlighted command in Figure 18.
Of course there is not much to see in the Console pane, as shown in Figure 19.
The barplot function creates a graph in the
Plots tab, shown in Figure 20.
There is nothing wrong with the image shown in Figure 20.
In fact, it shows exactly what we
want to see, the distribution of values in Table 1.
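We can check the difference between the two calls without even looking at a plot, because barplot returns the positions of the bars that it drew. This sketch uses a made-up sample (not the Table 1 data) and sends the drawing to a null device; remove the pdf(NULL) and dev.off() lines to see the plots in the Plots tab. The variable names are illustrative.

```r
# Made-up sample (NOT the Table 1 data)
sample_choices <- c(2, 2, 4, 1, 2, 5, 3, 4)

pdf(NULL)   # null graphics device: nothing is shown or written to a file

# One bar per observation: 8 values give 8 bars
raw_bars <- barplot(sample_choices)

# One bar per distinct value: 5 different values give 5 bars
count_bars <- barplot(table(sample_choices))

invisible(dev.off())   # close the null device

length(raw_bars)
length(count_bars)
```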
It might be nice to make the image a bit prettier.
Here are some steps that we might take to do that.
We start by just saving the results of the table
command in a new variable, counts.
Once that variable has been created we can give
text names to each of the values in counts.
Then we can generate a new bar plot. The commands to do this are
shown in Figure 21.
As we have seen before, the Console record just
notes that the commands have been
given and no error has been detected.
The new plot, shown in Figure 23, now has
labels for the bars that make much more sense than did
1, 2, 3, 4, and 5.
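One caution about assigning names by position: the labels line up with the counts only if every one of the five choices appears at least once in the data. A different approach, sketched here as an alternative and not as the method used in this worksheet, is to attach the labels through a factor before counting, so that a choice nobody made still gets a bar of height 0. The sample is made up (not the Table 1 data) and the variable names are illustrative.

```r
# Labels taken from the descriptions of the five choices
choice_labels <- c("Rec Center", "west on Washtenaw", "Glenwood",
                   "east on Washtenaw", "shopping center")

# Made-up sample in which no car chose 3 (NOT the Table 1 data)
sample_choices <- c(2, 2, 4, 1, 2, 5, 4)

# factor() ties each code 1..5 to its label, even for unused codes
labeled <- factor(sample_choices, levels = 1:5, labels = choice_labels)

# table() on a factor reports every level, including those with count 0
labeled_counts <- table(labeled)
labeled_counts
```

Passing `labeled_counts` to barplot would then give five labeled bars, with an empty spot for Glenwood in this made-up sample.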
There are more commands to pretty-up the plot.
A new version of the commands appears in
Running those commands produces the plot seen in
That is really more than we had to do, but it looks nice.
All that we have to do now is to wrap up our session.
In the Editor pane we click on the tab
for our command file, descriptivep2.R.
Then we want to click on the icon
to save the file, and turn the name back to a black font.
Then we use the
q() function and we respond
to the prompt
to close our session.
That should leave us still looking at the folder we have been using.
There we should be able to see our updated files (unless this is being
done on a Mac, which hides some of them).
Here is a listing of the complete contents of the descriptivep2.R file:
#This session will create a collection of data
# and then find the appropriate descriptive
# measures for that data. From the problem statement
# and the given data we know that we are using
# NOMINAL data.
#first we will load the gnrnd4 function that we
# will need here. More may follow later.
# Now we are ready to create the data collection
gnrnd4( key1=294169707, key2=321925)
# Then we can look at the data that we generated
L1
# Our first task is to find the mode of the data.
# It sure looks like there are a lot of 2's, but we
# will load and use the Mode function to get a good
# count of this.
Mode( L1 )
# To get a more complete description of this data
# we can use the make_freq_table function. Of course,
# we need to load it first
ft <- make_freq_table( L1 )
View( ft )
# For a picture of the data we turn to a bar plot.
# But remember that the bar plot will generate bars
# that have heights (lengths) corresponding to the
# values that we give it. Therefore, we just want the count
# of each of the different values in the data
barplot( table( L1 ) )
# We could make the plot a bit more informative
# First we can get the counts into a separate variable
counts <- table( L1 )
# Then we can assign names to those counts
names( counts ) <-c("Rec Center","west Wash.","Glenwood",
                    "east Wash.","Arbor Hills")  # last two labels are assumed; the original line is cut off
# Checkout the new plot
barplot( counts )
# then make it just a bit fancier
barplot( counts, main="Traffic Northbound on Platt",
ylab="Number of cars observed",
xlab="Choices for turns",
col=rainbow(5), las=1 )
abline( h=seq(0,60,5), col="darkgrey", lty="dotted" )
©Roger M. Palay
Saline, MI 48176 September, 2016