The task here is to come up with descriptive measures for the data in Table 1.

Assuming that you have read through earlier pages and that you have mastered many of the steps that we use to set up our work, you can skim through the first ten Figures and their associated discussions.

We start by inserting our special USB drive. On the computer used for this demonstration that drive was assigned the letter

Then we create a new folder on that drive. To do this here I just clicked on the New Folder icon toward the middle top of the window of Figure 1. The result is shown at the bottom of Figure 2.

Rather than accept the default folder name of `New folder`, we give it a new name, in this case `worksheet031`.
Once the folder is there we want to copy the `model.R` file that is on the USB drive into the folder. First we right click on the `model.R` file. This opens the options shown in Figure 4, where we point to the `Copy` option and then click on it. Next we double click on our new folder name, `Worksheet031`, to move into that folder, shown in Figure 5. We can see, in Figure 5, that the folder is empty. But now we can right click in the folder and select the `Paste` option. That puts the copy of `model.R` into this folder.

However, we have learned that it is probably best to rename this file. To do that we click (just once) on the name of the file. That will allow us to edit the name of the file, as shown in Figure 7.

For this project we will use the name `ws31.R`. We change the file name to that, as we see in Figure 8. And we can press the `Enter` key to move to Figure 9.
At this point we have our new folder and in that folder we have a renamed copy of our model file. We can double click on that file name to open a session of

An important relation here is that because we started this session of

We also note that, because we have not done any work in this current directory, we have a blank

`source`

command, into the We will consistently type commands into the

Certainly, you may decide to type all of the commands (and hopefully the comments) yourself as you follow along. However, all of the commands used in this page have been provided in machine-readable form at the bottom of this page. If it were me doing the work I would find that listing, copy the lines from this page, and paste them into the editor. Then, to follow along, just highlight the lines that you wish to execute.

We were given values above in

`gnrnd4`

function
into our environment and then run the function using the values given
with our table above.
Once generated it makes sense that we look at the values
so that we can compare them to the values in When we run those commands we get the lines shown in Figure 12 in to

If we look at the

`L1`

We recall that the built-in function

`summary`

`sd(L1)`

`pop_sd`

We run the commands highlighted in Figure 14 to get the

We read the output in Figure 15 to find that
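The difference between the sample and population standard deviations comes down to the divisor: `sd` divides the sum of squared deviations by n - 1, while a population standard deviation divides by n. The contents of `pop_sd.R` are not shown on this page, so the function `pop_sd_sketch` below is only a hypothetical stand-in, and the data are made up for illustration rather than taken from Table 1.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Hypothetical stand-in for a population standard deviation function.
# This is an assumption; the course's pop_sd.R may be written differently.
pop_sd_sketch <- function(v) {
  sqrt(sum((v - mean(v))^2) / length(v))
}

sample_sd     <- sd(x)             # built-in: divides by n - 1
population_sd <- pop_sd_sketch(x)  # sketch: divides by n

# The two are related by a simple scaling factor
rescaled_sd <- sample_sd * sqrt((n - 1) / n)

print(sample_sd)
print(population_sd)
print(rescaled_sd)
```

The sample value is always a little larger than the population value for the same data, and the gap shrinks as the number of values grows.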

In all of this we might be a little concerned that the

Running those commands produces the output in Figure 17. There we see the value of the

However, if we look in the

`xbar`

The 15 digits shown in Figure 18 are clearly more than we need. The 7 digits shown in Figure 17 are quite helpful. The 4 digits shown in Figure 15 do not really give us enough information, but the

`summary`

There is a way to tell R to give us more digits in the display by default. We can use the

`options(digits=10)`

`summary`

The result of running those two commands is in Figure 20. There we see that all of the values are given with more displayed digits.

One might reasonably ask, "I set digits to 10, why are there only 7 shown for each value in Figure 20?" The answer is that the setting,

`digits=10`
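A small experiment can make the digit counts easier to see. In the versions of R that I am aware of, the display of `summary` output uses three fewer significant digits than the `digits` option (with a floor of 3), which would account for 4 digits under the default setting of 7 and 7 digits after `options(digits=10)`. The sketch below assumes that default and uses made-up values, not the Table 1 data.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(121.25, 133.5, 150.125, 205.75, 257.0625, 277.5, 344.875)

old_opts <- options(digits = 10)  # remember the previous setting

# Digits the summary display is expected to use, assuming the
# default of max(3, getOption("digits") - 3)
summary_digits <- max(3, getOption("digits") - 3)
print(summary_digits)

print(summary(x))               # displayed with the reduced digit count
print(summary(x), digits = 10)  # ask for 10 significant digits explicitly
print(mean(x))                  # plain printing uses the full setting

options(old_opts)               # restore the previous setting
```

Saving the old options and restoring them at the end keeps the experiment from changing how later output is displayed.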

At this point we have found our

When we run the command of Figure 21 we get the

The graph in Figure 22 is completely accurate, but it is a bit hard to understand, in part because we have no idea of where R has decided to make the breaks for the different

For the purpose of this course, the default graph is good enough. However, with just a few additional values we can really improve the quality of the graph. Three immediate changes will be:

- to include the weird option `las=3`, which will cause all of the axis values to be drawn vertically, so that the x-axis values appear **perpendicular** to their axis,

- to include the option `breaks=seq(110,350,15)`, which will set the breaks for the bins to start at 110, to be 15 units wide, and to not go over 350, and

- to include the option `xaxp=c(110,350,16)`, which will set the x-axis labels to start at 110 and go to 350 and to have 16 even steps along the way, thus coinciding with the values for our break points.
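To see why 16 is the right number of steps for `xaxp`, it can help to look at the break points themselves. The following is a minimal sketch using made-up values (not the Table 1 data); `plot = FALSE` asks `hist` to return the binning without drawing anything.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(121, 133, 150, 150, 164, 178, 190, 205, 205, 205,
       219, 233, 248, 257, 277, 290, 305, 320, 344)

bin_breaks <- seq(110, 350, 15)    # 110, 125, 140, ..., 350
n_bins     <- length(bin_breaks) - 1

print(bin_breaks)
print(n_bins)

# plot = FALSE returns the binning without drawing the histogram
h <- hist(x, breaks = bin_breaks, plot = FALSE)
print(data.frame(lower = head(h$breaks, -1),
                 upper = tail(h$breaks, -1),
                 count = h$counts))
```

There is one more break point than there are bins, so the number of bins is the number of steps to give to `xaxp`.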

Running the command does not produce anything to speak of in the

However, it does produce the histogram shown in Figure 25, a significant improvement over the default graph of Figure 22.

One aspect of the histograms that we have yet to examine is where to place a value that falls on a break point. In the data in

We can change this default by including the option

`right=FALSE`

As expected, running the command does little but echo it in the

But running the command does produce the new histogram shown in Figure 28.
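The effect of `right=FALSE` is easiest to see with a few values that sit exactly on break points. Here is a small sketch with made-up values (not the Table 1 data) that compares the bin counts both ways without drawing the plots.

```r
# Made-up values; 125 and 140 sit exactly on break points
x <- c(120, 125, 125, 130, 140)
bin_breaks <- seq(110, 155, 15)    # 110, 125, 140, 155

# Default: each bin includes its right end point
closed_right <- hist(x, breaks = bin_breaks, plot = FALSE)$counts

# right = FALSE: each bin includes its left end point instead
closed_left <- hist(x, breaks = bin_breaks, right = FALSE,
                    plot = FALSE)$counts

print(closed_right)  # the 125s are counted in the bin that ends at 125
print(closed_left)   # the 125s are counted in the bin that starts at 125
```

Only values that fall exactly on a break point move between bins; every other value stays where it was.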

We can see some change from Figure 26 to Figure 28. In particular, the

There are a few more tweaks that we could add to our histogram, just to make it easier to read:

- the option `main="For Worksheet 3.1"` gives the graph a main title,

- the option `xlab="Table Values"` replaces the default label for the x-axis,

- the option `ylab="count of values"` replaces the default label for the y-axis,

- the option `xlim=c(110,350)` explicitly sets the x-axis to that range,

- the option `ylim=c(0,16)` explicitly sets the y-axis to that range,

- the option `cex.axis=.7` causes the labels for both axes to be given at 70% of the regular size,

- the option `yaxp=c(0,16,8)` makes the tick marks on the y-axis start at 0, end at 16, and have 8 steps along the way, and

- the additional command `abline( h=seq(2, 16, 2 ), col="blue", lty="dotted")` will write dotted blue horizontal lines over the histogram, starting at 2, ending at 16, and going in steps of 2.
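One detail worth noting is the order of the two commands: `abline` adds lines to a plot that already exists, so the `hist` command has to run first. The sketch below puts the options together on made-up values (not the Table 1 data). The `pdf(NULL)` line is an addition of mine so that the sketch can run without a screen; leave it and the `dev.off()` line out to see the graph in the usual plot window.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(121, 133, 150, 150, 164, 178, 190, 205, 205, 205,
       219, 233, 248, 257, 277, 290, 305, 320, 344)

pdf(NULL)  # null device: nothing is written to disk (my addition)

grid_heights <- seq(2, 16, 2)      # heights for the dotted guide lines

# First draw the histogram with the options described above
h <- hist(x, main = "Sketch with made-up data",
          xlab = "Table Values", ylab = "count of values",
          xlim = c(110, 350), ylim = c(0, 16),
          breaks = seq(110, 350, 15),
          xaxp = c(110, 350, 16), yaxp = c(0, 16, 8),
          las = 3, cex.axis = .7)

# Then add the guide lines on top of the existing plot
abline(h = grid_heights, col = "blue", lty = "dotted")

invisible(dev.off())               # close the null device
print(grid_heights)
```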

Running those commands changes the

And, doing so produces the graph shown in Figure 31.

It would be inappropriate to use a

`L1`

`barplot(table(L1))`
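The `table` function is what does the counting here: it returns one count for each distinct value, and `barplot` then draws one bar per count. A small sketch with made-up values (not the Table 1 data) shows what `table` produces.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(150, 205, 150, 277, 205, 121, 205, 344)

value_counts <- table(x)     # one entry per distinct value
print(value_counts)

# How many distinct values occur once, twice, three times, ...
print(table(value_counts))
```

Applying `table` a second time, to the counts themselves, is one way to find how many values appear twice, three times, and so on without reading bar heights from the plot.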

As usual, not much shows up on the

The generated

We do remember that to find the

`L1`

`Mode(L1)`

The result of running the two commands is shown in Figure 36 where we see that there are two
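For readers curious about how such a function might work, here is a minimal sketch. The contents of `mode.R` are not shown on this page, so `mode_sketch` is a hypothetical stand-in and not the course's `Mode` function. It uses made-up values and returns every value that ties for the highest count.

```r
# Made-up values; 150 and 205 each appear twice
x <- c(150, 205, 150, 277, 205, 121, 344)

# Hypothetical stand-in; the course's mode.R may be written differently
mode_sketch <- function(v) {
  counts <- table(v)                           # count each distinct value
  top    <- names(counts)[counts == max(counts)]
  as.numeric(top)                              # names are text; convert back
}

modes <- mode_sketch(x)
print(modes)
```

Returning all of the tied values, rather than just the first, is what lets a data set report two modes.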

We can move on to explore a

`summary`

`boxplot`

`horizontal=TRUE`

We see the output of the

`summary`

The

`boxplot`

Again, as was the case with the

Figure 41 shows the resulting

Figure 42 shows the new graph. It is much easier to approximate the various values by reading this chart than it was by reading the default chart shown in Figure 39.
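If the exact values behind the box and whiskers are wanted, rather than approximations read from the chart, `boxplot` can return them. This is a small sketch with made-up values (not the Table 1 data). Note that the box is drawn at "hinges", which can differ slightly from the quartiles reported by `summary`.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(121, 150, 180, 200, 230, 260, 300, 344)

# plot = FALSE returns the numbers without drawing the chart
box_values <- boxplot(x, plot = FALSE)$stats[, 1]
names(box_values) <- c("lower whisker", "lower hinge", "median",
                       "upper hinge", "upper whisker")

print(box_values)
print(summary(x))   # compare the quartiles with the hinges
```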

Another way to look at the data is via a

`stem`

The result is shown in Figure 44 below.

Figure 44 looks like a

The problem is that the built-in function groups values according to its own need. For the

`stem`

Running the command of Figure 44.1 produces the diagram shown in Figure 44.2 where we now find the complete set of values. Note that 257 and 277 are both in the new diagram.
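The `scale` argument of the built-in `stem` function controls the length of the plot, and a larger value gives R room to use more stems. The following sketch, with made-up values (not the Table 1 data), captures both versions as text so that they can be compared.

```r
# Made-up values for illustration only (not the Table 1 data)
x <- c(121, 133, 150, 150, 164, 178, 190, 205, 205, 205,
       219, 233, 248, 257, 277, 290, 305, 320, 344)

# capture.output collects the printed diagram as lines of text
default_lines <- capture.output(stem(x))
longer_lines  <- capture.output(stem(x, scale = 3))

cat(default_lines, sep = "\n")
cat(longer_lines, sep = "\n")

# Compare how many lines each version takes
print(length(default_lines))
print(length(longer_lines))
```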

Recognizing the strange default behavior of the built-in command, we do have a different function that you can load and run, namely,

`stem_leaf`

Running those commands produces the diagram shown in Figure 46.

It is worth comparing these two solutions. Each has its advantages.

That leaves us with attempting a

The

The

That plot is not particularly interesting. In fact, it is no more informative than was the

Before we leave we should make sure that we have saved our

By clicking on the floppy disk icon we save the file and turn its name to a black font, as in Figure 51.

Finally, we need to close our

`q()`

`y`

Here is a listing of the complete contents of the `ws31.R` file:

    # we want to use gnrnd4 to generate our data
    # first we need to get that function into our
    # environment
    source("../gnrnd4.R")
    # Then we can use it to generate our data
    gnrnd4(503029604, 5200234)
    # and we should at least look at it so that we can
    # compare it to the table on the web page
    L1
    # Then we can get some descriptive measures
    summary(L1)
    # along with the sample standard deviation and variance
    sd(L1)
    sd(L1)^2
    # and then the population standard deviation and variance
    source("../pop_sd.R")
    pop_sd(L1)
    pop_sd(L1)^2
    # The summary command did not give us many
    # significant digits. Look at it another way...
    xbar <- mean(L1)
    xbar
    # Here is another way to get more out of commands
    # like summary
    options(digits=10)
    summary(L1)
    # If we look back at the data we see that we have many
    # different values ranging from 121 to 344. Let us get
    # graphs of those values.
    hist(L1)
    # That was a histogram with the various parameters set
    # by R. We could, of course, specify some of them
    # so that we get a nicer histogram. This is not
    # a requirement of our course but it does not hurt to
    # learn a little extra...
    hist( L1, las=3, breaks=seq(110,350,15),
          xaxp=c(110,350,16))
    # Note that these intervals are closed on the right
    # but we can change that with
    hist( L1, las=3, breaks=seq(110,350,15),
          xaxp=c(110,350,16), right=FALSE)
    # And, returning to the closed on the right intervals,
    # we could fix up a little more with
    hist(L1, main="For Worksheet 3.1",
         xlab="Table Values", ylab="count of values",
         xlim=c(110,350), ylim=c(0,16),
         breaks=seq(110,350,15),
         xaxp=c(110,350,16), las=3, cex.axis=.7,
         yaxp=c(0,16,8))
    abline( h=seq(2, 16, 2 ), col="blue", lty="dotted")
    # we could try to do a barplot, but remember that a barplot
    # will show the count (i.e., frequency) of each unique value
    barplot( table(L1) )
    # Looking at the bar plot we can see that there are a number of
    # values that repeat, 12 values appear twice, 5 values appear
    # three times, and two values appear 4 times. It is hard to
    # read the plot to find those two values, but we could "load"
    # and run the Mode function to help us.
    source("../mode.R")
    Mode(L1)
    # while we are at it, let us see, again the summary values
    # and then generate a box and whisker chart
    summary(L1)
    boxplot(L1, horizontal = TRUE)
    # and it would not hurt to try to make that look a little
    # better
    boxplot(L1, ylim=c(110,350), xaxp=c(110,350,16),
            horizontal=TRUE,
            main="For Worksheet 3.1 Data", las=3)
    abline( v=seq(110, 350, 15 ), col="blue", lty="dotted")
    # we can try a stem and leaf plot. After all, we have values
    # running from 121 to 344 and we could look at stems
    # running from 12 to 34.
    # first we can try this with the built-in stem function
    stem(L1)
    # that produced some questionable output since, looking
    # at the data we do not find a 123 but we do find a 133
    # which is not in the stem-leaf plot. There is an option
    # that may help this:
    stem(L1, scale=3)
    # And then we can load our version and try that...
    source("../stem_leaf.R")
    stem_leaf( L1, place=0)
    # we can also do a dot plot, but that will not be much
    # different from that bar plot that we did before.
    source("../dot_plot.R")
    dot_plot(L1)

©Roger M. Palay Saline, MI 48176 January, 2017