The task here is to come up with descriptive measures for the data in Table 1.

This page assumes that you have read through earlier pages and that you have mastered many of the steps that we use to set up our work. [If that is not the case then you should go back to the earlier pages where we walk through all of those steps.]

We start by inserting our special USB drive. On the computer used for this demonstration that drive was assigned the letter

`worksheet02`

`model.R`

file that is on the USB drive.
We open the new folder and paste the copy of the file into the new folder.
Finally, we rename the file; in this instance it has been called
`descriptive.R`

Then, double click on the

`descriptive.R`

`descriptive.R`

Just thinking about the task at hand, we know that we will need to load a few of the functions supplied on the USB drive. Figure 2 shows the comments and commands written into the editor to start our project. [A complete listing of the commands that we will use in this demonstration is given at the end of this page.]

Note that those lines have been highlighted. We click on the icon to have those highlighted commands performed. The result is shown in Figure 3.
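After running `source()` commands like these, one way to confirm from the console that a function really was loaded is to ask R whether the name exists. The sketch below is only an illustration: it writes a tiny stand-in script to a temporary file, because the real files on the USB drive are not reproduced here, and the name `demo_fn` is hypothetical.

```r
# Write a tiny stand-in script to a temporary file. The contents are
# hypothetical; the real gnrnd4.R, pop_sd.R and collate3.R are on the USB drive.
script <- tempfile(fileext = ".R")
writeLines("demo_fn <- function(v) length(v)", script)

# source() runs the commands in the file, which defines demo_fn
source(script)

# exists() reports whether the name is now defined
loaded <- exists("demo_fn") && is.function(demo_fn)
loaded
```

The same `exists()` check could be applied to `"gnrnd4"`, `"pop_sd"`, or `"collate3"` after the real files have been sourced.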

With no error messages in Figure 3 we assume everything went well. However, we can check in th

Back in the

Those lines were highlighted in Figure 5. We run them to get the output shown in Figure 6.

We make the essential check that the numbers we show in Figure 6 are identical to those shown in

They are, so we can move forward. The

`summary()`

`summary()`

Once those lines are run we get the output shown in Figure 8.

We see that

- the **mean** of the values in the table is 84.82,
- the **median** of the values in the table is 82.70, so half the values are less than 82.70 and half of them are greater than 82.70,
- the **range** of the values in the table is from 56.30 to 117.30,
- the **first quartile, Q₁**, of the values in the table is 74.65, and
- the **third quartile, Q₃**, of the values in the table is 96.30.
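As a small illustration of where those five numbers come from, the sketch below runs `summary()` on a short made-up vector and then computes each measure separately with base R functions. The values in `x` are hypothetical stand-ins, not the data of Table 1, so the results will not match the figures quoted above.

```r
# Hypothetical stand-in values; the page's real data comes from gnrnd4()
x <- c(56.3, 70.1, 74.6, 80.2, 82.7, 85.0, 91.4, 96.3, 103.8, 117.3)

# All of the measures at once: Min, 1st Qu., Median, Mean, 3rd Qu., Max
s <- summary(x)
print(s)

# The same measures, one at a time
x_mean   <- mean(x)
x_median <- median(x)
x_range  <- range(x)                      # c(minimum, maximum)
x_q      <- quantile(x, c(0.25, 0.75))    # first and third quartiles
```

Computing the pieces separately is handy when only one of them, say the median, is needed in a later calculation.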

We can compute the

The result of running the commands is in Figure 10.

From that output we see that

- if this is a **sample** then the **standard deviation** is 13.45017, but
- if this is a **population** then the **standard deviation** is 13.37919.
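The two values differ only in the divisor: the sample version divides the sum of squared deviations by n - 1, the population version by n. The sketch below shows that relationship on a small made-up vector. The function `pop_sd_sketch()` is a hypothetical stand-in written for this illustration; it is not the `pop_sd()` supplied on the USB drive, whose source is not shown on this page.

```r
# Hypothetical stand-in for a population standard deviation function
pop_sd_sketch <- function(v) {
  n <- length(v)
  # sd() divides by n - 1; rescale so that the divisor becomes n
  sd(v) * sqrt((n - 1) / n)
}

# Made-up values, not the data of Table 1
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sample_sd     <- sd(x)              # treats x as a sample
population_sd <- pop_sd_sketch(x)   # treats x as a population

sample_sd
population_sd
```

The population value is always the smaller of the two, and the gap shrinks as the number of values grows.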

So much for the measures describing the data. Let us turn to a picture of the data. The function

`hist()`

Running the highlighted text of Figure 11 does not produce an image in the

However, it does create the histogram shown in Figure 13 in the

The histogram gives us quite a good feel for the distribution of values in the data collection. We can see that there are just a few extreme values and that most of the values are clustered toward the middle. Looking back at the

A

`boxplot()`

Again, running the highlighted text of Figure 14 does not produce much in the

But it does produce, in the

In Figure 16 we see the range of the values, the location of the first and third quartiles, and the position of the median.

The vertical alignment of Figure 16 is merely the default alignment. A horizontal alignment is often more pleasing. With just the small change shown in Figure 17 we can ask for the alternative arrangement.
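For readers who want the numbers behind the plot, the sketch below asks `boxplot()` for its statistics without drawing anything and then draws the horizontal form. The vector `x` is made up for the illustration and is not the data of Table 1. Note that `boxplot()` marks the "hinges" returned by `fivenum()`, which can differ slightly from the quartiles that `summary()` reports.

```r
# Made-up values, not the data of Table 1
x <- c(56.3, 70.1, 74.6, 80.2, 82.7, 85.0, 91.4, 96.3, 103.8, 117.3)

# Compute the plotted statistics without drawing the plot
b <- boxplot(x, plot = FALSE)
b$stats       # lower whisker, lower hinge, median, upper hinge, upper whisker

# Tukey's five-number summary; matches b$stats when there are no outliers
five <- fivenum(x)
five

# Draw the box and whisker plot in the horizontal arrangement
boxplot(x, horizontal = TRUE)
```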

Figure 18 shows the horizontal

Having seen the

`collate3()`

Recall that the process for building the frequency table depends upon us specifying the lower end of the first "bin" and the width of the "bins". We wrote

`collate3()`

`collate3()`

The result of running those highlighted lines is given in Figure 20.

We, however, will construct our "bins" to conform to the "bins" that R used in making the

`collate3(L1,use_low=50,use_width=10)`

`ft`

`ft`

Running those highlighted statements produces the output, in the

In that

The table of Figure 22 gives all of the information that we have, but it is not pretty. Figure 23 shows the command,

`View(ft)`

Running the highlighted lines of Figure 23 produces no new output in the

However, it does create a new tab in the

Figure 25 does not contain any more information than did Figure 22. It just looks much better. [And, as was noted in the earlier introductory pages, there are some extra features available in this new presentation, though we will not present them again here.]
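To make the idea of a grouped frequency table concrete, here is a rough base R sketch that uses `cut()` and `table()` with a low value of 50 and a width of 10. It is not the `collate3()` function, whose source and exact output columns are not shown on this page. The data, the name `ft_sketch`, the column names, and the choice to close each bin on the left are all assumptions made for the illustration.

```r
# Made-up values, not the data of Table 1
x <- c(56.3, 70.1, 74.6, 80.2, 82.7, 85.0, 91.4, 96.3, 103.8, 117.3)

low   <- 50    # lower end of the first bin
width <- 10    # width of every bin

# Enough break points to cover the largest value: 50, 60, ..., 120
n_breaks <- floor((max(x) - low) / width) + 2
breaks   <- seq(low, by = width, length.out = n_breaks)

# Assign each value to a bin; right = FALSE gives intervals like [50,60).
# Note that hist() closes its bins on the right by default.
bins <- cut(x, breaks = breaks, right = FALSE)
freq <- table(bins)

# Collect the counts with relative and cumulative frequencies
ft_sketch <- data.frame(
  bin      = names(freq),
  count    = as.vector(freq),
  rel_freq = as.vector(freq) / length(x),
  cum_freq = cumsum(as.vector(freq))
)
ft_sketch
```

In an interactive RStudio session, `View(ft_sketch)` would open this data frame in its own tab in the same way as described above for `ft`.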

Before leaving this description of the data in

Figure 26 shows the enhanced and expanded commands.
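The sketch below shows the same kind of enhancement on made-up data, mainly to spell out how the pieces fit together: `ylim` fixes the extent of the y-axis, `yaxp = c(0, 30, 10)` asks for 10 intervals between 0 and 30 (a tick every 3 units), `las = 2` turns the axis labels perpendicular to the axes, and `abline()` then draws a dotted line at each of those tick heights. The data and the titles are hypothetical.

```r
# Made-up data for illustration only
set.seed(1)
x <- rnorm(60, mean = 85, sd = 13)

# Heights for the horizontal guide lines: 0, 3, 6, ..., 30
grid_lines <- seq(0, 30, by = 3)

# Histogram with custom labels and a fixed y-axis
h <- hist(x,
          main = "Example histogram",
          xlab = "values",
          ylab = "Frequency of values",
          ylim = c(0, 30),
          yaxp = c(0, 30, 10),
          las  = 2)

# Add the dotted guide lines on top of the histogram
abline(h = grid_lines, col = "darkgrey", lty = "dotted")

# The bin counts are kept in the returned object
h$counts
```

With the guide lines in place, each bar height can be read off and compared with the matching row of a frequency table.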

Once we run the commands highlighted in Figure 26 the

The enhanced histogram appears in th

Before we leave all of this we should save the editor file. We click on the

`descriptive.R`

Then we use the

`q()`

`y`

At that point, when we look at the folder, shown in Figure 31, we should see not only our much larger

`descriptive.R`

`.RData`

`.Rhistory`

Here is a listing of the complete contents of the

`descriptive.R`

    # This session will create a collection of data
    #   and then find the appropriate descriptive
    #   measures for that data.

    # first we will load a few of the functions that we
    #   will need here.  More may follow later.
    source("../gnrnd4.R")
    source("../pop_sd.R")
    source("../collate3.R")

    # Now we are ready to create the data collection
    gnrnd4( key1=1300259404, key2=13200853)
    # Then we can look at the data that we generated
    L1

    # Looking at the data it appears that these
    #   are continuous measures and that there is
    #   little if any repetition of values.  Thus the
    #   appropriate measures of central tendency are the mean
    #   and the median.  We will look at the range and the
    #   quartile points as measures of dispersion.  All of that
    #   is done via the summary() function
    summary(L1)

    # We also want to find the standard deviation of the
    #   data, but to do this we need to know if this data
    #   is a population or a sample.  For a sample we can use
    #   the built-in function sd()
    sd( L1 )
    # For a population we will use the function that we
    #   already loaded, pop_sd()
    pop_sd( L1 )

    # To get some graphic description we can generate a
    #   histogram, the most basic form of which is created
    #   by the function hist()
    hist( L1 )

    # We can get a box and whisker plot of the data via
    #   the function boxplot()
    boxplot( L1 )
    # not that we need to do it, but we could turn this into a
    #   horizontal form via a slightly modified use of that
    #   same function
    boxplot( L1, horizontal=TRUE )

    # and then, to follow up on the histogram, we could
    #   produce a frequency table of grouped data by
    #   using the collate3() function that we loaded earlier.
    #   We already know the low and high values of the data
    #   from our earlier summary() function.  Therefore, we
    #   could skip the first step in running collate3().
    #   However, in this example, we will do the full two
    #   step process.  First run the function collate3() but
    #   with only the one argument, L1.
    collate3( L1 )
    # Now run it again, but this time we will tell it to
    #   use 50 as the low value, and we will set the width
    #   of the intervals to be 10.  Also, we will save the
    #   result in a new variable, ft
    ft <- collate3( L1, use_low=50, use_width=10 )
    # to see the result we can just give the variable name
    ft
    # We could get a nicer looking version of that output
    #   by using the View() function, note the capital V
    View( ft )

    # it might be nice to return to our histogram and improve
    #   it by changing the labels, changing the y-scale, and
    #   then adding some horizontal lines.  This will make it
    #   easier to compare the histogram to the frequency table.
    hist( L1, main="Worksheet #2 Histogram",
          xlab="values in table",
          ylab="Frequency of values",
          ylim=c(0,30), yaxp=c(0,30,10),
          las=2)
    abline(h=seq(0,30,3), col="darkgrey", lty="dotted")

©Roger M. Palay Saline, MI 48176 September, 2016