Here is another typical textbook linear regression problem. Table 1 gives 30 pairs of values, an X value and a Y value in each pair. Thus, the first pair of values is the point (9.6,

We will follow the usual procedure for doing our work on the USB drive. In particular, we have

- inserted our USB drive,
- created a directory called `worksheet06` on that drive,
- copied `model.R` from our root folder into our new folder,
- renamed that new copy of the file to `ws06.R`, and
- double clicked on that file to open **RStudio**.

Our next step is to generate and verify the data that we were given in Table 1.

The `gnrnd4` function, loaded from our root folder, will generate those values for us.

Next we will get a quick plot of the points.

There is not much to see in the **Console** pane. But the plot pane now shows the scatter plot of the points.

That plot certainly suggests a strong linear relation between the values in `L1` and the values in `L2`. We can improve the plot a bit by adding a title, axis labels, the two axes, and some grid lines.

Running those commands produces the much nicer looking plot shown in Figure 8.

Then we can have R compute the linear model for us via

`lm(L2~L1)`

Running those commands produces the rudimentary output shown in the **Console** pane.

The information in Figure 10 is enough for us to formulate the regression equation

`y = 8.205 + (-2.008)x`
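As a hedged illustration of where such an equation comes from, the sketch below runs `lm()` on small made-up data (not the worksheet's `L1` and `L2`, which come from `gnrnd4`) whose true relation is close to y = 8 - 2x:

```r
# Sketch with made-up data; the worksheet's actual values come from gnrnd4()
x <- c(1, 2, 3, 4, 5)
y <- c(6.1, 3.9, 2.2, -0.1, -2.0)   # roughly y = 8 - 2x plus a little noise

fit <- lm(y ~ x)          # least-squares line for y as a function of x
cf  <- coefficients(fit)  # cf[1] is the intercept, cf[2] is the slope
cf
```

Printing `cf` shows a named vector with entries `(Intercept)` and `x`; those two numbers are exactly what we read off the `lm()` output to build the regression equation.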

We stored the model in `lm_hold`, so we can even add the regression line to our previous plot via `abline(lm_hold)`. Of course, this does not produce much at all in the **Console** pane. But the graph, in the plot pane, now shows the regression line drawn through the points.

Where Figure 10 showed only the most essential information about our linear model, we can get a slightly more in-depth look at it via

`summary(lm_hold)`

The result of running that command is shown in Figure 15. Here we get information on the residuals, the coefficients (along with their standard errors), and the R-squared value for the model.

The commands shown in Figure 17 allow us to look at, and eventually use, the coefficients of the linear model.

Looking at Figure 17 and referring back to Figures 10 and 15, we can see that we have the same intercept and slope values in each.
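A minimal sketch of that extraction, using made-up data that lies exactly on the line y = 5 - 2x rather than the worksheet's values:

```r
# Made-up data lying exactly on y = 5 - 2x, for illustration only
x <- c(0, 1, 2, 3)
y <- c(5, 3, 1, -1)

coeffs <- coefficients(lm(y ~ x))
coeffs[1]   # the intercept, named "(Intercept)": 5
coeffs[2]   # the slope, named "x": -2
```

Because `coeffs` is an ordinary named vector, `coeffs[1]` and `coeffs[2]` can be reused in later arithmetic without retyping the printed values.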

One such use would be to find the y values that we would predict for particular x values, say -4, 5, 6, and 16. To do this we can put those x values into the variable

`x_vals`

and then evaluate the regression expression for all of them at once.

Running those commands produces the output shown in Figure 19.

The interpretation of those values is:

- if **x = -4**, then the regression equation expects **y to be 16.236643**,
- if **x = 5**, then the regression equation expects **y to be -1.835156**,
- if **x = 6**, then the regression equation expects **y to be -3.843134**, and
- if **x = 16**, then the regression equation expects **y to be -23.922911**.

Each of those values comes from the regression equation `y = 8.205 + (-2.008)x`, though R uses the full-precision coefficients rather than these rounded ones.
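A sketch of that vectorized evaluation, using the rounded coefficients 8.205 and -2.008 from the text (R itself keeps more decimal places, so its answers differ slightly in the later digits):

```r
a <- 8.205                   # rounded intercept from the text
b <- -2.008                  # rounded slope from the text
x_vals <- c(-4, 5, 6, 16)
pred_y <- a + b * x_vals     # one predicted y for each x, in order
pred_y
```

With the rounded coefficients this prints 16.237, -1.835, -3.843, and -23.923, matching the full-precision values above to three decimal places.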

Rather than just find the four points given above, we could use the same technique to find the predicted y value for every x value in `L1`.

Figure 21 shows the results of running those commands.

But, since we have the predicted values, we can compute all of the residual values, that is, the observed values minus the expected values, via

`L2-all_pred_y`

Figure 23 gives the resulting residual values.

Of course, we did not have to do any of that computation. The linear model, `lm_hold`, already holds the residual values, and we can extract them via

`residuals(lm_hold)`

The values are seen in Figure 25. There each value is shown with its sequential number, but the residual values displayed are the same as those that we computed and saw in Figure 23.
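The agreement between the two routes can be sketched on made-up data (not the worksheet's lists); `residuals()` returns the same observed-minus-expected values that the hand computation gives:

```r
# Illustrative data only
x <- c(1, 2, 3, 4, 5)
y <- c(7.9, 6.2, 3.8, 2.1, -0.2)

fit <- lm(y ~ x)
cf  <- coefficients(fit)

by_hand    <- y - (cf[1] + cf[2] * x)   # observed minus expected
from_model <- residuals(fit)            # extracted from the model

isTRUE(all.equal(unname(by_hand), unname(from_model)))
```

The final line evaluates to `TRUE`: up to rounding in the arithmetic, the two computations agree.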

The reason that we want the residuals is so that we can plot them against the x values in `L1` and confirm that they are scattered around, with no obvious pattern.

The desired plot is shown in Figure 27. The residual values do appear to be scattered about zero.

Recall that in Figure 15 we saw the R-squared value for the model. We can get the correlation coefficient itself via

`cor(L1,L2)`

The result, shown in Figure 29, gives the correlation coefficient for the two lists of values.
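The connection between `cor()` and the R-squared in the `summary()` output can be sketched with made-up data: for a simple linear model the squared correlation equals R-squared, and the correlation carries the sign of the slope.

```r
# Illustrative data with a falling trend, not the worksheet's values
x <- c(2, 4, 6, 8, 10)
y <- c(9.5, 5.8, 2.2, -1.9, -5.6)

r  <- cor(x, y)                     # correlation coefficient
r2 <- summary(lm(y ~ x))$r.squared  # R-squared from the model summary

r < 0          # TRUE: y falls as x rises, so r is negative
abs(r^2 - r2)  # essentially zero
```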

That is enough for this example. We note that we have yet to save our `ws06.R` file. We click on the icon to save the file, turning the file name to black letters (Figure 31).

In the **Console** pane we give the command

`q()`

and then respond

`y`

to the prompt asking whether to save the workspace image. Then press ENTER to terminate the session. This leaves the files

`ws06.R`

`.RData`

`.Rhistory`

in our `worksheet06` folder.

Here is a listing of the complete contents of the `descriptive.R` file:

```r
#This is used for worksheet 06
# load the gnrnd4 function and then run it to get our values
source("../gnrnd4.R")
gnrnd4( key1=6479962906, key2=3170200704, key3=34700085 )
# and, of course, look at the generated values
L1
L2
# the values look good.
# generate a quick and dirty scatter plot
plot(L1,L2)
# we can improve that a bit and draw in some grid lines
# along with the two axes via
plot(L1, L2, main="For Worksheet 6",
     xlab="x values", ylab="y values", las=1)
abline(h=0,lwd=2, col="black")
abline(v=0,lwd=2, col="black")
abline(h=seq(-40, 20,10),col="dark grey", lty="dotted")
abline(v=seq(-5, 25, 5), col="dark grey", lty="dotted")
# the graph certainly suggests a linear relation
# We will have R do the computation
lm_hold <- lm(L2~L1)
lm_hold
# we can even add that line to our previous plot
abline(lm_hold, col="red", lty="solid", lwd=2)
# we can get a slightly more in-depth look at the linear model
summary( lm_hold )
# rather than type in the displayed values of the
# intercept and the slope, we can pull them out of the model
coeffs <- coefficients( lm_hold )
# and we can look at them together
coeffs
# or separately
coeffs[1]
coeffs[2]
# That means that we can answer questions such as "What is the
# predicted y value when x is -4, 5, 6, and 16?"
x_vals <- c(-4, 5, 6, 16 )
pred_y <- coeffs[1] + coeffs[2]*x_vals
pred_y
# We could even get all of the predicted (expected) values
# from each of the x-values that we had in the table.
# They are in L1
all_pred_y <- coeffs[1] + coeffs[2]*L1
all_pred_y
# Then, we could compute all of the residual values, that is the
# observed-expected values
all_resid <- L2-all_pred_y
all_resid
# Of course, we really did not need to do all that work to get
# the residual values because we can extract them from the
# linear model
residuals( lm_hold )
# However we obtain the residuals, we want to look at the plot
# of the x-values and their associated residual values to
# be sure that they are scattered around.
plot(L1, all_resid )
# one other thing that we may want to do is to find the
# correlation coefficient for this model
cor(L1,L2)
```

©Roger M. Palay Saline, MI 48176 February, 2017