
We know that the

- has distinct values, and not a huge number of distinct values, so that it is hard to believe it is continuous,
- is clearly not symmetric,
- has more than one modal region,
- has excessively many or extreme outliers, or
- does not have a linear **quantile plot**

First, consider an example that has too few different values to be called continuous.
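A quick way to see how many different values a sample holds is to count them. The sketch below is only an illustration: the vector `sample_values` is made up for this purpose and is not the data used on this page. It relies on base R's `unique()` and `table()`.

```r
# Hypothetical sample, invented for illustration only
sample_values <- c(3, 4, 4, 5, 3, 4, 5, 5, 4, 3, 4, 4)

# How many items are there, and how many different values among them?
n_items    <- length(sample_values)
n_distinct <- length(unique(sample_values))

# Frequency of each distinct value
value_counts <- table(sample_values)

print(n_items)
print(n_distinct)
print(value_counts)
```

Twelve items that take on only three different values would be hard to defend as coming from a continuous distribution.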

The sample in

Small samples mean that we just do not have enough data to claim that something is

Our next sample is shown in

We have plenty of values here. How do we look at things like symmetry, modal regions, and outliers? We have some tools to help, namely, dot plots, box and whisker plots, and histograms. We can do each of these for the data in

We recall from our page on making dot plots that R has no built-in command to do dot plots. Instead, we developed our own function to make these plots. A listing of that function is included here for your convenience.

    dot_plot <- function( this_list, ... )
    {
      ## the first thing to do is to just sort the list into a local copy
      lcl_list <- sort( this_list )
      ## then we want a second list that is just as long as was the
      ## original list, because, in that second copy we will place the
      ## vertical position of the associated value in the sorted copy
      lcl_count <- lcl_list
      ## then, to start, we begin at the first item in the sorted list
      ## It will have a vertical position of 1
      cur_val <- lcl_list[1]
      m <- 1
      lcl_count[1] <- 1
      ## now we just move through the rest of the sorted
      ## list and if we are at the same value then we go up one
      ## vertical level, but if we are at a new value we reset
      ## the vertical position to 1
      ## (seq_along(...)[-1] is empty for a one-item list, so the
      ## loop is skipped instead of running backward as 2:1 would)
      for (i in seq_along( lcl_list )[-1])
      {
        x <- lcl_list[i]
        if ( x == cur_val )
        {
          m <- m+1
          lcl_count[i] <- m
        }
        else
        {
          cur_val <- x
          m <- 1
          lcl_count[i] <- m
        }
      }
      ## once we are done with that, we can just do a scatter plot on
      ## the two vectors that we have created.
      plot( lcl_list, lcl_count, xlab="", ylab="Frequency", ...)
    }

Once the function is defined in our R session we can use it with the data in L1, the data values generated by the
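The stacking logic inside `dot_plot` can also be checked without drawing anything. The sketch below is an assumption-laden companion, not part of the original function: the helper name `stack_heights` and the small input vector are invented here. It uses base R's `ave()` to number the items within each run of equal values, which is the same vertical position that `dot_plot` computes with its loop.

```r
# Hypothetical helper (not part of the original page): returns the
# vertical position that dot_plot would assign to each sorted value.
stack_heights <- function(this_list) {
  sorted_vals <- sort(this_list)
  # Within each group of equal values, number the items 1, 2, 3, ...
  heights <- ave(seq_along(sorted_vals), sorted_vals, FUN = seq_along)
  data.frame(value = sorted_vals, height = heights)
}

# Made-up input, for illustration only
example_heights <- stack_heights(c(7, 5, 7, 6, 5, 7))
print(example_heights)
```

The largest value in the `height` column is a reasonable starting point when choosing the `ylim` argument passed to `dot_plot`.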

The result of the

That dot plot does not indicate that there is any problem with symmetry. There do not seem to be any outliers. The dot plot does raise questions about the sample in

A

Unfortunately, we do not learn much from this

The

Our last approach to this question is the

The idea is that if

We could construct our own list where we are sure that the values correspond to the density of the normal distribution. The

Finally, if the values in

    n <- length(L1)           # find the number of items
    q <- 2*n                  # make q be twice that number
    p <- seq(1, q-1, 2 )      # make a sequence of the numerators
    L2 <- p/q                 # make a list of the values
    L3 <- qnorm( L2 )         # get a list of z-scores
    sorted_data <- sort(L1)   # get a sorted version of L1
    plot(sorted_data, L3)     # make the plot

Figure 5 shows the console view of the statements.

Figure 6 shows the resulting plot.
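The probabilities built by hand above, 1/(2n), 3/(2n), 5/(2n), and so on, are the same values that base R's `ppoints()` returns when its `a` argument is set to 0.5. The sketch below checks that on a small made-up vector (`example_data` is invented for illustration and is not the data from this page). It also computes the correlation between the sorted data and the matching z-scores, which is one possible numeric companion to judging the plot by eye, offered here as a suggestion rather than as part of the original method.

```r
# Hypothetical sample, invented for illustration only
example_data <- c(12.1, 9.8, 10.4, 11.3, 10.0, 9.1, 10.9, 11.7, 8.6, 10.6)

n <- length(example_data)
hand_built <- seq(1, 2 * n - 1, 2) / (2 * n)   # 1/(2n), 3/(2n), ...
built_in   <- ppoints(n, a = 0.5)              # (i - 0.5)/n for i = 1..n

# The two sets of probabilities agree
print(all.equal(hand_built, built_in))

# Correlation between the sorted data and the matching z-scores;
# values near 1 go with dots that lie close to a straight line
z_scores <- qnorm(hand_built)
line_fit <- cor(sort(example_data), z_scores)
print(line_fit)
```

Note that `ppoints()` picks a different default for `a` when there are 10 or fewer items, which is why the sketch sets `a = 0.5` explicitly.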

Our expectation, if

First we use

This shows a skewed distribution with a heavy count of items at the left end, meaning a long tail on the right: skewed to the right. We will use

Figure 8 confirms the view that the data values in

The histogram shows the skewed distribution too.
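A numeric check can sit alongside the pictures. The sketch below is an illustration under stated assumptions: the data are evenly spaced quantiles of an exponential distribution, chosen only because they are reproducibly skewed to the right, and are not the values in L1. The skewness formula used (the average cubed z-score) is one of several definitions in common use.

```r
# Deterministic, right-skewed illustration data (not this page's data):
# evenly spaced quantiles of an exponential distribution
skewed_example <- qexp(ppoints(200, a = 0.5))

sample_mean   <- mean(skewed_example)
sample_median <- median(skewed_example)

# Moment-based skewness: the average cubed z-score
z <- (skewed_example - sample_mean) / sd(skewed_example)
skewness <- mean(z^3)

print(c(mean = sample_mean, median = sample_median, skewness = skewness))
```

For data with a long tail on the right, the mean is pulled above the median and the skewness value is positive.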

Next we want to look at the

    assess_normality <- function( data_list )
    {
      n <- length( data_list )
      sorted_data <- sort( data_list )
      q <- 2*n
      p <- seq(1, q-1, 2 )
      L2 <- p/q
      L3 <- qnorm(L2)
      plot( sorted_data, L3, ylab="z values" )
    }

and Figure 10 shows the console view of that function definition followed by our first use of the function.

The resulting graph is shown in Figure 11.

We already knew that the values in

The

The locations of those dots look symmetric (within reason). There do not seem to be any

We look at the

This too looks completely

We check the

Again, we find nothing to suggest that this is not a

Move to do a

Finally, we have a collection of data where the dots all nearly fall on a diagonal line. This is a confirmation that the distribution is

The

We have a problem here. There seem to be two modal areas. We will see if there are confirmations from the other views.

The

Figure 17 does not show any problems. We move to look at the histogram in Figure 18.
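It helps to see why a box and whisker plot can look fine for data with two modal areas. The sketch below uses made-up, deterministic two-peaked data (the names `half` and `two_peaks` are invented here, and the values are not those in L1). It prints the five-number summary that a box plot draws, the values the box plot rule would flag as outliers, and the bin counts that a histogram would draw.

```r
# Deterministic two-peaked illustration data (not this page's data):
# evenly spaced normal quantiles centered at 20 and at 30
half <- qnorm(ppoints(50, a = 0.5), mean = 0, sd = 2)
two_peaks <- c(20 + half, 30 + half)

# Five-number summary: the values a box plot is built from
print(fivenum(two_peaks))

# Values that the box plot rule would flag as outliers
flagged <- boxplot.stats(two_peaks)$out
print(flagged)

# Bin counts that a histogram would draw, computed without plotting
bins <- hist(two_peaks, breaks = seq(10, 40, by = 2.5), plot = FALSE)
print(bins$counts)
```

In this illustration the five-number summary is symmetric and nothing is flagged, yet the bin counts rise, fall in the middle, and rise again. The box plot summarizes position and spread; it does not report where the values pile up.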

Figure 18 shows the two modal regions. This does not conform to the

The

Indeed, the dots in Figure 19 do not fall on a diagonal. The data in

The

The distribution seems to be heavy around 475 without having that be the central modal area. Also, the extreme low value is a concern. More views are needed.
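One way to follow up on a suspicious extreme value is to ask which values fall beyond the whiskers of a box plot. The sketch below is a small illustration using base R's `boxplot.stats()`, which applies the 1.5 times IQR rule by default. The vector `with_low_value` is invented for this example and is not the data in L1.

```r
# Hypothetical sample with one very low value, invented for illustration
with_low_value <- c(470, 472, 474, 475, 475, 476, 477, 478, 480, 483, 431)

# boxplot.stats() uses the 1.5 * IQR rule by default (coef = 1.5)
stats <- boxplot.stats(with_low_value)

# lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# values beyond the whiskers
print(stats$out)
```

Here only the single low value is reported in `stats$out`; the remaining values lie within the whiskers.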

We check out the

Figure 21 shows two clear

We turn to the

Figure 22 echoes the concerns that we have expressed above.

Finally, we look at the

The fact that the dots do not come close to being on a diagonal line confirms our view that the values in

    source( file="http://courses.wccnet.edu/~palay/math160r/gnrnd4.R")
    gnrnd4( key1=745122201, key2=200056 )
    gnrnd4( key1=2344122201, key2=20005600 )
    gnrnd4( key1=357848501, key2=15200083 )

    dot_plot <- function( this_list, ... )
    {
      ## the first thing to do is to just sort the list into a local copy
      lcl_list <- sort( this_list )
      ## then we want a second list that is just as long as was the
      ## original list, because, in that second copy we will place the
      ## vertical position of the associated value in the sorted copy
      lcl_count <- lcl_list
      ## then, to start, we begin at the first item in the sorted list
      ## It will have a vertical position of 1
      cur_val <- lcl_list[1]
      m <- 1
      lcl_count[1] <- 1
      ## now we just move through the rest of the sorted
      ## list and if we are at the same value then we go up one
      ## vertical level, but if we are at a new value we reset
      ## the vertical position to 1
      ## (seq_along(...)[-1] is empty for a one-item list, so the
      ## loop is skipped instead of running backward as 2:1 would)
      for (i in seq_along( lcl_list )[-1])
      {
        x <- lcl_list[i]
        if ( x == cur_val )
        {
          m <- m+1
          lcl_count[i] <- m
        }
        else
        {
          cur_val <- x
          m <- 1
          lcl_count[i] <- m
        }
      }
      ## once we are done with that, we can just do a scatter plot on
      ## the two vectors that we have created.
      plot( lcl_list, lcl_count, xlab="", ylab="Frequency", ...)
    }

    dot_plot(L1, ylim=c(0,14))
    boxplot(L1, horizontal=TRUE)
    hist(L1)

    n <- length(L1)           # find the number of items
    q <- 2*n                  # make q be twice that number
    p <- seq(1, q-1, 2 )      # make a sequence of the numerators
    L2 <- p/q                 # make a list of the values
    L3 <- qnorm( L2 )         # get a list of z-scores
    sorted_data <- sort(L1)   # get a sorted version of L1
    plot(sorted_data, L3)     # make the plot

    gnrnd4( key1=734054702, key2=13900145 )
    dot_plot(L1, ylim=c(0,14))
    boxplot(L1, horizontal=TRUE)
    hist(L1)

    assess_normality <- function( data_list )
    {
      n <- length( data_list )
      sorted_data <- sort( data_list )
      q <- 2*n
      p <- seq(1, q-1, 2 )
      L2 <- p/q
      L3 <- qnorm(L2)
      plot( sorted_data, L3, ylab="z values" )
    }
    assess_normality(L1)

    gnrnd4( key1=236389404, key2=0001100438 )
    dot_plot(L1, ylim=c(0,14))
    boxplot(L1, horizontal=TRUE)
    hist(L1)
    assess_normality( L1 )

    gnrnd4( key1=227358705, key2=1800215, key3=2100290 )
    dot_plot(L1, ylim=c(0,14), las=2)
    boxplot(L1, horizontal=TRUE)
    hist(L1)
    assess_normality( L1 )

    gnrnd4( key1=830577509, key2=550819432219 )
    dot_plot(L1, ylim=c(0,14), las=2)
    boxplot(L1, horizontal=TRUE)
    hist(L1)
    assess_normality( L1 )

©Roger M. Palay Saline, MI 48176 January, 2016