Stem and Leaf Plot
Stem and leaf plots were quite helpful, in some cases, when we had to
organize data by hand. The concept is best illustrated by an example. Consider the data in
Table 1.
A quick read of the data, just looking for the lowest and highest values,
suggests that the values range from the 320's to the 370's.
We create a list of stems from 32 to 37 as
32:
33:
34:
35:
36:
37:
Then, we read the values in the table, one at a time,
and for each value we find the stem that has the first two digits of the value.
So for 371 we locate stem 37:.
We write the units digit of the value, in the case of 371
that would be 1, after the stem.
Now our diagram becomes
32:
33:
34:
35:
36:
37:1
We move to the next item in the table, 354.
For that value we append the units digit 4
to the stem 35:. The diagram is now
32:
33:
34:
35:4
36:
37:1
We move to the next item in the table, 352.
For that value we append the units digit 2
to the stem 35:. We already had 35:4 there
so that becomes 35:4 2. The diagram is now
32:
33:
34:
35:4 2
36:
37:1
We keep building the diagram, appending the units digit 4
from 344 to the 34: stem,
appending the units digit 7
from 337 to the 33: stem, and
appending the units digit 4
from 354 to the 35: stem which now becomes
35:4 2 4 and the diagram becomes
32:
33:7
34:4
35:4 2 4
36:
37:1
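The tallying done so far can be sketched in R. This is only an illustration, not part of the original page: the vector x holds the six values read from Table 1 up to this point, and the names x, all_stems, by_stem and rows are chosen here for the example.

```r
# The six values read so far from Table 1, in the order used in the text
x <- c(371, 354, 352, 344, 337, 354)

stems  <- x %/% 10   # everything but the units digit, e.g. 371 -> 37
leaves <- x %% 10    # the units digit, e.g. 371 -> 1

# List every stem from 32 to 37, even those that have no leaf yet
all_stems <- 32:37

# split() keeps the leaves of each stem in the order they were read
by_stem <- split(leaves, factor(stems, levels = all_stems))

rows <- paste0(all_stems, ":", sapply(by_stem, paste, collapse = " "))
cat(rows, sep = "\n")
```

Because split() does not reorder the leaves within a stem, the result is the same unsorted diagram that we built by hand.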
This simple methodical process, when finished,
produces the diagram
32: 3 9 8 7 7
33: 7 9 4 9 8 6 1 9 7 8 7 3 8
34: 4 5 3 6 3 8 4 1 3 1 2 8 8 7 4 9
35: 4 2 4 9 0 4 9 0 2 2 2 9 0 7 8 4 2
36: 8 0 7 3
37: 1
That result is an "unsorted" stem and leaf diagram.
It gives us a feeling for the distribution of the data.
In fact, it really gives us a "sideways" histogram of the data
with intervals [320,330), [330,340), [340,350), [350,360),
[360,370), and [370,380). Figure 1 shows the usual histogram
produced by the statements
gnrnd4( key1=121855504, key2=0001000345 )
hist( L1, xlim=c(320,380), breaks=seq(320,380,10),
ylim=c(0,20), yaxp=c(0,20,20), las=1,
cex.axis=0.6, right=FALSE)
Figure 1
The stem and leaf diagram from above has the same
number of bars, 6, and the length of the
stem and leaf bars, the number of units digits in each bar,
is the height of the corresponding bar in Figure 1.
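As a rough check of that claim, the sketch below counts the leaves on each stem and compares the counts with the bar heights that hist() computes. It does not use gnrnd4(); instead it assumes the data can be recovered from the unsorted diagram shown earlier, and the names diagram, leaf_counts, values and h are chosen here for the example.

```r
# Leaves copied from the unsorted stem and leaf diagram in the text
diagram <- c("32" = "3 9 8 7 7",
             "33" = "7 9 4 9 8 6 1 9 7 8 7 3 8",
             "34" = "4 5 3 6 3 8 4 1 3 1 2 8 8 7 4 9",
             "35" = "4 2 4 9 0 4 9 0 2 2 2 9 0 7 8 4 2",
             "36" = "8 0 7 3",
             "37" = "1")

leaf_list   <- strsplit(diagram, " ", fixed = TRUE)
leaf_counts <- lengths(leaf_list)          # length of each branch

# Rebuild the data values: 10 * stem + leaf
values <- unlist(Map(function(s, l) 10 * as.numeric(s) + as.numeric(l),
                     names(diagram), leaf_list),
                 use.names = FALSE)

# Same breaks as Figure 1, but only compute the counts (no plot)
h <- hist(values, breaks = seq(320, 380, 10), right = FALSE, plot = FALSE)

print(rbind(leaves = leaf_counts, bars = h$counts))
```

The two rows of the printed table agree, stem by stem.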
The additional feature of the stem and leaf diagram
is that we could actually move back from the
diagram to the original data. Thus the line
32: 3 9 8 7 7
represents the values 323, 329, 328, 327, and 327.
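Moving back from a line of the diagram to the data is just "ten times the stem plus the leaf". A minimal sketch, with the variable names chosen here for illustration:

```r
row <- "32: 3 9 8 7 7"

# Separate the stem from the leaves at the colon
parts  <- strsplit(row, ":", fixed = TRUE)[[1]]
stem   <- as.numeric(parts[1])
leaves <- as.numeric(strsplit(trimws(parts[2]), " +")[[1]])

# Each original value is 10 * stem + leaf
values <- 10 * stem + leaves
print(values)
```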
Having organized the information from Table 1 into the stem and leaf
diagram, we could sort the data by just sorting the
units digits, the leaf values in each line.
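This would produce
32: 3 7 7 8 9
33: 1 3 4 6 7 7 7 8 8 8 9 9 9
34: 1 1 2 3 3 3 4 4 4 5 6 7 8 8 8 9
35: 0 0 0 2 2 2 2 2 4 4 4 4 7 8 9 9 9
36: 0 3 7 8
37: 1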
The shape remains the same, thus giving us the simple
histogram shape, but, with the values in order we also see
more details of the distribution of the values. We could even use this
"sorted" stem and leaf diagram to help identify the
median and quartile points. Even the mode becomes more obvious in
this sorted version, the five 2's in the 350's just stand out as being the
most frequent values.
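The sorting and the summary values can also be sketched in R. As before, this assumes the data are recovered from the unsorted diagram rather than generated with gnrnd4(), and the names used are chosen here for the example. Note that quantile() uses R's default interpolation rule, which may differ slightly from the quartiles found by counting leaves by hand.

```r
# Leaves copied from the unsorted stem and leaf diagram in the text
diagram <- c("32" = "3 9 8 7 7",
             "33" = "7 9 4 9 8 6 1 9 7 8 7 3 8",
             "34" = "4 5 3 6 3 8 4 1 3 1 2 8 8 7 4 9",
             "35" = "4 2 4 9 0 4 9 0 2 2 2 9 0 7 8 4 2",
             "36" = "8 0 7 3",
             "37" = "1")

# Rebuild the data values: 10 * stem + leaf
values <- unlist(Map(function(s, l)
                       10 * as.numeric(s) + as.numeric(strsplit(l, " ")[[1]]),
                     names(diagram), diagram),
                 use.names = FALSE)

# Sort the leaves within each stem
sorted_rows <- tapply(values %% 10, values %/% 10,
                      function(l) paste(sort(l), collapse = " "))
cat(paste0(names(sorted_rows), ": ", sorted_rows), sep = "\n")

# Median, quartiles, and the most frequent value
med        <- median(values)
quartiles  <- quantile(values, c(0.25, 0.75))
counts     <- table(values)
mode_value <- as.numeric(names(counts)[which.max(counts)])

print(med)
print(quartiles)
print(mode_value)
```

With 56 values the median is the average of the 28th and 29th sorted values, 345 and 346, and the most frequent value is 352.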
Of course, all of this, while cute,
is still a pain once we have a computer to do our work for us.
In addition, the stem and leaf process, as shown above, works only
for situations where the stem ends at the tens digit
and the leaf is just the units digit.
What would we do for the data in Table 2?
These values range from 2096 to 2549.
If we were to follow the pattern introduced above we would end up with
stem values from 209 to 254, that is, we would have
46 stem values. We can pretty much assume that
some of these would have no leaves, some would have 1 leaf, some would have
2 leaves and there may even be some with 3 or more leaves.
We have room on this web page to display such a diagram,
although we will omit the stems that have no leaves.
This gives us some information, though it is distorted by the missing lines.
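The count of 46 stems follows directly from the two extreme values quoted above. A small sketch, with names chosen here for illustration:

```r
lowest  <- 2096
highest <- 2549

# Drop the units digit to get the first and last stem
stems <- (lowest %/% 10):(highest %/% 10)

print(range(stems))    # first and last stem
print(length(stems))   # number of stems needed
```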
Traditionally, in such a situation, we would recognize that we would get a better
feel for the distribution of the
values in Table 2 if we had the
stems end at the hundreds digit and used the
tens digit as the leaf.
Doing this would mean that we would be losing some information,
namely the units digit, but we would end up with fewer stems.
In this case we would have stems 20, 21, 22, 23, 24 and 25.
Rather than just using the data as it is and lopping off the units
digit, we would actually round the values of Table 2
to the nearest ten to get the values in Table 3.
From there we can just divide those values by ten to get Table 4.
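The two steps, rounding to the nearest ten and then dividing by ten, can be sketched as follows. The vector x is hypothetical: 2096 and 2549 are the extremes quoted above, and the other two values are made up for the example.

```r
# Hypothetical values; only 2096 and 2549 come from the text
x <- c(2096, 2187, 2243, 2549)

rounded <- round(x, -1)    # round to the nearest ten, as for Table 3
shifted <- rounded / 10    # divide by ten, as for Table 4

# The usual stem and leaf split now applies to the shifted values
stem <- shifted %/% 10
leaf <- shifted %% 10

print(data.frame(value = x, rounded, shifted, stem, leaf))
```

Note that 2096 rounds up to 2100, so it lands on stem 21 with leaf 0 rather than on stem 20.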
Now we can use the original process to produce
the following:
Then we could sort that to get
Of course, when reading this we need to remember how we got the values and that
21: 0 2 7 8 9 9
represents 6 values from the range of 2095 to 2194,
that is, values that round to 2100 through 2190,
because we rounded the values to the tens place and then
divided them by 10 to shift the values so that we could
use our stem and leaf procedure.
It should be evident that the stem and leaf procedure
was quite helpful before we had computers but that it
does not serve much more than a historical use now.
R does have a built-in function
stem() that will produce a stem and leaf diagram.
If we run stem(L1) for the current data we get the output shown in Figure 2.
Figure 2
You should note that R has taken the liberty of dividing the intervals
in half. That way we get two entries for the stem 21,
the first holding any leaf values from 0 through 4 and the second
holding leaf values from 5 through 9. This makes the branches shorter.
Also, note that the stem() function took it upon itself to
decide where to locate the break between the stem and the leaf.
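We can influence those choices through the scale argument of stem(), which controls the length of the plot. The sketch below uses hypothetical data made up with runif(), not the values of Table 2, so its output will not match Figure 2.

```r
# Hypothetical data in roughly the same range as Table 2
set.seed(1)
x <- round(runif(40, 2096, 2549))

# Default display
stem(x)

# A larger scale asks for a longer plot, typically with more stem lines
stem(x, scale = 2)
```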
As noted above, we have little need for these diagrams at this point.
Still, it is nice that R has a built-in function to
do our work if we need to produce such a diagram. We could also
create our own function, perhaps with a bit more flexibility.
For example, the function definition
stem_leaf <- function(lcl_list, place = 0)
{ # convert the list to the form dddsl where the l is
  # the leaf and the s becomes the stem
  chop_list <- floor(round(lcl_list, -place) / (10^place))
  # we can sort this
  chop_list <- sort(chop_list)
  # then produce the stems and the leaves
  leaf_list <- chop_list %% 10
  stem_list <- floor(chop_list / 10)
  n <- length(stem_list)
  # start the first branch with the first stem and leaf
  branch <- paste(stem_list[1], " | ", leaf_list[1], sep = "")
  old_stem <- stem_list[1]
  # seq_len(n)[-1] is empty when n is 1, whereas 2:n would
  # count down from 2 to 1
  for (i in seq_len(n)[-1])
  {
    if (stem_list[i] == old_stem) {
      # same stem: append the leaf to the current branch
      branch <- paste(branch, leaf_list[i], sep = " ")
    } else {
      # new stem: print the finished branch and start another
      cat(branch, "\n")
      branch <- paste(stem_list[i], " | ", leaf_list[i], sep = "")
      old_stem <- stem_list[i]
    }
  }
  cat(branch, "\n")
}
really does the exact process that was described above. That
function is available at stem_leaf.R.
For that function you supply not only the list of data values
but also the place where you want the leaf. So, if we
want to have the leaf in the ten's place, we give the
command
stem_leaf(L1,place=1)
and R will
produce the output shown
in Figure 3.
Figure 3
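For comparison, here is a more compact variant of the same idea that uses split() instead of an explicit loop. It is only a sketch: the name stem_leaf_split and the data in x are hypothetical, and like the function above it simply skips stems that have no leaves.

```r
# Hypothetical compact variant: returns the rows instead of printing them
stem_leaf_split <- function(x, place = 0) {
  # round to the requested place, shift the leaf into the units digit, sort
  scaled <- sort(round(x, -place) / 10^place)
  # group the leaves by stem
  leaves <- split(scaled %% 10, scaled %/% 10)
  paste0(names(leaves), " | ", sapply(leaves, paste, collapse = " "))
}

# Hypothetical values; only 2096 and 2549 come from the text
x <- c(2096, 2549, 2187, 2243, 2121, 2308)
rows <- stem_leaf_split(x, place = 1)
cat(rows, sep = "\n")
```

Returning the rows as a character vector, rather than printing them inside the function, makes it easy to check or reuse the result.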
©Roger M. Palay
Saline, MI 48176 January, 2016