Standard Deviation Formula
The defining formula for the standard deviation of a collection of
n data values, $x_1, x_2, \ldots, x_n$,
is given by
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
Although this is the defining formula, it is a pain to actually use because you need to
find the mean, μ, first and then go through the data and find the squared
differences between each point and μ, and then get the sum of those values, which you
then divide by the number of values, n, and finally take the square root of the answer.
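The steps just described can be sketched in Python. This is only an illustration of the defining formula, assuming the data arrive as a non-empty list of numbers; the function name `std_dev_by_definition` and the sample data are made up for this example.

```python
import math

def std_dev_by_definition(values):
    """Standard deviation computed straight from the defining formula.

    Assumes `values` holds at least one number.
    """
    n = len(values)
    mu = sum(values) / n                             # first find the mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # then each squared difference
    return math.sqrt(sum(squared_diffs) / n)         # sum, divide by n, take the root

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up sample data
print(std_dev_by_definition(data))  # prints 2.0
```

Notice that the data has to be read twice: once to get the mean and once more to get the squared differences.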
Fortunately, we can use some math to reconfigure the formula into a more convenient
version. We start with a rewrite where we expand the squared difference,
much as we learned in Algebra I that
$(a - b)^2 = a^2 - 2ab + b^2$:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i^2 - 2x_i\mu + \mu^2\right)}{n}}$$
But that sum can be broken apart into the separate sums of the three terms:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} 2x_i\mu + \sum_{i=1}^{n} \mu^2}{n}}$$
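However, the term $\sum_{i=1}^{n} \mu^2 = n\mu^2$,
so we could rewrite the formula as
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} 2x_i\mu + n\mu^2}{n}}$$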
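Now, with a little sleight of hand,
we can make this a bit more complex in order to be able to actually simplify it later.
We can rewrite
$\sum_{i=1}^{n} 2x_i\mu$
as
$2n\mu \cdot \frac{\sum_{i=1}^{n} x_i}{n}$, but that has the factor
$\frac{\sum_{i=1}^{n} x_i}{n}$,
which is our definition of μ.
Therefore,
we can replace $\sum_{i=1}^{n} 2x_i\mu$ with $2n\mu^2$
and our formula becomes
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - 2n\mu^2 + n\mu^2}{n}}$$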
Of course this reduces to
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\mu^2}{n}}$$
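Then, replacing
$\mu$
with
$\frac{\sum_{i=1}^{n} x_i}{n}$
we get
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2}{n}}$$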
but that simplifies to
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n}}$$
Finally, we can rewrite this as
$$\sigma = \sqrt{\frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n^2}}$$
That final formula may look terrible, but, computationally, it is wonderful.
It says that to find the standard deviation
we need to
know the number of values, n, and the sum of the $x_i$ values,
which we could call sumx, and the sum of the $x_i^2$
values, which we could call sumsqx. Then the formula becomes
$$\sigma = \sqrt{\frac{n \cdot \text{sumsqx} - \text{sumx}^2}{n^2}}$$
This is the formula that small, and even some larger, calculators
use because, as you are entering data values, the calculator
takes the value that you enter, adds it to the running total sumx,
adds the square of the value to the running total sumsqx,
and increases the count of the data values, n, by one.
Then, whenever you ask for the standard deviation
the calculator just has to do the last formula above.
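The running-total bookkeeping described above can be sketched in Python. This is a rough illustration, not how any particular calculator is actually programmed. It assumes the final formula is the square root of (n·sumsqx − sumx²)/n², and the function name `std_dev_running` and the sample data are made up for this example.

```python
import math

def std_dev_running(values):
    """Standard deviation from running totals, reading each value only once."""
    n = 0        # count of values entered so far
    sumx = 0     # running total of the values
    sumsqx = 0   # running total of the squared values
    for x in values:        # each pass mimics entering one value
        sumx += x
        sumsqx += x * x
        n += 1
    # the "last formula": needs only n, sumx, and sumsqx
    return math.sqrt((n * sumsqx - sumx ** 2) / n ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
print(std_dev_running(data))     # prints 2.0
```

For this data n = 8, sumx = 40, and sumsqx = 232, so the formula gives the square root of (8·232 − 40²)/8² = 256/64 = 4, which is 2. One caution: when the values are very large compared to their spread, subtracting two nearly equal totals can lose accuracy in floating-point arithmetic.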
The system has the extra advantage that if you realize that somewhere in the data entry
process you made a mistake in entering a data value then it is easy to delete that value.
You hit some button to tell the calculator to remove a value and all it has to do
is ask for the value, subtract it from sumx,
subtract its square from sumsqx, and decrease the number of data items, n, by one.
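Entering and deleting values can be sketched together in Python. This is a hypothetical model of the bookkeeping only, assuming the final formula is the square root of (n·sumsqx − sumx²)/n²; the class name `StatCalculator` and its method names are invented for this example.

```python
import math

class StatCalculator:
    """Toy model of a calculator that keeps only n, sumx, and sumsqx."""

    def __init__(self):
        self.n = 0
        self.sumx = 0
        self.sumsqx = 0

    def add(self, x):
        """Enter a data value."""
        self.sumx += x
        self.sumsqx += x * x
        self.n += 1

    def remove(self, x):
        """Delete a previously entered value by undoing its contribution."""
        self.sumx -= x
        self.sumsqx -= x * x
        self.n -= 1

    def std_dev(self):
        """Apply the computational formula to the current totals."""
        return math.sqrt((self.n * self.sumsqx - self.sumx ** 2) / self.n ** 2)

calc = StatCalculator()
for x in [2, 4, 4, 4, 5, 5, 7, 90]:  # last value mistyped: 90 instead of 9
    calc.add(x)
calc.remove(90)   # take the mistake back out
calc.add(9)       # enter the intended value
print(calc.std_dev())  # prints 2.0
```

The calculator never stores the individual data values, which is why it has to ask for the value being removed.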
Of course, once you have the calculator, or once you have some computer software
to do the calculation for you, there is little advantage to knowing these
alternative formulas for finding the standard deviation.
©Roger M. Palay
Saline, MI 48176 October, 2015