Module 9: Lecture Notes for Math 170

Some images on this page have been generated via AsciiMathML.js.
For more information see: www.chapman.edu/~jipsen/asciimath.html.

  1. This topic is included in the course because statistics calculations show up surprisingly often in programming applications. At the same time, it is not common to find built-in functions that produce these values.
  2. Finding the mode involves finding the item, or items, in a list that occur the most often.
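The notes do not specify a programming language; here is a minimal Python sketch of the mode idea, counting occurrences and keeping every item tied for the highest count (the function name `modes` is illustrative, not from the notes):

```python
from collections import Counter

def modes(data):
    """Return every item that occurs with the highest frequency."""
    counts = Counter(data)                      # tally occurrences of each item
    top = max(counts.values())                  # the highest occurrence count
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 3, 4]))                # → [2, 3]  (a bimodal list)
```

Returning a list, rather than a single value, handles the case where several items tie for the most occurrences.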
  3. Finding the median, and indeed, the first and third quartile points, and percentiles, requires that we sort the list of data. [Three notes on this. First, the strange definition for the median of an even number of items requires a bit of extra programming. Second, there is not a universally accepted rule for determining the first and third quartile points. Third, there is not a universally accepted rule for determining percentiles. ]
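The even-count median rule, and one of the several possible quartile rules, can be sketched in Python as follows. The quartile convention used here, taking the median of the lower and upper halves (excluding the overall median when the count is odd), is just one choice among the competing rules the notes mention:

```python
def median(data):
    s = sorted(data)                         # the list must be sorted first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                        # odd count: the middle value
    return (s[mid - 1] + s[mid]) / 2         # even count: average the two middle values

def quartiles(data):
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]                         # values below the median
    upper = s[half + (n % 2):]               # skip the median itself when n is odd
    return median(lower), median(s), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # → (2, 4, 6)
```

A different quartile rule (for example, one based on interpolating positions) can give different first- and third-quartile values for the same list, which is exactly the point the notes make.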
  4. Finding the mean involves adding up all the values in the list.
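As a minimal sketch of this, the mean is just a running sum divided by the count:

```python
def mean(data):
    total = 0
    for x in data:
        total += x                           # add up all the values in the list
    return total / len(data)

print(mean([2, 4, 6, 8]))                    # → 5.0
```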
  5. The standard deviation (which is the square root of the variance) gives a feeling for the spread of the data away from the mean.
    1. Chebyshev's Inequality: at least `(1-1/(K^2))` of the values in a data list will be within `K` standard deviations of the mean (for any `K>1`).
    2. Normal Distribution (the empirical rule): about 68% of the values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
    3. Definition and full name: root mean squared deviation from the mean.
      for a population this is: `sigma=sqrt((Sigma_(i=1)^N (x_i-mu)^2)/N)`
      for a sample this is `s_x=sqrt((Sigma_(i=1)^n (x_i-barx)^2)/(n-1))`
    4. For each, there is a more computationally friendly equivalent formula:
      for a population: `sigma=sqrt((Sigma(x^2) - (Sigmax)^2/N)/N)`
      for a sample: `s_x=sqrt((Sigma(x^2) - (Sigmax)^2/n)/(n-1))`
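As a sketch of why the second form is "computationally friendly", here are both sample formulas in Python (the function names are illustrative). The definitional form needs two passes over the data; the computational form only needs the running totals `Sigma(x)` and `Sigma(x^2)`:

```python
from math import sqrt

def stdev_definitional(data):
    """Sample standard deviation from the defining formula (two passes)."""
    n = len(data)
    xbar = sum(data) / n                              # first pass: the mean
    return sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # second pass

def stdev_computational(data):
    """Sample standard deviation from the running-total formula (one pass)."""
    n = len(data)
    sx = sum(data)                                    # Sigma(x)
    sx2 = sum(x * x for x in data)                    # Sigma(x^2)
    return sqrt((sx2 - sx ** 2 / n) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(stdev_definitional(data))                       # both print the same value
print(stdev_computational(data))
```

The population versions are the same except that the final division uses `N` instead of `n-1`. (One caution worth knowing: the one-pass formula can lose precision when the values are large and close together, because it subtracts two large, nearly equal totals.)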
  6. A linear regression is the equation, generally in the form `y=ax+b`, for a straight line that "best fits" the data made up of coordinate pairs of values `(x_i,y_i)`.
    1. Correlation Coefficient: a computed value that gives a measure of the "goodness of fit" between the linear regression and the data. Values for the correlation coefficient are always between `1` and `-1`. Values close to either `1` or `-1` indicate good fits; values closer to `0` indicate a weaker fit.
    2. Observed values
    3. Expected values (those predicted by the regression equation)
    4. Residuals: the (Observed - Expected) values
    5. Interpolation: using the regression equation to predict a `y` value for an `x` within the range of the observed `x_i` values.
    6. Extrapolation: using the regression equation to predict a `y` value for an `x` outside that range; this is generally less reliable.
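The items above can be pulled together in one Python sketch. These are the standard least-squares formulas for the slope `a`, intercept `b`, and correlation coefficient `r`, written in terms of the same kinds of running totals used for the standard deviation; the function name and the sample data are illustrative:

```python
from math import sqrt

def linear_regression(xs, ys):
    """Least-squares fit y = a*x + b, plus the correlation coefficient r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)                        # Sigma(x), Sigma(y)
    sxx = sum(x * x for x in xs)                     # Sigma(x^2)
    syy = sum(y * y for y in ys)                     # Sigma(y^2)
    sxy = sum(x * y for x, y in zip(xs, ys))         # Sigma(x*y)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope
    b = (sy - a * sx) / n                            # intercept
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return a, b, r

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
a, b, r = linear_regression(xs, ys)
expected = [a * x + b for x in xs]                   # values predicted by the line
residuals = [y - e for y, e in zip(ys, expected)]    # observed - expected
print(round(a, 3), round(b, 3))                      # slope and intercept
print(round(r, 4))                                   # close to 1: a good fit
```

Evaluating `a * x + b` for an `x` between 1 and 5 is interpolation; doing so for, say, `x = 20` is extrapolation, and the small residuals inside the observed range say nothing about how good that prediction will be.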


©Roger M. Palay
Saline, MI 48176
November, 2013