linear vs. others (quadratic, logarithmic, exponential, etc.)
strong versus weak
positive versus negative
The Correlation Coefficient
For a set of `n` ordered pairs `(x_i, y_i)`, `i = 1` to `n`,
where `bar x` and `bar y` are the respective means,
and `s_x` and `s_y` are the respective sample standard deviations,
we have the correlation coefficient `r` defined by
`r = (1/(n - 1))sum_(i=1)^n( ((x_i-bar x)/(s_x))((y_i-bar y)/(s_y)))`
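The definition above can be sketched directly in code: standardize each `x` and `y` value, multiply the pairs, and average by dividing by `n - 1`. This is a minimal illustration assuming the data come as plain lists of numbers; the function name is hypothetical.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r: the sum of products of
    standardized x and y values, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations s_x and s_y (divide by n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)
```

For perfectly linear data with positive slope, this returns exactly `r = 1`.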
correlation vs. causation
Section 4.2 The Least-Squares Regression Line
`y = mx + b`
`y = ax + b`
`y = a + bx`
`hat y = b_0 + b_1x`; where `b_1 = r(s_y/s_x)` and `b_0 = bar y - b_1 bar x`
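The formulas `b_1 = r(s_y/s_x)` and `b_0 = bar y - b_1 bar x` translate into a short computation. A minimal sketch, again assuming plain lists of numbers (the function name is made up for illustration):

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1) for the line hat y = b0 + b1*x,
    using b1 = r * (s_y / s_x) and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * (s_y / s_x)   # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1
```

For example, the data `(1, 2), (2, 4), (3, 6)` lie exactly on `hat y = 0 + 2x`, so the function returns `b_0 = 0` and `b_1 = 2`.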
Computing the least-squares regression equation
Why the name...
If the solution has the form `y = mx + b`, then we are really trying to find values for `m` and `b`.
We have the pair of linear equations
`n*b +S_x*m = S_y `
`S_x*b + S_(x^2)*m = S_(xy)`
where
`n` is the number of data pairs
`S_x` is the sum of the `x` values (not same as `s_x`, standard deviation of `x`'s)
`S_y` is the sum of the `y` values (not same as `s_y`, standard deviation of `y`'s)
`S_(x^2)` is the sum of the squares of the `x`'s
`S_(xy)` is the sum of the products of the `x`'s and `y`'s
and therefore we can solve the system of linear equations, treating
`m` and `b` as the unknowns, by the usual methods of algebra.
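The two normal equations form a `2 xx 2` linear system, so Cramer's rule gives `m` and `b` directly. A sketch of that computation (the function name is hypothetical):

```python
def regression_by_normal_equations(xs, ys):
    """Solve the pair of linear equations
        n*b   + S_x*m     = S_y
        S_x*b + S_(x^2)*m = S_(xy)
    for the intercept b and slope m via Cramer's rule."""
    n = len(xs)
    Sx = sum(xs)                              # S_x
    Sy = sum(ys)                              # S_y
    Sx2 = sum(x * x for x in xs)              # S_(x^2)
    Sxy = sum(x * y for x, y in zip(xs, ys))  # S_(xy)
    det = n * Sx2 - Sx * Sx                   # determinant of the system
    m = (n * Sxy - Sx * Sy) / det
    b = (Sy * Sx2 - Sx * Sxy) / det
    return b, m
```

This gives the same line as the `b_1 = r(s_y/s_x)` formulas above; for `(1, 2), (2, 4), (3, 6)` it returns `b = 0` and `m = 2`.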
point of averages `(bar x, bar y )`
the least-squares regression line goes through the point of averages.
getting predicted values from a least-squares regression equation.
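The last two points can be illustrated together: fit a line, use it to predict, and check that predicting at `bar x` returns `bar y`. The data below are made up for illustration.

```python
# Hypothetical data set (invented for this example)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit via the normal-equation sums
n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sx2 = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))
det = n * Sx2 - Sx * Sx
b1 = (n * Sxy - Sx * Sy) / det   # slope
b0 = (Sy * Sx2 - Sx * Sxy) / det  # intercept

def predict(x):
    """Predicted value hat y for a given x."""
    return b0 + b1 * x

# Point of averages (bar x, bar y)
x_bar, y_bar = Sx / n, Sy / n
# predict(x_bar) equals y_bar: the line passes through the point of averages.
```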