Linear regression by hand

… most companies would be taking a huge step forward if they got somebody who knows how to do linear regression.

Hacker News user ‘mindcrime’ on the necessary skills for data science

When you have x_1, \ldots, x_k predictors of a scalar-valued outcome y with observations indexed by i and residuals denoted \varepsilon, a model of the form

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i

or (equivalently) in matrix notation

\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}

is best[1] estimated using ordinary least squares, the workhorse of linear regression. The underlying math is a fair bit of matrix algebra which, when all is said and done, returns

\boldsymbol{\hat{\beta}} = \left(\mathbf{X^\prime X}\right)^{-1} \mathbf{X^\prime y}.

This was the one equation my graduate school program director urged every student to know by heart.

X prime X inverse X prime y, “prime” meaning transpose, yields OLS estimates of linear regression coefficients. On the left-hand side, the hat symbol denotes an estimate from a sample as opposed to a true value in the population.

We can walk through a real-data example using mtcars, an automobile-themed dataset built into R. Regressing fuel economy (mpg) on weight (wt) and number of cylinders (cyl),

\text{mpg}_i = \beta_0 + \beta_1 \text{wt}_i + \beta_2 \text{cyl}_i + \varepsilon_i,


summary(lm(mpg ~ wt + cyl, data=mtcars))

will give you

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  39.6863     1.7150  23.141  < 2e-16 ***
wt           -3.1910     0.7569  -4.216 0.000222 ***
cyl          -1.5078     0.4147  -3.636 0.001064 ** 

wherein the estimates (first column of numbers) from top to bottom correspond to

\boldsymbol{\hat{\beta}} = \begin{bmatrix} \hat{\beta_0} \\  \hat{\beta_1} \\  \hat{\beta_2} \end{bmatrix}.

Per the foregoing all-important equation, only two objects are necessary to compute the estimates manually: (1) the matrix \mathbf{X} and (2) the vector \mathbf{y}. Both are easy to extract from mtcars.

## getting `X` the easy but obfuscated way
X = model.matrix(mpg ~ wt + cyl, data=mtcars)

## getting `X` a more transparent way:
## grab only the `wt` and `cyl` columns from mtcars
## and prepend a column of ones to represent the intercept
X = cbind(1, mtcars[, c("wt", "cyl")])
X = as.matrix(X) # transform data.frame to matrix type

## getting `y`
y = mtcars[, "mpg"]

Given \mathbf{X} and \mathbf{y}, all that’s left are matrix operations. Mathematically they are described in this tutorial from Harvey Mudd College. Computationally, t() transposes, solve() inverts, and %*% multiplies matrices.

Now for the moment of truth—X prime X inverse X prime y.

solve(t(X) %*% X) %*% t(X) %*% y
1   39.686261
wt  -3.190972
cyl -1.507795

This formula doesn’t get us p-values but who needs those anyway. šŸ˜‰

[1] OLS is BLUE—the best linear unbiased estimator—under certain assumptions that are very important but beyond the scope of this post.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s