# lm() vs lmRob()

15 Sep 2014

Let’s explore some robust regression. Having identified good-quality predictors for our purposes, it makes sense to use an estimation technique that is not easily thrown off by discrepancies in the data and remains reliable even when the data contains outliers.

### Data and Libraries

Let’s get some data to work with. I am going to use the quantmod package, and I’ll keep the example simple.
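A minimal sketch of the data pull, assuming we regress quarterly US GDP on the Industrial Production Index from FRED — the symbols `GDP` and `INDPRO`, the quarterly aggregation, and the variable names are my assumptions, not necessarily the original setup:

```r
library(quantmod)
library(zoo)

# Assumed symbols: quarterly US GDP and the monthly Industrial
# Production Index, both pulled from FRED
getSymbols(c("GDP", "INDPRO"), src = "FRED")

indpro_q <- apply.quarterly(INDPRO, mean)   # monthly -> quarterly average

# Index both series by year/quarter so they line up, keep the overlap
gdp_z    <- zoo(as.numeric(GDP),      as.yearqtr(index(GDP)))
indpro_z <- zoo(as.numeric(indpro_q), as.yearqtr(index(indpro_q)))
dat <- merge(gdp = gdp_z, indpro = indpro_z, all = FALSE)

df <- data.frame(coredata(dat))   # plain data frame for lm()/boot()
```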

I’ll be using the `dynlm()` function instead of `lm()`, since it can handle time-series data directly.
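A sketch of the fit, assuming a zoo object `dat` with columns `gdp` and `indpro` (names are my choice):

```r
library(dynlm)

# dynlm() accepts zoo/ts objects directly, so no manual alignment
# or coercion to a data frame is needed
fit_ols <- dynlm(gdp ~ indpro, data = dat)
```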

And where there is `lm()`, there is `summary()`.
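For instance, assuming the OLS fit is stored in `fit_ols`:

```r
summary(fit_ols)   # coefficients, standard errors, t-values, R-squared
```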

### Bootstrapping - OLS

Let’s bootstrap the OLS estimate and see what the bootstrapped standard error is.

I’ll use the `boot()` function to do this. Let’s also write a function that fits an `lm` model and returns the coefficient on the **Industrial Production Index**.
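A sketch of the bootstrap, assuming a data frame `df` with columns `gdp` and `indpro` (names are my choice) and 1000 replicates:

```r
library(boot)

# Statistic: refit OLS on a resampled data set and return the
# coefficient on the Industrial Production Index
coef_indpro <- function(data, indices) {
  fit <- lm(gdp ~ indpro, data = data[indices, ])
  coef(fit)[["indpro"]]
}

boot_ols <- boot(data = df, statistic = coef_indpro, R = 1000)
boot_ols   # prints the original estimate, the bias, and the std. error
```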

The coefficient on the Industrial Production Index has a rather high standard error but a much smaller bias. Both the bias and the standard error are easy to compute directly from the bootstrap output, and we can plot a histogram and a q-q plot of the bootstrapped estimates using the `plot()` function. Note that **t** here refers to the value of the coefficient on the **Industrial Production Index**.
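Concretely, with a `boot` object such as `boot_ols` (an assumed name), the calculations and the plot look like:

```r
# Bias: mean of the bootstrap replicates minus the original estimate
mean(boot_ols$t) - boot_ols$t0

# Standard error: standard deviation of the bootstrap replicates
sd(boot_ols$t)

# Histogram and q-q plot of the bootstrapped coefficient
plot(boot_ols)
```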

### Outlier

It is well known that OLS estimates are sensitive to small changes in the data and/or outliers. We can show this empirically by artificially introducing an outlier into our data. Let’s first identify the data point with the highest Cook’s distance. It turns out to be the **14th** data point in our GDP dataset, so let’s distort that specific point.
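A sketch of this step, assuming the data frame `df` from earlier; the distortion factor of 1.5 is an arbitrary choice of mine, not necessarily what the original used:

```r
fit <- lm(gdp ~ indpro, data = df)

# Most influential observation by Cook's distance
idx <- which.max(cooks.distance(fit))   # the post finds observation 14

# Distort that observation (factor 1.5 is arbitrary)
df_out <- df
df_out$gdp[idx] <- df_out$gdp[idx] * 1.5

fit_out <- lm(gdp ~ indpro, data = df_out)
(coef(fit_out)[["indpro"]] / coef(fit)[["indpro"]] - 1) * 100   # % change
```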

The coefficient on the **Industrial Production Index** has changed by **13.3450079%**. We can bootstrap the estimate again to check how the standard error of the bootstrapped estimate changes.
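Re-running the bootstrap on the distorted data is just a matter of swapping in the new data frame (variable and function names assumed, as in the earlier sketches):

```r
library(boot)

boot_ols_out <- boot(data = df_out, statistic = coef_indpro, R = 1000)
boot_ols_out        # compare bias and std. error with the clean-data run
plot(boot_ols_out)  # per the post, the replicates now look skewed
```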

Notice that the plots suggest the bootstrapped estimates are now skewed (unlike in the first case). Let’s go through the same exercise using `lmRob()`.

### lmRob()

Let’s first fit the data and see what the coefficients look like and then we can bootstrap the estimates.
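A sketch of the robust fit with `lmRob()` from the robust package, assuming the clean data frame `df` as before:

```r
library(robust)

# MM-estimation fit, resistant to outlying observations
fit_rob <- lmRob(gdp ~ indpro, data = df)
summary(fit_rob)
```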

Note the lower R-Squared value.

Now let’s introduce the outlier as we did earlier and see what happens.
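Sketch, with the distorted data frame `df_out` assumed from the OLS exercise; the bootstrap statistic simply swaps `lmRob()` in for `lm()`:

```r
library(robust)
library(boot)

fit_rob_out <- lmRob(gdp ~ indpro, data = df_out)
coef(fit_rob_out)

# Bootstrap the robust coefficient on the Industrial Production Index
coef_indpro_rob <- function(data, indices) {
  coef(lmRob(gdp ~ indpro, data = data[indices, ]))[["indpro"]]
}
boot_rob <- boot(data = df_out, statistic = coef_indpro_rob, R = 1000)
boot_rob
plot(boot_rob)
```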

Note how the coefficient on the **Industrial Production Index** is essentially the same as in the case with no outlier. The bootstrapped estimates are fairly normally distributed, but note that they have a **higher bias** and a **lower standard error**. In the presence of outliers, robust estimation may make more sense. This effect is much easier to see in the following plot of the residuals from the two (OLS vs. robust) fits.
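One way to draw that residual comparison; the fit objects `fit_out` (OLS on distorted data) and `fit_rob_out` (robust on distorted data) are assumed names from the earlier sketches:

```r
# Overlay residuals from the OLS and robust fits on the distorted data
plot(resid(fit_out), type = "b", pch = 19, col = "red",
     xlab = "Observation", ylab = "Residual",
     main = "OLS vs robust residuals (distorted data)")
lines(resid(fit_rob_out), type = "b", pch = 19, col = "blue")
legend("topleft", legend = c("OLS", "Robust"),
       col = c("red", "blue"), pch = 19, lty = 1)
```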

Thoughts? Feel free to comment below! Thanks!