Linear Regression in Python

Hi everyone!

In the previous post, we went through the theory of Linear Regression. If you are getting tired of theory, cheer up! This post is a hands-on, practical guide to building a Linear Regression model in Python!

By the end of this post, you will be able to fit a multi-variable regression model to data. You will also know how to cross-validate your models in just a few lines of code, and how to plug in your own customized evaluation functions.

Now, let’s begin!

Before building models, we need some data.
Let’s make some up!

# import numpy as usual.
import numpy as np

# make up some fictional data and corresponding labels.
full_data = np.array([[1, 2, 23],
                     [6, 2, 40], 
                     [4, 1.1, 51.2], 
                     [9, 0, 22.5], 
                     [2, 9, 46], 
                     [4, 1, 41], 
                     [6, 4, 2]])

# the labels follow y = 1*x1 - 2*x2 + 3*x3, plus uniform noise in [0, 1).
full_label = np.dot(full_data, [[1], [-2], [3]]).ravel() + \
             np.random.rand(7)

# split data into training and testing set.
#    the first 5 samples are used for training.
#    while the last 2 are for testing.
train_data = full_data[:5]
train_label = full_label[:5]

test_data = full_data[5:]
test_label = full_label[5:]
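
Because np.random.rand is not seeded, the numbers printed in this post will change from run to run. Here is a minimal sketch of a repeatable version of the same setup. It assumes NumPy's default_rng generator with an arbitrary seed of 0; the seed and the names rng, true_coef and noise are additions for illustration, not part of the code above.

```python
import numpy as np

# Assumption: seed 0 is arbitrary; any fixed seed makes the noise repeatable.
rng = np.random.default_rng(0)

full_data = np.array([[1, 2, 23],
                      [6, 2, 40],
                      [4, 1.1, 51.2],
                      [9, 0, 22.5],
                      [2, 9, 46],
                      [4, 1, 41],
                      [6, 4, 2]])

# Hypothetical helper names for the true weights and the noise term.
true_coef = np.array([1, -2, 3])
noise = rng.random(7)  # uniform in [0, 1), like np.random.rand(7)
full_label = full_data @ true_coef + noise

# Same split as above: first 5 samples for training, last 2 for testing.
train_data, test_data = full_data[:5], full_data[5:]
train_label, test_label = full_label[:5], full_label[5:]

print(train_data.shape, test_data.shape)
print(full_label)
```

Running this twice should print the same labels both times, which makes it easier to compare models later on.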

1. (Ordinary) Linear Regression

From: sklearn.linear_model.LinearRegression

Ordinary Least Squares (OLS) Linear Regression is the simplest and the most popular member of the Linear Regression family.

Fortunately, the wonderful sklearn library supports this model (along with many other variants, as we will see in this post), which makes the process very convenient.

# get the LinearRegression out of sklearn.
from sklearn.linear_model import LinearRegression

# making a Linear Regression is very simple.
#    first, we create an instance of the LinearRegression class
reg = LinearRegression()
#    then feed data to it
reg.fit(train_data, train_label)
#    and that's done. We already have our regressor.

# let's print out the coefficients (the weights) and the intercept term.
print('Coef: ', reg.coef_)
print('Intercept: ', reg.intercept_)

Coef:  [ 0.98244296 -2.07111263  3.0069251 ]
Intercept:  0.5436236174555944

The code is easy and the result looks reasonable, doesn’t it?
Our true coefficients are [1, -2, 3], and the fitted coefficients are quite close to them. Note that the noise we added with np.random.rand is uniform on [0, 1), so its mean is 0.5 rather than 0. The intercept absorbs this offset, which is why it lands near 0.5. Good job!
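
If you are curious where these numbers come from, the sketch below checks that a plain least-squares solve gives the same answer as LinearRegression. It assumes a seeded copy of the training data (seed 0 is arbitrary) and uses np.linalg.lstsq on the data with an extra column of ones; the names X_aug and w are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: a seeded copy of the 5 training samples used in this post.
rng = np.random.default_rng(0)
X = np.array([[1, 2, 23],
              [6, 2, 40],
              [4, 1.1, 51.2],
              [9, 0, 22.5],
              [2, 9, 46]])
y = X @ np.array([1, -2, 3]) + rng.random(5)

reg = LinearRegression().fit(X, y)

# Append a column of ones so that the last weight plays the role of the intercept.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the least-squares problem min ||X_aug @ w - y||^2 directly.
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print('sklearn coef:     ', reg.coef_, reg.intercept_)
print('least-squares w:  ', w[:3], w[3])
```

The two lines should agree up to floating-point error, because both minimize the same squared-error cost.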

And here is how we use our regressor to predict new data:

test_pred = reg.predict(test_data)
print('test_y-value:', test_label)
print('test_pred:', test_pred)

test_y-value: [125.05774981   4.12095121]
test_pred: [125.68621177   4.16768103]

Neat!
Even on the testing data, the predictions are very close to the true values.
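
To put a number on "very close", we can compute an error metric. This is a small sketch that uses the two test values and predictions printed above, together with mean_absolute_error and mean_squared_error from sklearn.metrics; the variable names mae and rmse are just for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# The true values and predictions printed in the output above.
test_label = np.array([125.05774981, 4.12095121])
test_pred = np.array([125.68621177, 4.16768103])

# Mean Absolute Error: the average of |y - y_pred|.
mae = mean_absolute_error(test_label, test_pred)

# Root Mean Squared Error: the square root of the average of (y - y_pred)^2.
rmse = np.sqrt(mean_squared_error(test_label, test_pred))

print('MAE: ', mae)
print('RMSE:', rmse)
```

Both values are well below 1, while the labels themselves range from about 4 to about 125.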

We have just walked through how to make a linear regressor. Simple, right?
Indeed, sklearn makes complicated things a piece of cake for us. We only call 2 methods: “fit” to train and “predict” to get the predictions, and that’s all. The same goes for its other models: remember, “fit” and “predict” will get you the desired results. Below, we will practice with the Lasso model.

2. Lasso

From: sklearn.linear_model.Lasso

Recall that Lasso is just Linear Regression plus L1 regularization. If you are not familiar with Lasso or regularization, here is a reference.
Even though we are building a Lasso here, the process for Ridge and ElasticNet is just the same. Use Ridge or ElasticNet instead of Lasso and you will get another regressor.

Remember that before applying regularization, we should always scale (or normalize) our data so that all features are penalized equally.
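
To see what scaling does, here is a small sketch on the 5 training samples from this post. It prints the per-feature standard deviation before scaling, and the mean and standard deviation after StandardScaler; the name scaled is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The 5 training samples used in this post.
train_data = np.array([[1, 2, 23],
                       [6, 2, 40],
                       [4, 1.1, 51.2],
                       [9, 0, 22.5],
                       [2, 9, 46]])

# Before scaling, the third feature spreads over a much larger range.
print('std before:', train_data.std(axis=0))

# StandardScaler subtracts each column's mean and divides by its std.
scaler = StandardScaler().fit(train_data)
scaled = scaler.transform(train_data)

print('mean after:', scaled.mean(axis=0))
print('std after: ', scaled.std(axis=0))
```

After scaling, every feature has mean 0 and standard deviation 1, so a single regularization weight treats all of them alike.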

# getting Lasso from sklearn library.
from sklearn.linear_model import Lasso

# getting StandardScaler
from sklearn.preprocessing import StandardScaler

# apply the scaler to data
scaler = StandardScaler().fit(train_data)
scaled_train_data = scaler.transform(train_data)

# create an instance of Lasso.
#    note that we pass a parameter, "alpha" (its default is 1.0).
#    This is the regularization weight.
lasso = Lasso(alpha=0.3)
# then feed it the training data.
lasso.fit(scaled_train_data, train_label)

Lasso(alpha=0.3, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

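OK, we are done with the training phase. Here are the coefficients and the intercept term. Keep in mind that they are on the scale of the standardized features, so they are not directly comparable to the true [1, -2, 3]: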

print('Coef: ', lasso.coef_)
print('Intercept: ', lasso.intercept_)

Coef:  [ 2.62791584 -6.15789595 34.96860977]
Intercept:  108.89887803199049

And we can see how it performs against the testing data:

# scale test-data
scaled_test_data = scaler.transform(test_data)

# get predictions
lasso_test_pred = lasso.predict(scaled_test_data)
print('test_y-value:', test_label)
print('test_pred:', lasso_test_pred)

test_y-value: [125.05774981   4.12095121]
test_pred: [125.27195121   5.76521485]
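
As mentioned earlier, Ridge and ElasticNet follow the same steps. The sketch below fits all three on a seeded copy of the data (seed 0 is arbitrary). It also bundles the scaler and the model with sklearn's make_pipeline, so the test data is scaled automatically at prediction time. The alpha and l1_ratio values and the names models and predictions are chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded copy of the data used in this post.
rng = np.random.default_rng(0)
full_data = np.array([[1, 2, 23], [6, 2, 40], [4, 1.1, 51.2], [9, 0, 22.5],
                      [2, 9, 46], [4, 1, 41], [6, 4, 2]])
full_label = full_data @ np.array([1, -2, 3]) + rng.random(7)
train_data, test_data = full_data[:5], full_data[5:]
train_label, test_label = full_label[:5], full_label[5:]

# Illustrative hyperparameters; they are not tuned.
models = {
    'Lasso': Lasso(alpha=0.3),
    'Ridge': Ridge(alpha=0.3),
    'ElasticNet': ElasticNet(alpha=0.3, l1_ratio=0.5),
}

predictions = {}
for name, model in models.items():
    # The pipeline scales the data, then fits (or predicts with) the model.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(train_data, train_label)
    predictions[name] = pipe.predict(test_data)
    print(name, predictions[name])

print('true values:', test_label)
```

With a pipeline there is no separate scaler.transform call to forget, which is one reason to prefer it once the individual steps are clear.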

3. RidgeCV with customized evaluation function

From: sklearn.linear_model.RidgeCV

OK, next we will look at an example with RidgeCV (a Ridge with built-in cross-validation) together with a customized evaluation function.

# as usual, we take the regression algorithm from sklearn.
from sklearn.linear_model import RidgeCV

# here is the customized evaluation function.
# for this example, I will use Max Absolute Error,
#    but you can change it to anything you like.
# this function should take 3 parameters:
#    an estimator (an instance of a regression model).
#    the predictor variables' values.
#    the corresponding response variable's values.
def max_absolute_error(estimator, X, y):
    y_pred = estimator.predict(X).ravel()
    residual = np.abs(y - y_pred)
    # notice that we return the negative max-absolute-error because
    #    RidgeCV assumes that a higher score means better performance.
    return -max(residual)

# We scale the data
scaled_full_data = StandardScaler().fit_transform(full_data)

# Then, let's begin training.
# While Ridge expects a parameter "alpha", RidgeCV
#    takes "alphas", which is a list of alpha values.
#    RidgeCV will evaluate each of these
#    alpha values separately, and then select the best
#    alpha (i.e. the alpha that produces the maximum score
#    according to our customized evaluation function).
# We also pass our evaluation function to the parameter "scoring".
# We pass "cv=None" to use the efficient Leave-One-Out
#    cross-validation (a form of Generalized Cross-Validation).
ridge_cv = RidgeCV(alphas=[0.1, 1, 5], 
                   scoring=max_absolute_error, 
                   cv=None)

# Ok. Now it's time to train the model.
ridge_cv.fit(scaled_full_data, full_label)

RidgeCV(alphas=array([0.1, 1. , 5. ]), cv=None, fit_intercept=True,
    gcv_mode=None, normalize=False,
    scoring=<function max_absolute_error at 0x7f4580f5b320>,
    store_cv_values=False)

Are you curious about which alpha value is chosen? Print it out:

ridge_cv.alpha_

0.1

And also check how the model performs with our full dataset.

max_absolute_error(ridge_cv, scaled_full_data, full_label)

-1.3745863898524089
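
RidgeCV has cross-validation built in, but other models can be cross-validated in a few lines as well. Here is a sketch using cross_val_score on a Lasso pipeline. It assumes a seeded copy of the data (seed 0 is arbitrary), 3 folds, and a scorer built with make_scorer from sklearn's max_error metric instead of the hand-written function above; these choices are for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import make_scorer, max_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded copy of the data used in this post.
rng = np.random.default_rng(0)
full_data = np.array([[1, 2, 23], [6, 2, 40], [4, 1.1, 51.2], [9, 0, 22.5],
                      [2, 9, 46], [4, 1, 41], [6, 4, 2]])
full_label = full_data @ np.array([1, -2, 3]) + rng.random(7)

# greater_is_better=False makes the scorer return the negated max error,
#    so that a higher score still means a better model.
scorer = make_scorer(max_error, greater_is_better=False)

# Scaling lives inside the pipeline, so each fold is scaled
#    using only its own training part.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.3))

# 3 folds over 7 samples: test parts of size 3, 2 and 2.
cv = KFold(n_splits=3)
scores = cross_val_score(pipe, full_data, full_label, scoring=scorer, cv=cv)

print('score per fold:', scores)
print('mean score:', scores.mean())
```

Each score is the negative max absolute error on one held-out fold, so values closer to 0 are better.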

4. Linear Regression with MAE cost function

From: sklearn.linear_model.SGDRegressor

Welcome to the last section!
You have gone through most of this blog post; there is just 1 more piece.

In the 3 examples above, we never had to think about how the regressors were fitted. OLS and Ridge have closed-form solutions, while sklearn’s Lasso is fitted iteratively with coordinate descent. If you remember, in the Linear Regression theory post, we said that there is another way to derive a regressor from data: Gradient Descent.
Here we show how to tell Python to build your regressor using Gradient Descent.

One more thing: the closed-form solution assumes the cost function is Mean Squared Error, while Gradient Descent does not. Thus, when using Gradient Descent to train our Linear Regressor, we can make use of other cost functions. For this example, we will use the Mean Absolute Error (MAE). Note that this is different from the Max Absolute Error that we used as an evaluation function in the previous section.
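
The code in this section selects the cost function with loss='epsilon_insensitive' and epsilon=0. That loss ignores errors smaller than epsilon and grows linearly beyond it, so with epsilon=0 it reduces to the absolute error, and its average is the Mean Absolute Error. A tiny sketch with made-up numbers (the function epsilon_insensitive below is written by hand for illustration; it is not sklearn's internal implementation):

```python
import numpy as np

def epsilon_insensitive(y_true, y_pred, epsilon=0.0):
    """Per-sample loss: max(0, |y - y_pred| - epsilon)."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Made-up values, only to show the arithmetic.
y_true = np.array([3.0, -1.0, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])

losses = epsilon_insensitive(y_true, y_pred)   # absolute errors: 0.5, 1.0, 0.0
mean_abs = losses.mean()                       # Mean Absolute Error
max_abs = np.abs(y_true - y_pred).max()        # Max Absolute Error

print('per-sample loss:', losses)
print('Mean Absolute Error:', mean_abs)
print('Max Absolute Error: ', max_abs)
```

The mean looks at all samples, while the max only looks at the worst one, so the two can lead to quite different regressors and scores.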

# taking the Regressor from sklearn.
# SGD stands for Stochastic Gradient Descent.
from sklearn.linear_model import SGDRegressor

# to let it know we want MAE, pass "loss='epsilon_insensitive', epsilon=0".
# we can also apply regularization with "penalty" and "alpha".
# max_iter defines the maximum number of passes through the data.
# tol (tolerance) specifies the minimum improvement required for each pass.
sgd_reg = SGDRegressor(loss='epsilon_insensitive',
                       epsilon=0, penalty='l1', alpha=0.1,
                       max_iter=1000, tol=1e-3)
# then train.
sgd_reg.fit(train_data, train_label)
# and print out the regressor's parameters.
print('Coef: ', sgd_reg.coef_)
print('Intercept: ', sgd_reg.intercept_)

Coef:  [ 0.75424869 -0.10077319  3.08984938]
Intercept:  [0.09298192]

At this stage, we have our regressor in hand. Let’s have it predict the testing data and show the evaluation score (the negative Max Absolute Error function from the previous section).

max_absolute_error(sgd_reg, test_data, test_label)

-6.274128877584106
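
The regressor above was trained on the raw features. Stochastic Gradient Descent is known to be sensitive to feature scaling, so one thing worth trying is to scale the data first, as we did for Lasso. Below is a sketch under some assumptions: a seeded copy of the data, random_state=0 for the regressor, and a pipeline with StandardScaler. Whether it gives a better score than the run above depends on the random noise, so no output is claimed here.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded copy of the data used in this post.
rng = np.random.default_rng(0)
full_data = np.array([[1, 2, 23], [6, 2, 40], [4, 1.1, 51.2], [9, 0, 22.5],
                      [2, 9, 46], [4, 1, 41], [6, 4, 2]])
full_label = full_data @ np.array([1, -2, 3]) + rng.random(7)
train_data, test_data = full_data[:5], full_data[5:]
train_label, test_label = full_label[:5], full_label[5:]

# Same settings as above, plus a fixed random_state for repeatability.
sgd = SGDRegressor(loss='epsilon_insensitive', epsilon=0,
                   penalty='l1', alpha=0.1,
                   max_iter=1000, tol=1e-3, random_state=0)

# Scale the features, then train with Stochastic Gradient Descent.
scaled_sgd = make_pipeline(StandardScaler(), sgd)
scaled_sgd.fit(train_data, train_label)

test_pred = scaled_sgd.predict(test_data)
print('test_y-value:', test_label)
print('test_pred:', test_pred)
print('max absolute error:', np.abs(test_label - test_pred).max())
```

With only 5 training samples, the result can vary a lot, so treat this as a template rather than a benchmark.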

Conclusion

By finishing this blog, we now know:

How to make a Linear Regression model (including OLS, Ridge, Lasso, ElasticNet).
How to run Cross-Validation training on these regressors.
How to make a Stochastic-Gradient-Descent-based regressor.
How to set different cost functions for your regressor.
How to create and apply your own evaluation function.

That’s a lot to start with, but you have successfully completed it. Congratulations!

You can find the full series of blogs on Linear regression here.


Tung M Phung's Blog
