Introduction to Linear & Polynomial Regression

Problem Definition

Let’s say you run a business and spend a different amount of money on advertisements every month.
Naturally, your revenue is not uniform across months, and you’d like to understand, and ideally predict, how ad spending affects revenue.

I spent X on ads this month so my revenue should be Y with error E

You have tracked the ads spending and revenue across your business lifetime, and here’s what the data shows

You can see that in months when you spent $20 on ads, your revenue ranged from about $130 to $300, and when you spent $60, it ranged from about $300 to $470.

Using this (simulated) data, we are going to use two techniques, Linear Regression and Polynomial Regression, to predict the revenue for any given ad spending amount.

Linear Regression

Linear regression is a model that assumes a linear relationship between the input variable (e.g. ads spending) and the output variable (e.g. revenue).

The model can be represented in the following way:

y = α₀ + α₁x₁ + α₂x₂ + … + αₙxₙ

  • y is the predicted value
  • α₀ is the bias term
  • α₁, α₂, …, αₙ are the model parameters
  • x₁, x₂, …, xₙ are the input values

In our case, we only have one input, ads spending, so our model reduces to

y = α₀ + α₁x

Basically, we are trying to come up with the linear equation that best describes the relationship between our input and output.
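
To make the single-input model, y = α₀ + α₁x, a bit more concrete, here is a minimal sketch that evaluates it for a few ad budgets. The parameter values and budgets are made up for illustration; they are not fitted to the data in this post.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration (not fitted values).
alpha_0 = 100.0  # bias term: predicted revenue with zero ad spending
alpha_1 = 4.0    # slope: extra revenue per extra dollar of ad spending

def predict(x):
    """Evaluate the single-input linear model y = alpha_0 + alpha_1 * x."""
    return alpha_0 + alpha_1 * np.asarray(x, dtype=float)

ads_spending = [20, 40, 60]
predicted_revenue = predict(ads_spending)
print(predicted_revenue.tolist())  # [180.0, 260.0, 340.0]
```

Fitting the model means finding the values of alpha_0 and alpha_1 that make these predictions as close as possible to the observed revenue.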

The above statement leads us to the next question

How do we know which linear equation best describes our data?

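To determine that, I am going to introduce a new term: residuals.

Given a linear equation and data points, the residual of a point is the vertical distance between the point and the line, that is, the difference between the actual value and the predicted value.
Summing the squared residuals gives us an estimate of how good or bad the linear equation is; squaring keeps positive and negative residuals from cancelling each other out.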
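
As a small sketch of this idea, the snippet below computes the residuals of a few points against a candidate line. Both the points and the line are made up for illustration. It also shows why the residuals are usually squared before summing: in the plain sum, positive and negative residuals partly cancel.

```python
import numpy as np

# Made-up data points (ads spending, revenue), for illustration only.
x = np.array([20.0, 40.0, 60.0, 80.0])
y = np.array([200.0, 250.0, 350.0, 400.0])

# A hypothetical candidate line: y_hat = 120 + 3.5 * x
y_hat = 120.0 + 3.5 * x

# Residual of each point: actual value minus the value on the line.
residuals = y - y_hat

print(residuals.tolist())             # [10.0, -10.0, 20.0, 0.0]
print(float(residuals.sum()))         # 20.0  (signs partly cancel)
print(float((residuals ** 2).sum()))  # 600.0 (every miss counts)
```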

Above you can see an example of a ‘bad’ linear equation — we can see that most of our data points are pretty far from the line which means the residual of each point is relatively large.

In comparison, here’s a better linear equation (for our data)


Our goal is to minimize the sum of squared residuals, which is the error.
There are a few ways to do that: solving the least-squares problem in closed form and gradient descent are both good options. (In some cases, for example with many input features, gradient descent can be cheaper computationally.)

First, let’s better define what we are minimizing, by defining a cost function.

MSE (Mean Squared Error) = (1/N) · Σᵢ (yᵢ − ŷᵢ)²

where N is the number of data points, yᵢ are the actual values, and ŷᵢ are the predicted values.
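
The cost function is short enough to write by hand. Here is a minimal NumPy sketch of it; the function name and the sample values are my own and only serve as an illustration.

```python
import numpy as np

def mean_squared_error_np(y_true, y_pred):
    """MSE: the average of the squared differences (y_i - y_hat_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up actual and predicted values, for illustration only.
y_true = [200.0, 250.0, 350.0, 400.0]
y_pred = [190.0, 260.0, 330.0, 400.0]

mse = mean_squared_error_np(y_true, y_pred)
print(mse)  # 150.0
```

The squared errors here are 100, 100, 400 and 0, so the mean is 600 / 4 = 150.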

I have chosen not to elaborate on gradient descent in this story and I will be using it as a “black box”.

For those of you unfamiliar with the algorithm, the details aren’t too important. All you need to understand is that applying gradient descent to our cost function (MSE) minimizes it, updating the model parameters along the way, so that by the end of the algorithm we have the parameters of the best-fitting linear equation.

Another key thing about this algorithm is that it’s iterative. At every iteration, we try to reduce the cost further, until the algorithm converges, meaning that further iterations no longer reduce it noticeably.
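
If you would like a peek inside the black box anyway, here is a rough sketch of batch gradient descent for the single-input model, minimizing the MSE. The learning rate, iteration limit, stopping tolerance and toy data are all arbitrary choices of mine; real implementations are more careful about these.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, n_iterations=5000, tolerance=1e-12):
    """Fit y ≈ a0 + a1 * x by batch gradient descent on the MSE."""
    a0, a1 = 0.0, 0.0            # start from an arbitrary guess
    previous_mse = np.inf
    for _ in range(n_iterations):
        error = (a0 + a1 * x) - y            # prediction minus actual value
        mse = np.mean(error ** 2)
        if previous_mse - mse < tolerance:   # converged: no real improvement
            break
        previous_mse = mse
        # Partial derivatives of the MSE with respect to a0 and a1
        grad_a0 = 2.0 * np.mean(error)
        grad_a1 = 2.0 * np.mean(error * x)
        # Step against the gradient to reduce the cost
        a0 -= learning_rate * grad_a0
        a1 -= learning_rate * grad_a1
    return a0, a1

# Toy data that lies exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

a0, a1 = gradient_descent(x, y)
print(round(a0, 3), round(a1, 3))  # 1.0 2.0
```

Each pass nudges the two parameters in the direction that lowers the MSE, and the loop stops once a pass no longer improves it by more than the tolerance.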

Applying Linear Regression

Thanks to amazing modules such as sklearn and others, we are able to do this in just a few lines of code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

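x, y = load_data()  # pseudocode for loading your data set; x must be 2D, shape (n_samples, 1)

model = LinearRegression().fit(x, y)  # fit the bias term and the slope to the data
y_pred = model.predict(x)             # predicted y for every x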

And that’s it. We made a prediction of the y values, and now we can see how good it is with a few more lines:

plt.scatter(x, y, s=10)          # the raw data points
plt.plot(x, y_pred, color='r')   # the fitted line
plt.show()

To get an understanding of how good this line is, we can compute its MSE

mse = mean_squared_error(y, y_pred)

And in our case, MSE = 0.7343.
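
If you want to run the steps above end to end, here is a self-contained sketch. The load_data function below is a hypothetical stand-in that generates synthetic data; it is not the data set used in this post, so the numbers it produces will differ from the MSE reported here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def load_data(n_samples=100, seed=0):
    """Hypothetical stand-in for load_data(): synthetic data, not the post's data."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 100, size=(n_samples, 1))   # sklearn expects a 2D feature array
    y = 100 + 4 * x[:, 0] + rng.normal(0, 30, size=n_samples)
    return x, y

x, y = load_data()

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

print(model.intercept_, model.coef_)   # fitted bias term and slope
print(mean_squared_error(y, y_pred))   # training MSE on the synthetic data
```

Note that x has shape (n_samples, 1). Passing a flat 1D array to fit raises an error, which is a common stumbling block.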

From the graph, we can see that the trend is not linear, so our linear model does not give a good prediction: some dots, especially at the edges, are pretty far from the line.

A straight line can’t capture a non-linear trend well enough.

At this point, it is time to introduce Polynomial Regression.

Polynomial Regression

As you might have guessed, this time we are going to look for the best polynomial that describes our data, and not a linear equation.

The polynomial regression model (for a single input x and a polynomial of degree n) can be represented in the following way

y = α₀ + α₁x + α₂x² + … + αₙxⁿ

Since we already have some intuition on how it works and we know what we are trying to minimize here (same as linear regression — MSE), let’s go straight to applying polynomial regression on our data and see the differences.

Applying Polynomial Regression

Similar to what we did with linear regression — we can apply a polynomial regression in the following way

import numpy as np
import operator
import matplotlib.pyplot as plt
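from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

POLY_DEGREE = 2  # degree of the polynomial to fit

x, y = load_data()  # pseudocode for loading your data set; x must be 2D, shape (n_samples, 1)

# Expand x into the polynomial terms [1, x, x^2, ...] up to POLY_DEGREE
poly_reg = PolynomialFeatures(degree=POLY_DEGREE)
x_poly = poly_reg.fit_transform(x)

# A linear model on the polynomial terms is a polynomial model in x
model = LinearRegression().fit(x_poly, y)

y_pred = model.predict(x_poly)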
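
To see what the PolynomialFeatures step actually does, here is a tiny sketch on two made-up input values. With a degree of 2, each input x is expanded into the three columns 1, x and x².

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up samples with one feature each, for illustration only.
x = np.array([[2.0], [3.0]])

x_poly = PolynomialFeatures(degree=2).fit_transform(x)
print(x_poly.tolist())  # [[1.0, 2.0, 4.0], [1.0, 3.0, 9.0]]
```

This is why we can keep using LinearRegression: the model is still linear in its parameters, it just receives x and x² as separate inputs. The leading column of ones duplicates the intercept that LinearRegression fits on its own, which is redundant but harmless here.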

Now, to check how good this prediction is, let’s generate the graph as we did before, with one necessary twist: the points have to be sorted by x so that the curve is drawn from left to right.

plt.scatter(x, y, s=10)

# Sorting x and y values according to values in x
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x,y_pred), key=sort_axis)
x, y_pred = zip(*sorted_zip)

plt.plot(x, y_pred, color='r')
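
An alternative sketch for the sorting step uses NumPy's argsort and stores the result under new names, so the original x and y_pred stay untouched. The arrays below are made up for illustration, and x is assumed to be 2D with a single column.

```python
import numpy as np

# Made-up unsorted inputs and predictions, for illustration only.
x = np.array([[3.0], [1.0], [2.0]])
y_pred = np.array([9.0, 1.0, 4.0])

order = np.argsort(x[:, 0])     # indices that put x in ascending order
x_sorted = x[order, 0]          # x values, left to right
y_pred_sorted = y_pred[order]   # predictions reordered the same way

print(x_sorted.tolist())        # [1.0, 2.0, 3.0]
print(y_pred_sorted.tolist())   # [1.0, 4.0, 9.0]
```

You would then pass x_sorted and y_pred_sorted to plt.plot. Because y_pred itself keeps its original order, it still lines up with y afterwards.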

Just by looking at the graph, we can see that this curve is a better predictor of our data. Now let’s calculate the mean squared error.

Pay attention to the point at which you calculate the error, or you might get nonsense scores: the sorting step above reorders y_pred (and x) but not y, so the two no longer line up.
Calculate the MSE right after calling predict, before sorting, with

mse = mean_squared_error(y, y_pred)

And this time the result is 0.3467.

If you paid attention, you saw me passing the argument POLY_DEGREE which I set to 2, and we can tweak it to try and get better results.

A degree of 2 means that the model is looking for the parameters of the following equation

y = α₀ + α₁x + α₂x²

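By increasing the degree, we are basically giving the model more “power”, so it can fit the data it was trained on more closely. Keep in mind that a degree that is too high can start fitting the noise rather than the trend (overfitting), so it is worth checking the error on data the model has not seen.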
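
One way to explore the effect of the degree is to compare the error on the training data with the error on held-out data. The sketch below does that on synthetic data with a quadratic trend. The data, the split and the list of degrees are my own assumptions, not the setup used in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a quadratic trend plus noise (illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, size=200)

# Hold out 30% of the points to measure error on unseen data.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

results = {}
for degree in (1, 2, 5, 10):
    # Chain the feature expansion and the linear fit into one model.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    results[degree] = (train_mse, test_mse)
    print(degree, round(train_mse, 3), round(test_mse, 3))
```

The training error can only go down (or stay the same) as the degree grows, since each higher-degree model contains the lower-degree ones. The held-out error is the one to watch: if it stops improving or gets worse while the training error keeps shrinking, the extra degrees are likely fitting noise.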


Hopefully, we now have a better understanding of what regression is, what it aims to do, and how to apply linear and polynomial regression to our data.
