Linear Regression using Scikit-learn and TensorFlow
Comparing linear regression models made using scikit-learn and TensorFlow.
There is a lot of content about linear regression on the internet, yet side-by-side comparisons of linear regression models built with scikit-learn and TensorFlow are hard to find. So today I'm sharing a guide that does exactly that.
TABLE OF CONTENTS:
- What is Linear Regression?
- Dataset
- Implementing Linear Regression using Scikit-learn
- Basic explanation of Neural Network
- Implementing Linear Regression using TensorFlow
- Observations
- Conclusion
- Resources
1. What is Linear Regression?
Linear Regression is a supervised machine learning algorithm used to model a linear relationship between a dependent variable and one or more independent variables. In other words, it fits the straight line that best describes how the dependent variable changes with the independent variables.
TYPES OF LINEAR REGRESSION:
- Simple Linear Regression: It’s the case where we have only one independent variable.
- Multiple Linear Regression: It’s the case where we have more than one independent variable.
AIM OF THE MODEL:
A linear regression model mainly aims to find the best-fit line, minimizing the error by finding the optimal values of the intercept and coefficient. The error is the difference between the actual value and the predicted value.
- x is the independent variable.
- y is the dependent variable.
- The blue line is the regression line.
- Black dots are the actual data points; the corresponding points directly above or below them on the regression line are the predicted values.
- b0 is the intercept and b1 is the coefficient.
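Putting these together, the regression line for a single feature can be written as y = b0 + b1·x, and training amounts to finding the b0 and b1 that minimize the error.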
ERRORS:
Error is the difference between the actual value and the predicted value. The linear regression model mainly uses two types of error estimators:
- Mean Absolute Error (MAE): It is the mean of the absolute differences between the actual and the predicted values.
- Mean Squared Error (MSE): It measures the average of the squares of the errors.
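For concreteness, here's a minimal NumPy sketch of both estimators; the arrays y_true and y_hat are made-up illustrative values:

import numpy as np

y_true = np.array([3.0, 5.0, 7.0])  # hypothetical actual values
y_hat = np.array([2.5, 5.0, 8.0])   # hypothetical predicted values

mae = np.mean(np.abs(y_true - y_hat))  # mean of the absolute differences
mse = np.mean((y_true - y_hat) ** 2)   # mean of the squared differences
print(f"MAE: {mae}, MSE: {mse}")       # MAE: 0.5, MSE: ~0.417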
2. Dataset
This article will create a dataset that follows the simple equation of a straight line (here, y = x + 10). Therefore, the models should be able to recover the exact values of the coefficient and the intercept.
# Creating a random array of feature vectors
import numpy as np

X = np.random.randint(10, size=10)

# Reshape the X vector to a matrix
X = np.expand_dims(X, axis=-1)

# Creating the labels for the dataset
y = X + 10

# Explore the nature of the arrays X and y
print(f"Array X: {X}\nShape: {X.shape}\nDtype: {X.dtype}")
print(f"\nArray y: {y}\nShape: {y.shape}\nDtype: {y.dtype}")
Visualizing the dataset
# Plot the dataset using matplotlib
import matplotlib.pyplot as plt

plt.scatter(X, y)
plt.show()
With that done, let's build both models and see what we get.
3. Implementing Linear Regression using Scikit-learn.
Now we'll implement linear regression using scikit-learn, one of the most widely used Python libraries for machine learning. It makes the entire process simple: all we have to do is import the desired model and train it on the data.
Training:
Training is nothing but finding the optimal values of the coefficients and the intercept. Scikit-learn's LinearRegression uses a direct "closed-form" approach based on Singular Value Decomposition (SVD) that computes the best values of the intercept and coefficients (also known as the bias and weights, respectively) in one shot rather than iteratively. The time complexity of the SVD approach is O(n²), where n is the number of features, and it is linear, O(m), in the number of training instances m.
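To get a feel for what a direct solution looks like, here's a rough sketch using NumPy's SVD-based least-squares solver on our y = x + 10 data; this illustrates the idea, not scikit-learn's exact internals (X_demo and y_demo are stand-ins for our arrays):

import numpy as np

X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = X_demo + 10

# Prepend a column of ones so the intercept is learned as a coefficient
X_b = np.hstack([np.ones_like(X_demo), X_demo])

# Solve the least-squares problem directly (np.linalg.lstsq uses SVD internally)
theta, *_ = np.linalg.lstsq(X_b, y_demo, rcond=None)
print(theta)  # [[10.], [1.]] -> intercept 10, coefficient 1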
Importing and using the model:
# Import the model from sklearn
from sklearn.linear_model import LinearRegression

# Instantiate the model
model = LinearRegression()

# Fit the model
model.fit(X, y)

# Viewing the optimal values of intercept and co-eff calculated by the model
print(f"Co-eff: {model.coef_}\nIntercept: {model.intercept_}")
Predicting a value and calculating the MAE and MSE:
# Checking the outcome for a single 1x1 input matrix
model.predict([[10]])

# Importing MSE and MAE from sklearn
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Predicting the results y_pred on the original input matrix X
y_pred = model.predict(X)

# Calculating the MAE and MSE
mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
print(f"MAE: {mae}\nMSE: {mse}")
Note that on running the above code, both the MAE and the MSE on the original dataset come out to be 0. This means the linear regression model found the exact relation between the independent variable X and the dependent variable y, i.e., y = X + 10. Now we'll try to achieve the same using a neural network in TensorFlow and see whether that is possible.
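As an optional sanity check, scikit-learn's score method returns the R² value, which should be 1.0 for a perfect fit:

# R² of the fitted model on the training data (1.0 means a perfect fit)
print(model.score(X, y))  # 1.0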
Insight:
Now that we know the time complexity of the SVD approach, one question that should occur to us is: should we use scikit-learn's linear regression on a particular dataset or not? The answer has two points:
- The algorithm works perfectly fine when the number of features is in the range of 1–1,000.
- But the algorithm becomes very slow when the number of features grows very large. Say we have 1,000,000 features; then, according to the O(n²) time complexity, training would require about 10¹² computations. Assuming a typical computer performs 10⁸ calculations per second, those 10¹² operations would take 10,000 seconds (nearly 3 hours) to complete. In some cases, the computer also runs out of memory. Therefore, you might want to shift your focus towards deep neural networks (see the rough timing sketch below).
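If you want to see the slowdown for yourself, here's a rough timing sketch; the sizes are arbitrary and actual timings will vary by machine:

import time
import numpy as np
from sklearn.linear_model import LinearRegression

for n_features in (100, 1_000, 5_000):
    X_big = np.random.rand(1_000, n_features)  # 1,000 instances
    y_big = np.random.rand(1_000)
    start = time.perf_counter()
    LinearRegression().fit(X_big, y_big)
    print(n_features, "features:", round(time.perf_counter() - start, 3), "sec")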
4. Basic explanation of Neural Network
Instead of using a full layer of perceptrons or any complex neural network, we will use a single perceptron to establish the relationship between the dependent and independent variables. Let's first understand the concept of a perceptron.
- Input features: x1, x2, x3
- Weights: w1, w2, w3
- Output: y
- Bias: β, shown inside the circle.
The output y is produced by summing the weighted inputs (each input from the previous layer multiplied by its weight) and adding a bias β. Usually, we apply an activation function (such as sigmoid or ReLU) to the output value y to get a non-linear regression line. In this case, however, we won't use any activation function because we want a linear regression line.
If we observe carefully, the equation for a perceptron with a single input feature x1, a weight w1, and a bias β turns out to be y = β + w1·x1. This is exactly the equation for linear regression with one variable; in other words, a single perceptron can represent it.
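Here's a minimal sketch of that computation, using the same coefficient and intercept as our dataset (the perceptron function is purely illustrative):

import numpy as np

def perceptron(x, w, beta):
    # A single linear neuron: weighted sum of the inputs plus the bias, no activation
    return np.dot(w, x) + beta

# One input feature with w1 = 1 and beta = 10: y = beta + w1 * x1
print(perceptron(np.array([2.0]), np.array([1.0]), 10.0))  # 12.0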
5. Implementing Linear Regression using TensorFlow.
Now we'll code the linear regression in TensorFlow 2.7. TensorFlow is one of the most widely used machine learning frameworks for deep learning. All we have to do is define the architecture of the neural network, and the rest is taken care of by TensorFlow itself. A benefit of using TensorFlow is that we can convert the data into tensors and then use a GPU to train the model. Remember that our ultimate goal is to compare the linear regression models built using scikit-learn and TensorFlow, so we'll train the TensorFlow model on the data (X, y) defined in the 2nd section of this story.
Training:
Neural networks use an iterative optimization approach called stochastic gradient descent (SGD) that gradually tweaks the model parameters to find the optimal values of the coefficient and the intercept. Since it is an iterative method, we may never converge to the exact optimal values.
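To build some intuition, here's a bare-bones gradient-descent loop for our y = x + 10 data; this is a plain batch version for clarity, not what TensorFlow does internally:

import numpy as np

X_gd = np.arange(10, dtype=float)
y_gd = X_gd + 10

w, b, lr = 0.0, 0.0, 0.01  # initial weight, bias, and learning rate
for _ in range(5000):
    y_hat = w * X_gd + b
    # Gradients of the MSE loss with respect to w and b
    grad_w = 2 * np.mean((y_hat - y_gd) * X_gd)
    grad_b = 2 * np.mean(y_hat - y_gd)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches 1.0 and 10.0, but typically not exactly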
Defining and using the model:
# Importing the libraries
import tensorflow as tf
from tensorflow import keras

# Set the random seed
tf.random.set_seed(42)

# 1. Create the model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),            # Defining the input shape: one feature per sample
    tf.keras.layers.Dense(1, name='outputLayer')  # Defining the output layer: one linear unit
], name='linearRegressor')

# 2. Compile the model
model.compile(loss=tf.keras.losses.MeanSquaredError(),
              optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.3),
              metrics=['mae'])

# 3. Visualize the model
model.summary()

# Convert the data to tensors
X = tf.constant(X)
y = tf.constant(y)

# 4. Fit the model
history = model.fit(X, y, epochs=1000, validation_split=0.33,
                    batch_size=10, verbose=0, shuffle=True)

# 5. Viewing the weights and bias calculated by the model
print(f"Weights: {model.layers[0].weights}\nBias: {model.layers[0].bias.numpy()}")
Predicting a value and calculating the MAE and MSE:
# Checking the outcome for a single 1x1 input tensor (matrix)
model.predict(tf.constant([[10]]))

# Predicting the results y_preds on the original input matrix X
y_preds = model.predict(X)

# Changing y and y_preds to vectors
y = y.numpy().squeeze()
y_preds = y_preds.squeeze()

# Calculating the MAE and MSE
mae = tf.metrics.mean_absolute_error(y, y_preds)
mse = tf.metrics.mean_squared_error(y, y_preds)
print(f"MAE: {mae}\nMSE: {mse}")
The MAE and MSE don't come out to be 0 because our model could not find the exact optimal values of the coefficient and the intercept.
6. Observations
- Training Methodology: To find the optimal values of coefficient and intercepts, scikit-learn uses a “closed form” equation (SVD), whereas TensorFlow uses an iterative approach (SGD).
- Training Time: The training time of the TensorFlow model was far greater than that of the scikit-learn model (roughly 1000x in this experiment).
- Evaluation Metrics: The scikit-learn model achieved the exact optimal values for this linear regression problem, resulting in 0 error, but that wasn't the case with the TensorFlow model.
- Space complexity: Using scikit-learn for a dataset with a huge number of features may cause the computer to run out of memory.
7. Conclusion
To conclude, I would suggest: if you're not training on a dataset with a massive number of features (e.g., 100,000) or one that requires a complex non-linear regression line, stick to scikit-learn. Otherwise, you always have neural networks to save you.