DEV Community

Shlok Kumar

Linear Regression

Linear regression is a fundamental statistical method for predicting a continuous dependent variable (the target) from one or more independent variables. It assumes a linear relationship between them: a one-unit change in an independent variable shifts the predicted value of the dependent variable by a constant amount, given by that variable's coefficient.

In this article, we'll explore the types of linear regression and demonstrate how to implement it in Python.

Types of Linear Regression

There are three main types of linear regression:

  1. Simple Linear Regression: This involves predicting a dependent variable using a single independent variable.
  2. Multiple Linear Regression: This involves predicting a dependent variable based on multiple independent variables.
  3. Polynomial Linear Regression: This involves predicting a dependent variable using a polynomial relationship between independent and dependent variables.

1. Simple Linear Regression

Simple linear regression predicts a response using a single feature. It assumes a linear relationship between the dependent variable and the independent variable. The equation of the regression line can be represented as:

h(x_i) = β_0 + β_1 * x_i

Here:

  • h(x_i) is the predicted response for the ith observation.
  • x_i is the value of the feature for the ith observation.
  • β_0 is the y-intercept.
  • β_1 is the slope of the regression line.

To estimate β_0 and β_1, we minimize the total squared residual error, represented by the cost function J:

J(β_0, β_1) = (1/(2n)) * Σ ε_i²

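Here ε_i = y_i - h(x_i) is the residual error for the ith observation, that is, the difference between the observed and the predicted response, and n is the number of observations.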
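The minimization can be done by hand, which ties the cost function to the formulas used in the Python code that follows. As a brief sketch, writing x̄ and ȳ for the sample means (notation introduced here for convenience) and taking the residual as ε_i = y_i - h(x_i), setting both partial derivatives of J to zero gives:

```latex
\begin{aligned}
\frac{\partial J}{\partial \beta_0}
  &= -\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad \beta_0 = \bar{y} - \beta_1 \bar{x}, \\
\frac{\partial J}{\partial \beta_1}
  &= -\frac{1}{n}\sum_{i=1}^{n} x_i\bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad \beta_1
   = \frac{\sum_i x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_i x_i^2 - n\,\bar{x}^2}
   = \frac{SS_{xy}}{SS_{xx}}.
\end{aligned}
```

The numerator and denominator of β_1 are the quantities named SS_xy and SS_xx in the implementation.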

Python Implementation of Simple Linear Regression

To implement simple linear regression in Python, we will use libraries like numpy and matplotlib. Here’s how to do it:

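import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    """Return the least-squares estimates (b_0, b_1) for y = b_0 + b_1 * x."""
    n = np.size(x)  # number of observations
    m_x = np.mean(x)
    m_y = np.mean(y)
    # Cross-deviation of x and y, and sum of squared deviations of x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # Closed-form estimates of the slope and the intercept
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # Observed points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # Predicted responses along the fitted line
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # Small sample dataset
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()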
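One way to sanity-check the hand-written estimates is to fit the same data with NumPy's built-in polynomial fit. The sketch below is an optional cross-check, not part of the original implementation; it uses np.polyfit with degree 1 and also evaluates the cost J at the fitted line. On the sample data the result should agree with the formulas above, roughly b_0 ≈ 1.2364 and b_1 ≈ 1.1697.

```python
import numpy as np

# Same sample data as in main() above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# np.polyfit returns coefficients from the highest degree down,
# so for degree 1 the order is [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Cost J = (1/(2n)) * sum of squared residuals, evaluated at the fitted line
residuals = y - (intercept + slope * x)
cost = np.sum(residuals ** 2) / (2 * np.size(x))

print("intercept (b_0) =", intercept)
print("slope     (b_1) =", slope)
print("cost J          =", cost)
```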

2. Multiple Linear Regression

Multiple linear regression extends simple linear regression by using multiple features to predict a response variable. The equation for multiple linear regression is:

h(x_i) = β_0 + β_1 * x_i1 + β_2 * x_i2 + ... + β_p * x_ip

Here p is the number of features. The coefficients β_0, β_1, ..., β_p are estimated using the least squares method.
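To make the least squares step concrete, here is a minimal sketch using np.linalg.lstsq. The toy data is hypothetical and generated exactly from y = 1 + 2*x1 + 3*x2, so the recovered coefficients should come out close to 1, 2, and 3. The column of ones is added so that the first coefficient plays the role of β_0.

```python
import numpy as np

# Hypothetical toy data: two features, with y generated exactly from
# y = 1 + 2*x1 + 3*x2 (no noise), so the true coefficients are known
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
A = np.column_stack([np.ones(len(X)), X])

# Least squares: find beta that minimizes ||A @ beta - y||^2
beta, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print("beta_0, beta_1, beta_2 =", beta)
```

In practice a library class such as scikit-learn's LinearRegression handles the intercept column for you, as in the implementation that follows.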

Python Implementation of Multiple Linear Regression

For multiple linear regression, we can use the Boston housing dataset as an example:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Load the Boston Housing dataset (requires network access)
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# Preprocessing data: each record spans two lines in the raw file,
# so stitch the two halves back together
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]  # target column

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Create and train the linear regression model
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

# Print regression coefficients and the R^2 score on the test set
print('Coefficients: ', reg.coef_)
print('R^2 score: {}'.format(reg.score(X_test, y_test)))
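The script above downloads its data, so it needs network access. If that is not available, the same workflow can be tried offline. The sketch below swaps in synthetic data from scikit-learn's make_regression (my substitution, not the article's dataset; the sample count, feature count, and noise level are arbitrary choices) and reports both R² and mean squared error on the held-out set.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the Boston dataset):
# 200 samples, 5 features, Gaussian noise added to a linear target
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Same 60/40 split as in the example above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Fit on the training set, then predict on the held-out test set
reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# r2_score gives the same value as reg.score(X_test, y_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Coefficients:", reg.coef_)
print("Intercept:", reg.intercept_)
print("R^2 on test set:", r2)
print("MSE on test set:", mse)
```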

3. Polynomial Linear Regression

Polynomial regression fits a nonlinear relationship between the independent variable x and the dependent variable y by adding polynomial terms such as x² and x³ as extra features. It still counts as linear regression because the model remains linear in its coefficients, so the same least squares fitting applies.

Python Implementation of Polynomial Linear Regression

Here's how to implement polynomial regression using Python:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load dataset
df = pd.read_csv('Position_Salaries.csv')
X = df.iloc[:, 1:2].values
y = df.iloc[:, 2].values

# Create polynomial features
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

# Fit the polynomial regression model
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# Visualize the results
plt.scatter(X, y, color='red')
plt.plot(X, lin_reg_2.predict(X_poly), color='green')
plt.title('Polynomial Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
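The example above depends on a local Position_Salaries.csv file. For readers who do not have it, here is a self-contained sketch of the same technique on hypothetical data generated exactly from y = 2 + 3x + x². It sets include_bias=False (a choice made here, since LinearRegression already fits an intercept) and shows that new inputs must go through the already fitted transformer with transform(), not fit_transform().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data generated exactly from y = 2 + 3x + x^2
X = np.arange(6, dtype=float).reshape(-1, 1)  # shape (6, 1)
y = 2.0 + 3.0 * X[:, 0] + X[:, 0] ** 2

# Expand x into the columns [x, x^2]; the constant column is left out
# because LinearRegression fits the intercept itself
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# An ordinary linear model on the expanded features
model = LinearRegression().fit(X_poly, y)

# Predict for a new input: reuse the fitted transformer with transform()
X_new = np.array([[6.0]])
y_new = model.predict(poly.transform(X_new))

print("coefficients for x and x^2:", model.coef_)
print("intercept:", model.intercept_)
print("prediction at x = 6:", y_new[0])
```

Because the data has no noise, the fit should recover coefficients close to 3 and 1 with an intercept close to 2.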

Frequently Asked Questions (FAQs)

  1. How to use linear regression to make predictions?
    Once a linear regression model is trained, it can be used to make predictions for new data points using the predict() method.

  2. What is linear regression?
    Linear regression is a supervised machine learning algorithm used to predict a continuous numerical output based on linear relationships between independent and dependent variables.

  3. How to perform linear regression in Python?
    Libraries like scikit-learn provide simple implementations for linear regression. You can fit a model using the LinearRegression class.

  4. What are some applications of linear regression?
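    Applications include predicting house prices and stock prices, and estimating other continuous quantities in fields such as medicine and business. Yes/no outcomes such as a medical diagnosis or customer churn are usually handled with classification methods like logistic regression instead.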

  5. How is linear regression implemented in scikit-learn?
    The LinearRegression class in scikit-learn allows for fitting a linear regression model to training data and predicting target values for new data.
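As a quick illustration of the fit and predict workflow mentioned in these answers, here is a minimal sketch on hypothetical data that follows y = 2x + 1 exactly. Note that scikit-learn expects the feature array to be two-dimensional, with one row per sample, which is why the inputs are reshaped.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 2x + 1 exactly
X_train = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)  # 2-D: (n_samples, n_features)
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# Fit the model to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the target for new, unseen inputs
X_new = np.array([[5.0], [6.0]])
predictions = model.predict(X_new)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("predictions:", predictions)
```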

By understanding and implementing linear regression, you can effectively model and analyze relationships within your data, driving insights and predictions in various domains.

For more content, follow me at https://linktr.ee/shlokkumar2303
