Linear regression is a fundamental statistical method used to predict a continuous dependent variable (the target variable) from one or more independent variables. It assumes a linear relationship between the dependent and independent variables: each one-unit change in an independent variable shifts the predicted value of the dependent variable by a constant amount, given by that variable's coefficient.
In this article, we'll explore the types of linear regression and demonstrate how to implement it in Python.
Types of Linear Regression
There are three main types of linear regression:
- Simple Linear Regression: This involves predicting a dependent variable using a single independent variable.
- Multiple Linear Regression: This involves predicting a dependent variable based on multiple independent variables.
- Polynomial Linear Regression: This involves predicting a dependent variable using a polynomial relationship between independent and dependent variables.
1. Simple Linear Regression
Simple linear regression predicts a response using a single feature. It assumes a linear relationship between the dependent variable and the independent variable. The equation of the regression line can be represented as:
$$h(x_i) = \beta_0 + \beta_1 x_i$$
Here:
- $h(x_i)$ is the predicted response for the $i$th observation.
- $\beta_0$ is the y-intercept.
- $\beta_1$ is the slope of the regression line.
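To estimate $\beta_0$ and $\beta_1$, we minimize the sum of squared residuals, expressed as the cost function $J$:
$$J(\beta_0, \beta_1) = \frac{1}{2n} \sum_{i=1}^{n} \varepsilon_i^2$$
where $\varepsilon_i = y_i - h(x_i)$ is the residual error for the $i$th observation and $n$ is the number of observations. The constant factor $\frac{1}{2n}$ does not change which coefficients minimize $J$.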
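Setting the partial derivatives of J to zero gives the closed-form least-squares estimates. This is a standard result, written here with x̄ and ȳ for the sample means of x and y; it is the formula that the `estimate_coef` function in the next section computes, since Σ(x_i − x̄)(y_i − ȳ) = Σx_i y_i − n x̄ ȳ and Σ(x_i − x̄)² = Σx_i² − n x̄².

```latex
\begin{aligned}
\frac{\partial J}{\partial \beta_0} &= -\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad \beta_0 = \bar{y} - \beta_1 \bar{x} \\
\frac{\partial J}{\partial \beta_1} &= -\frac{1}{n}\sum_{i=1}^{n} x_i\bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad \beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
  = \frac{SS_{xy}}{SS_{xx}}
\end{aligned}
```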
Python Implementation of Simple Linear Regression
To implement simple linear regression in Python, we will use the `numpy` and `matplotlib` libraries. Here's how to do it:
```python
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # Number of observations and the sample means of x and y
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)

    # Cross-deviation of x and y, and deviation of x about its mean
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # Least-squares estimates of the slope (b_1) and intercept (b_0)
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # Scatter plot of the observed points
    plt.scatter(x, y, color="m", marker="o", s=30)

    # Predicted values along the fitted line
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")

    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # Toy data set
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
```
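As a sanity check, the closed-form estimates can be compared against NumPy's `np.polyfit`, and the cost J can be evaluated at and around them. This standalone sketch redefines `estimate_coef` so it runs on its own; the `cost` helper is a name introduced here for illustration, not part of the original code.

```python
import numpy as np

# Same toy data as in main() above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

def estimate_coef(x, y):
    """Closed-form least-squares estimates (b_0, b_1) for y ≈ b_0 + b_1 * x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    ss_xy = np.sum(y * x) - n * m_y * m_x
    ss_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx
    return m_y - b_1 * m_x, b_1

def cost(b_0, b_1, x, y):
    """Hypothetical helper: J(b_0, b_1) = (1/2n) * sum of squared residuals."""
    residuals = y - (b_0 + b_1 * x)
    return np.sum(residuals ** 2) / (2 * len(x))

b_0, b_1 = estimate_coef(x, y)

# np.polyfit with deg=1 returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(b_0, intercept), np.isclose(b_1, slope))  # True True

# Moving either coefficient away from the estimate increases the cost
j_min = cost(b_0, b_1, x, y)
print(j_min < cost(b_0 + 0.1, b_1, x, y), j_min < cost(b_0, b_1 - 0.1, x, y))  # True True
```

For this data the estimates come out to roughly b_0 ≈ 1.236 and b_1 ≈ 1.170.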
2. Multiple Linear Regression
Multiple linear regression extends simple linear regression by using multiple features to predict a response variable. The equation for multiple linear regression is:
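$$h(x_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}$$
where $p$ is the number of features and $x_{ij}$ is the value of the $j$th feature for the $i$th observation. The coefficients $\beta_0, \beta_1, \dots, \beta_p$ are estimated by least squares, that is, by minimizing the sum of squared residuals.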
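In matrix form, prepending a column of ones to the feature matrix X turns least squares into solving the normal equations (XᵀX)β = Xᵀy. The sketch below solves that system with NumPy's `np.linalg.lstsq`, which is numerically safer than explicitly inverting XᵀX. The data is synthetic and noise-free, with true coefficients chosen purely for illustration, so the recovered β should match them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations, p = 3 features (values chosen for illustration)
n, p = 50, 3
X = rng.normal(size=(n, p))
true_beta = np.array([4.0, 1.5, -2.0, 0.5])  # [beta_0, beta_1, beta_2, beta_3]

# Design matrix: a leading column of ones carries the intercept beta_0
X_design = np.column_stack([np.ones(n), X])
y = X_design @ true_beta  # noise-free, so the fit should be exact

# Least-squares solution of X_design @ beta ≈ y
beta_hat, residuals, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # should match true_beta up to floating-point error
```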
Python Implementation of Multiple Linear Regression
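For multiple linear regression, we'll use the Boston Housing dataset as an example. Because `load_boston` was removed from scikit-learn in version 1.2, the code reads the raw data directly from its original source: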
```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Load the Boston Housing dataset from its original source
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# Each record spans two rows in the raw file: rebuild the 13 features and the target
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]

# Split data into training (60%) and testing (40%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Create and train the linear regression model
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

# Print the regression coefficients and the R^2 score on the test set
print('Coefficients: ', reg.coef_)
print('R^2 score: {}'.format(reg.score(X_test, y_test)))
```
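The value returned by `reg.score` is the coefficient of determination, R² = 1 − SS_res / SS_tot. To make that definition concrete without downloading any data, here is a NumPy-only sketch on synthetic data; the data, the simple 60/40 split, and the `r2` helper are illustrative assumptions rather than part of the example above. It fits by least squares on the training rows and evaluates R² on the held-out rows.

```python
import numpy as np

def r2(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
# Linear signal plus Gaussian noise (coefficients chosen for illustration)
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# 60/40 split; rows are independent draws, so no shuffling is needed
split = int(0.6 * n)
A = np.column_stack([np.ones(n), X])  # leading column of ones for the intercept
beta, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)

y_pred = A[split:] @ beta
score = r2(y[split:], y_pred)
print("Test R^2:", score)
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the test targets.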
3. Polynomial Linear Regression
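Polynomial regression models a nonlinear relationship between the independent variable x and the dependent variable y by adding powers of x (x², x³, ...) as extra features. Although the fitted curve is nonlinear in x, the model is still linear in its coefficients, which is why it can be fit with ordinary linear regression.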
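To illustrate that point, a degree-2 fit is just least squares on the expanded columns [1, x, x²], the same expansion `PolynomialFeatures(degree=2)` produces for a single feature. The NumPy sketch below uses made-up, noise-free quadratic data, builds those columns by hand, and compares the result with `np.polyfit`.

```python
import numpy as np

# Noise-free quadratic data: y = 1 + 2x + 3x^2 (coefficients chosen for illustration)
x = np.linspace(-3, 3, 25)
y = 1 + 2 * x + 3 * x ** 2

# Expand the single feature into [1, x, x^2]; the model is linear in these columns
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)  # [beta_0, beta_1, beta_2]

# np.polyfit fits the same polynomial but returns the highest power first
coef_polyfit = np.polyfit(x, y, 2)[::-1]

print(np.allclose(coef, [1, 2, 3]), np.allclose(coef, coef_polyfit))  # True True
```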
Python Implementation of Polynomial Linear Regression
Here's how to implement polynomial regression using Python:
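```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load dataset (column 1 is used as the position level, column 2 as the salary)
df = pd.read_csv('Position_Salaries.csv')
X = df.iloc[:, 1:2].values  # 2-D feature array, shape (n_samples, 1)
y = df.iloc[:, 2].values

# Expand X into polynomial features [1, x, x^2]
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# Visualize the results (the line is drawn in row order, so rows should be sorted by level)
plt.scatter(X, y, color='red')
plt.plot(X, lin_reg_2.predict(X_poly), color='green')
plt.title('Polynomial Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

# New inputs must go through the same transform before predicting, e.g.:
# lin_reg_2.predict(poly_reg.transform([[6.5]]))
```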
Frequently Asked Questions (FAQs)
How to use linear regression to make predictions?
Once a linear regression model is trained, it can be used to make predictions for new data points using the `predict()` method.
What is linear regression?
Linear regression is a supervised machine learning algorithm used to predict a continuous numerical output based on linear relationships between independent and dependent variables.
How to perform linear regression in Python?
Libraries like `scikit-learn` provide simple implementations of linear regression. You can fit a model using the `LinearRegression` class.
What are some applications of linear regression?
Applications include predicting house prices, stock prices, diagnosing medical conditions, and assessing customer churn.
How is linear regression implemented in scikit-learn?
The `LinearRegression` class in scikit-learn fits a linear regression model to training data and predicts target values for new data.
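A minimal sketch of the fit-then-predict workflow with scikit-learn's `LinearRegression`, using a tiny made-up data set that follows y = 2x + 1 exactly (the data and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: one feature, exactly y = 2x + 1 (made up for illustration)
X_train = np.array([[0], [1], [2], [3], [4]])
y_train = 2 * X_train.ravel() + 1

model = LinearRegression()
model.fit(X_train, y_train)

# New inputs need the same 2-D shape (n_samples, n_features) as the training data
X_new = np.array([[10], [20]])
predictions = model.predict(X_new)
print(predictions)  # approximately [21. 41.]
```

Passing a 1-D array such as `np.array([10, 20])` to `predict()` raises an error, because scikit-learn expects one row per sample.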
By understanding and implementing linear regression, you can effectively model and analyze relationships within your data, driving insights and predictions in various domains.
For more content, follow me at — https://linktr.ee/shlokkumar2303