DEV Community

Nomadev

What is Regression in Machine Learning?

Hello everyone! We at Nomadev AI are back with another blog.

(Glad to have you here with us!)

Welcome to Day 4 of our 30-day series on Machine Learning, Deep Learning, and Generative AI.

No jargon, no buzzwords, just practical, beginner-friendly explanations to help you get the hang of AI one step at a time.

Today, we’ll be diving deep into Regression in Machine Learning, a cornerstone technique for predicting continuous values. This blog will not only explain the concept but also guide you through real-world examples, implementation, and applications.

By the end of this blog, you’ll understand how regression works, its types, and how to implement it in Python to solve real-world problems.

Let’s roll up our sleeves and dive right in!

What is Regression in Machine Learning?

Regression is a type of Supervised Learning used to predict continuous outcomes by modeling the relationship between input features (X) and output variables (Y). In simpler terms, it helps machines understand how one or more variables affect another.

Why Regression Matters

Here’s why regression is a must-have skill for anyone working with data:

  • Predict Continuous Values: Used in applications such as house-price prediction, stock valuation, and weather forecasting.
  • Quantify Relationships: Helps to understand how one variable impacts another.
  • Versatile and Scalable: Regression models can handle small datasets and scale to larger ones with feature engineering.
  • Real-World Applicability: From business analytics to healthcare, regression is at the core of decision-making processes.

Real-Life Example

Imagine you’re a real estate analyst trying to predict house prices. You collect data on:

  • Square footage of the house (Input Feature)
  • Number of bedrooms (Input Feature)
  • Location (Input Feature)
  • Price of the house (Output Variable)

Using this data, a regression model can help you identify patterns and predict the price of a new house based on its features.

For example:

  • A 1,500-square-foot, 3-bedroom house in New York might be priced higher than a similar house in a smaller city. Regression lets you quantify these relationships and make data-driven predictions.

Types of Regression in ML

1️⃣ Linear Regression

The simplest form of regression, in which a straight line represents the relationship between the input (independent variable) and the output (dependent variable).

How It Works:

The goal is to find the best-fit line that minimizes the error between the predicted values and actual values. The line is represented as:

Y = mX + C

Where:

  • Y: Predicted output (e.g., house price)
  • X: Input feature (e.g., square footage)
  • m: Slope of the line (rate of change of Y with respect to X)
  • C: Intercept (value of Y when X = 0)

📌 Example: Predicting house prices based on square footage.
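
To make the equation concrete, here is a minimal sketch that recovers m and C from data. The numbers are made up for illustration (they are generated from price = 150 × sqft + 50,000 with no noise), so treat it as a toy example rather than a realistic pricing model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: price = 150 * sqft + 50,000
sqft = np.array([[1000], [1500], [2000], [2500]])  # X (one feature per row)
price = 150 * sqft.ravel() + 50_000                # Y

model = LinearRegression().fit(sqft, price)

m = model.coef_[0]      # slope: price change per extra square foot
c = model.intercept_    # intercept: predicted price when sqft = 0
print(f"m = {m:.2f}, C = {c:.2f}")

# Use Y = mX + C to predict the price of an 1,800 sq ft house
predicted = model.predict(np.array([[1800]]))[0]
print(f"Predicted price: {predicted:.0f}")
```

Because the toy data lies exactly on a line, the fitted slope and intercept match the values used to generate it. Real data is noisy, so the fit will only approximate the underlying trend.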


2️⃣ Polynomial Regression

This type of regression extends linear regression by adding polynomial terms to capture non-linear relationships.

How It Works:

The equation includes higher-degree terms, such as X² and X³, to fit curves in the data. For example:

Y = a₀ + a₁X + a₂X² + ... + aₙXⁿ

📌 Example: Predicting startup revenue growth over multiple years, where revenue may follow a curve (for instance, accelerating growth) rather than a straight line.
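
One common way to fit such a curve in Scikit-learn is to generate the polynomial terms with PolynomialFeatures and then run ordinary linear regression on them. The sketch below assumes made-up revenue figures that follow revenue = 2·year² + 3·year + 5 exactly, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: revenue = 2*year^2 + 3*year + 5 (no noise)
years = np.arange(1, 7).reshape(-1, 1)
revenue = 2 * years.ravel() ** 2 + 3 * years.ravel() + 5

# Degree-2 polynomial regression: expand X into [X, X^2], then fit a linear model
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(years, revenue)

# Predict revenue for year 7
year7 = poly_model.predict(np.array([[7]]))[0]
print(f"Predicted revenue in year 7: {year7:.1f}")
```

The degree is a choice you make: too low and the curve underfits, too high and it starts chasing noise.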


3️⃣ Logistic Regression

Although it has "regression" in its name, Logistic Regression is a classification algorithm. It predicts probabilities for binary outcomes (e.g., Spam/Not Spam).

How It Works:

Logistic Regression uses the Sigmoid function to map a linear combination of the inputs to a probability between 0 and 1:

P(Y=1 | X) = 1 / (1 + e^(-(b₀ + b₁X)))

📌 Example: Predicting whether a user will click on an ad (1 = Click, 0 = No Click).
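
The formula above can be sketched directly in NumPy. The coefficients b0 and b1 below are hypothetical values chosen for illustration, not learned from data, and the 0.5 decision threshold is a common default rather than a requirement:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients (in practice these are learned during training)
b0, b1 = -4.0, 2.0

def click_probability(x):
    """P(Y=1 | X) = 1 / (1 + e^(-(b0 + b1*X)))"""
    return sigmoid(b0 + b1 * x)

p = click_probability(2.0)      # b0 + b1*X = 0, so the probability is 0.5
label = int(p >= 0.5)           # 1 = Click, 0 = No Click
print(f"P(click) = {p:.2f}, predicted label = {label}")
```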


4️⃣ Ridge & Lasso Regression

These techniques reduce overfitting by adding a penalty on the size of the model coefficients to the loss function.

  • Ridge Regression: Adds an L2 penalty (sum of squared coefficients), which shrinks coefficients toward zero.
  • Lasso Regression: Adds an L1 penalty (sum of absolute values of coefficients), which can shrink some coefficients to exactly zero.

📌 Example: Predicting stock prices while ensuring the model doesn’t memorize noise in the data.
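
To see the effect of the penalties, the sketch below fits plain linear regression, Ridge, and Lasso on the same synthetic data and compares coefficient sizes. The dataset and the alpha values (penalty strengths) are arbitrary choices made for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: only the first of five features actually drives the target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

Compared with the unpenalized fit, the Ridge coefficients are pulled toward zero, and Lasso typically sets the coefficients of the irrelevant features to exactly zero. Larger alpha values mean stronger shrinkage.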


How Regression Works (Step-by-Step Guide)

Step 1: Data Collection

The very first step is data collection. Without good-quality data, no machine learning model can perform well. In this step, you gather labeled data, where each input (or feature) has a corresponding numerical output (or target value).

📌 Example:

Imagine you’re building a model to predict house prices. Your dataset might include:

  • Input features: Square footage, number of bedrooms, location, and year built.
  • Output (target value): The price of the house.

Make sure your dataset is:

  • Comprehensive: Capture all relevant features affecting the outcome.
  • Representative: Reflect the diversity of real-world cases.
  • Free of bias: Avoid skewed data favoring specific conditions.

Step 2: Data Preprocessing

Prepare your data for training by:

  • Handling missing values: Replace missing data with averages, medians, or estimates.
  • Encoding categorical variables: Convert non-numerical values (e.g., city names) into numbers.
  • Scaling features: Bring numerical features onto comparable ranges (e.g., square footage in the thousands vs. bedroom counts in single digits).

📌 Tip: Use Pandas and Scikit-learn for efficient preprocessing.
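
The three preprocessing steps above can be sketched as one Scikit-learn pipeline. The tiny DataFrame and its column names (sqft, bedrooms, city) are hypothetical, and median imputation plus one-hot encoding are just one reasonable set of choices:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with one missing square-footage value
df = pd.DataFrame({
    "sqft": [1500, 2000, np.nan, 1200],
    "bedrooms": [3, 4, 2, 2],
    "city": ["New York", "Austin", "Austin", "New York"],
})

# Numeric columns: fill missing values with the median, then standardize
numeric_steps = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_steps, ["sqft", "bedrooms"]),
        # Categorical column: one 0/1 column per city
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_prepared = preprocessor.fit_transform(df)
print(X_prepared.shape)  # 2 scaled numeric columns + 2 one-hot city columns
```

In a real project, fit the preprocessor on the training set only and reuse it to transform the test set, so no information leaks from test data into training.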


Step 3: Splitting the Data

Divide data into two parts:

  1. Training Set (80%): Used to train the model.
  2. Test Set (20%): Reserved for evaluating model performance.

Why is this important?

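Without a held-out test set, you cannot tell whether the model has overfit (memorized the training data but fails on new data), because it would be evaluated on the same rows it learned from.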

📌 Example: For 1,000 rows:

  • 800 rows for training.
  • 200 rows for testing.
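
The 80/20 split of 1,000 rows can be sketched with Scikit-learn's train_test_split. The arrays here are placeholder data, and random_state is fixed only so the split is reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder dataset: 1,000 rows, one feature, one target
X_all = np.arange(1000).reshape(-1, 1)
y_all = np.arange(1000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))  # 800 200
```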

Step 4: Model Training

Train your regression algorithm to find the best-fit line/curve by minimizing prediction errors.

📌 Tip: Use Scikit-learn for simple, code-friendly model training.


Step 5: Model Evaluation

Evaluate performance using:

  • Mean Squared Error (MSE): Lower values = better accuracy.
  • R-Squared Value: Closer to 1 = better explanation of data variability.

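📌 Example: As a rough rule of thumb, a model whose house-price predictions fall within ±10% of the actual prices can be considered strong, though what counts as acceptable depends on the application.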
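
Both metrics are available in sklearn.metrics. The sketch below uses a handful of made-up actual and predicted prices (in thousands of dollars) to show how they are computed:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual vs. predicted house prices (in $1,000s)
y_true = np.array([200, 250, 300, 350])
y_pred = np.array([210, 240, 310, 340])

mse_value = mean_squared_error(y_true, y_pred)  # average of squared errors
r2_value = r2_score(y_true, y_pred)             # share of variance explained

print(f"MSE: {mse_value:.1f}")
print(f"R-squared: {r2_value:.3f}")
```

Every prediction here is off by 10, so the MSE is 10² = 100. Note that MSE is in squared units of the target, which is why its square root (RMSE) is often reported alongside it.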


Step 6: Make Predictions

Use your trained model to predict outcomes for new, unseen data.

📌 Example:

Input:

  • Square footage: 1,500
  • Bedrooms: 3
  • Location: Suburban

Output:

  • Predicted price: $300,000

Final Note

Each step builds on the previous one, and skipping steps risks inaccurate models. Clean data, proper splitting, and thorough evaluation help ensure actionable insights!


Example Code: Linear Regression in Python

Here’s a simple implementation using Scikit-learn:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Sample data (toy values; with only 5 rows, the 20% test set holds a single sample)
X = np.array([[1], [2], [3], [4], [5]])  # Features (e.g., square footage in thousands)
y = np.array([150, 200, 250, 300, 350])  # Target values (e.g., house prices in $1,000s)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Test model
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Real-World Applications

Regression models are everywhere. Here are some popular use cases:

  • Real Estate: Predicting property values based on location, size, and amenities.
  • Finance: Forecasting stock prices and economic trends.
  • Healthcare: Estimating patient recovery times based on medical history.
  • Retail: Predicting sales volumes for inventory management.
  • Energy: Modeling power consumption for better resource allocation.

Thanks for Joining the Journey!

Thanks for exploring Regression with me today! We’ve covered its importance, types, how it works, and its applications. This is just Day 4, and there’s more exciting content ahead.


What’s Coming Up Next? 📅

Later this week, we’ll cover:

  • Day 5: Classification Models → Dive into Logistic Regression and Decision Trees for real-world use cases.
  • Day 6: Introduction to Reinforcement Learning → Understand agents, environments, and rewards—and see it in action.
  • Day 7: Data Preprocessing in ML → Discover how to clean, scale, and encode your data for better results.

Let’s Connect and Build Together 🤝

Make sure to follow me on X (formerly Twitter) and turn on notifications to stay updated on all the latest tutorials.

Together, we’ll make AI accessible and fun for everyone.


Here’s how we can collaborate:

Open to DevRel partnerships to help brands grow through educational content.

Have an AI MVP idea or need consultancy services for AI-based applications and research projects? Let’s make it happen!

📧 Drop a mail at: thenomadevel@gmail.com

