Dipsan Kadariya
Machine Learning

Supervised Learning

Supervised learning means learning from data that contains both inputs and outputs (labels), finding the relationship between them, and using that relationship to predict outputs for new inputs.

For example, suppose we have a dataset of students' IQ and CGPA (inputs) along with whether they were placed (output). In supervised learning, we train the machine on this data so that, for new IQ and CGPA values, it can predict whether there will be a placement or not.

Types of Supervised Learning:

  • Regression
  • Classification

Data Types

Data is generally of two types:

  • Numerical: Age, weight, CGPA, IQ
  • Categorical: Gender, nationality, brand, etc.

Regression (Supervised Learning)

  • If the output is numerical, it is called regression.
  • Example: For a certain IQ and CGPA, let's say we predict a salary package of 50,000. Since the output column (package) is numerical, this is regression.
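
Below is a minimal sketch of regression with scikit-learn, assuming a tiny made-up dataset of IQ, CGPA, and package values (the numbers are invented purely for illustration):

```python
# Minimal regression sketch (illustrative data, not from the article)
from sklearn.linear_model import LinearRegression

# Inputs: [IQ, CGPA]; output: salary package (numerical, so this is regression)
X = [[90, 6.5], [110, 8.0], [120, 9.1], [100, 7.2]]
y = [30000, 50000, 65000, 42000]

model = LinearRegression()
model.fit(X, y)

# Predict the package for a new student
print(model.predict([[105, 7.8]]))
```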

Classification (Supervised Learning)

  • If the output is categorical, it is called classification.
  • Example: Given certain CGPA and IQ values, we predict whether a student will get placed or not. Since the output is categorical (Yes/No), this is classification.
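
A similar sketch for classification, again with invented IQ and CGPA numbers, this time predicting a categorical Yes/No placement label (logistic regression is used here as an example classifier, not something prescribed by the article):

```python
# Minimal classification sketch (illustrative data, not from the article)
from sklearn.linear_model import LogisticRegression

# Inputs: [IQ, CGPA]; output: placement (categorical, so this is classification)
X = [[90, 6.5], [110, 8.0], [120, 9.1], [100, 7.2]]
y = ["No", "Yes", "Yes", "No"]

clf = LogisticRegression()
clf.fit(X, y)

# Predict placement for a new student
print(clf.predict([[115, 8.5]]))
```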

Unsupervised Learning

Unsupervised learning is used when we have only input data and no corresponding output.

Example:

Suppose we have only the students' IQ and CGPA columns and no placement column. We have inputs but no outputs, so there is no label to predict directly.

In unsupervised learning, we perform one of the following:

  1. Clustering
  2. Dimensionality Reduction
  3. Anomaly Detection
  4. Association Rule Learning

Clustering

  • Let's say we plot IQ and CGPA.
  • A clustering algorithm detects which students belong to the same group.
  • Example: We can categorize students into different groups such as:
    • High IQ, low CGPA
    • Low IQ, low CGPA
    • High IQ, high CGPA
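
As a rough sketch, k-means (one common clustering algorithm, chosen here as an assumption since no specific algorithm is named above) could group students by IQ and CGPA like this:

```python
# Minimal clustering sketch with k-means (illustrative data)
from sklearn.cluster import KMeans

# Only inputs: [IQ, CGPA] -- there is no output column
X = [[120, 9.0], [118, 8.8], [95, 5.5], [92, 5.8], [125, 6.0], [122, 6.2]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)  # cluster id assigned to each student, e.g. [0 0 1 1 2 2]
```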

Dimensionality Reduction

  • If there are a lot of input columns (e.g., 1000), the algorithm runs slowly, and beyond a certain point adding more columns does not improve results.
  • Dimensionality Reduction (DR) removes unnecessary columns to improve efficiency.
  • Example: If we need to predict house price based on the number of rooms and washrooms, DR combines these into a single feature, reducing the number of input columns.
  • DR also helps in visualizing high-dimensional data by reducing it to 2D or 3D.
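
A minimal sketch using PCA (one common dimensionality-reduction technique; the text above does not prescribe a specific one) to compress many columns down to two, which can then be plotted:

```python
# Minimal dimensionality-reduction sketch with PCA (random data for illustration)
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)            # 200 samples, 50 input columns

pca = PCA(n_components=2)              # keep only 2 combined features
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # (200, 50) -> (200, 2)
```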

Anomaly Detection

  • Used for finding errors and detecting outliers in data.
  • Helps in fraud detection, network security, etc.
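
As a sketch, an Isolation Forest (one of several possible anomaly detectors, used here as an assumption) can flag unusual points:

```python
# Minimal anomaly-detection sketch with Isolation Forest (illustrative data)
from sklearn.ensemble import IsolationForest

X = [[10], [11], [10], [12], [11], [300]]     # 300 is an obvious outlier

detector = IsolationForest(random_state=0)
print(detector.fit_predict(X))                # 1 = normal, -1 = anomaly
```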

Association Rule Learning

  • A technique in unsupervised learning used to find relationships or patterns between variables in large datasets.
  • Example: Market Basket Analysis (identifying which products are frequently bought together).
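
A toy sketch of the idea behind market basket analysis: simply counting which pairs of products appear together in the same basket (real association-rule libraries implement full Apriori-style mining; the baskets below are made up):

```python
# Toy market-basket sketch: count product pairs bought together (made-up baskets)
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))  # ('bread', 'butter') appears together most often
```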

Semi-Supervised Learning

  • Semi-supervised learning lies between supervised and unsupervised learning.
  • It involves a small amount of labeled data and a large amount of unlabeled data.
  • The labeled data helps guide the learning process of the unlabeled data.
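
A minimal sketch with scikit-learn's LabelPropagation, where unlabeled samples are marked with -1 (the tiny dataset is invented, just to show labeled and unlabeled data being mixed):

```python
# Minimal semi-supervised sketch: -1 marks unlabeled samples (illustrative data)
from sklearn.semi_supervised import LabelPropagation

X = [[90, 6.5], [110, 8.0], [120, 9.1], [100, 7.2], [118, 8.7], [95, 6.8]]
y = [0, 1, 1, -1, -1, -1]            # only the first three students are labeled

model = LabelPropagation(kernel="knn", n_neighbors=2)
model.fit(X, y)

print(model.transduction_)           # labels inferred for the unlabeled samples too
```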

Reinforcement Learning

  • Reinforcement learning is where an agent learns how to make decisions by interacting with an environment.
  • The agent performs actions and receives feedback in the form of rewards or penalties.
  • The goal is to maximize cumulative rewards over time.
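
A very small sketch of the agent/reward loop, using an epsilon-greedy multi-armed bandit as a deliberately simplified stand-in for a full reinforcement-learning environment (all reward numbers are invented):

```python
# Tiny reinforcement-learning sketch: epsilon-greedy bandit (illustrative rewards)
import random

true_rewards = [0.2, 0.5, 0.8]        # hidden average reward of each action
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

for step in range(1000):
    # explore occasionally, otherwise exploit the best estimate so far
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    reward = 1 if random.random() < true_rewards[action] else 0   # feedback
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # estimates roughly track the true rewards; action 2 wins
```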

Categories Based on Production

Batch Machine Learning

  • The model is trained offline and the trained model then runs on a server (offline learning).
  • The model is trained once using the entire dataset and then deployed.
  • This is the conventional way of training a model.
  • Process: data → model → train → test → server → run

Problems with Batch Learning:

  • The model is static and does not evolve with new data, since it is trained offline.
  • Requires periodic retraining with merged new and old data.
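
A rough sketch of the batch workflow: train once on the full dataset, serialize the model, and serve it without further learning (joblib is one common way to save scikit-learn models; the data below is invented):

```python
# Batch (offline) learning sketch: train once on all data, then ship the model
import joblib
from sklearn.linear_model import LogisticRegression

X_all = [[90, 6.5], [110, 8.0], [120, 9.1], [100, 7.2]]   # entire dataset
y_all = ["No", "Yes", "Yes", "No"]

model = LogisticRegression().fit(X_all, y_all)
joblib.dump(model, "placement_model.joblib")   # deploy this file to the server

# Later, on the server: load and predict; no further learning happens
served = joblib.load("placement_model.joblib")
print(served.predict([[115, 8.5]]))
```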

Online Machine Learning

  • Done incrementally (model learns continuously).
  • Data is fed in small batches (mini-batches) sequentially.
  • Model is trained online.
  • The model improves with interactions and new data.

Process: small data → model → train → test → server, with continuous new data flowing back into the model.

Predictions on New Data

  • The model continues learning from new data.
  • Examples: Chatbots like GPT, YouTube video recommendations.
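
A minimal sketch of incremental learning with scikit-learn's partial_fit, which updates an existing model with each new mini-batch instead of retraining from scratch (the streaming data here is randomly generated for illustration):

```python
# Online learning sketch: the model is updated one mini-batch at a time
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
classes = np.array([0, 1])             # all classes must be declared up front

for batch in range(5):                 # pretend each loop brings new incoming data
    X_batch = np.random.rand(20, 2)    # e.g. [IQ, CGPA] scaled to 0-1
    y_batch = (X_batch.sum(axis=1) > 1).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict([[0.9, 0.8]]))
```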

When to Use Online Learning?

  • Concept Drift: When the underlying patterns in the data change over time (e.g., e-commerce trends).
  • Cost-Effective: Continuous learning without expensive retraining.
  • Faster Solution: Adaptability in real-time applications.

Out-of-Core Learning

  • Used when the dataset is too large to fit in memory.
  • The dataset is split into batches and processed in chunks.
  • Though performed offline, it follows an online learning approach.

Disadvantage:

  • More complex and can be risky if not implemented correctly.
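
A rough out-of-core sketch: read a CSV that is too large for memory in chunks with pandas and update the model chunk by chunk. The file name and column names below are hypothetical, used only for illustration:

```python
# Out-of-core learning sketch: process a huge file in chunks
# ("big_dataset.csv" and its columns are hypothetical)
import pandas as pd
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

# chunksize controls how many rows are loaded into memory at a time
for chunk in pd.read_csv("big_dataset.csv", chunksize=10_000):
    X = chunk[["iq", "cgpa"]]
    y = chunk["package"]
    model.partial_fit(X, y)            # incremental update, chunk by chunk
```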

Based on Learning

Machine learning can be categorized based on how it learns:

  • By Memorizing (Instance-Based Learning)
  • By Generalizing (Understanding concepts, Model-Based Learning)

1. Instance-Based Learning

Instance-based learning does not actually learn patterns but stores training data and responds based on the nearest neighbors.

Example:

| IQ  | CGPA | Placement |
|-----|------|-----------|
| 8   | 8    | Yes       |
| 7.0 | 7.3  | No        |

  • In instance-based learning, the model does not learn anything. It just stores the data.
  • When a new query comes, it looks at the nearest data points to decide.
  • If the nearby points mostly have placements, the answer is "Yes"; otherwise, it's "No".
  • There is no training or learning—just pattern matching.

Key Point:

  • It focuses on simple pattern matching, where the model stores examples and instantly answers based on the nearest matching data point.
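
k-nearest neighbours is the classic instance-based method: it memorizes the rows and answers by looking at the closest stored points. A minimal sketch using the two rows above plus a couple of invented ones:

```python
# Instance-based learning sketch: k-NN stores the data and matches against it
from sklearn.neighbors import KNeighborsClassifier

X = [[8, 8], [7.0, 7.3], [8.5, 9.0], [6.5, 6.8]]   # [IQ, CGPA]
y = ["Yes", "No", "Yes", "No"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)                       # "fit" here mostly just stores the points

print(knn.predict([[7.8, 8.2]]))    # answer comes from the nearest stored neighbours
```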

2. Model-Based Learning

  • The model learns from data using algorithms.
  • It understands the pattern and draws a boundary.
  • The boundary helps predict answers for new inputs.
  • Unlike instance-based learning, model-based learning finds a mathematical relation between input and output.
  • Even if we don’t have training data points, we can predict using the learned boundary.

Key Point:

  • This refers to building an internal model of the relationships in the data, which can then be used to make predictions.
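
In contrast, a model-based learner compresses the data into parameters (for example, the coefficients of a line or boundary) and can discard the training rows. A minimal sketch with linear regression on invented data, where the learned coefficients are the mathematical relation:

```python
# Model-based learning sketch: the data is summarized into learned parameters
from sklearn.linear_model import LinearRegression

X = [[90, 6.5], [110, 8.0], [120, 9.1], [100, 7.2]]   # [IQ, CGPA]
y = [30000, 50000, 65000, 42000]                       # package

model = LinearRegression().fit(X, y)

# The model keeps only these numbers, not the training rows themselves
print(model.coef_, model.intercept_)
print(model.predict([[105, 7.8]]))
```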

Differences Between Instance-Based and Model-Based Learning

| Usual/Conventional Machine Learning | Instance-Based Learning |
|---|---|
| Prepare data for model training. | No model training. |
| Train models to generalize patterns. | No training, only stores data. |
| Can make predictions using learned models. | Predictions based on stored examples. |
| Results in a generalizable model. | No generalization, just stores past data. |
| Missing attributes are handled better. | Every new input needs complete data. |

Challenges in Machine Learning

  1. Data collection issues
  2. Insufficient labeled data
  3. Non-representative data
  4. Poor quality data
  5. Irrelevant features
  6. Overfitting
  7. Underfitting
  8. Software integration issues
  9. High costs involved

Machine Learning Development Lifecycle (MLDLC)

  1. Problem Definition: Identify the problem, customer needs, and expected outcomes.
  2. Data Gathering: Collect relevant data from sources such as APIs, sensors, or databases.
  3. Data Preprocessing: Remove duplicates, fill missing values, and convert data into a usable format.
  4. Exploratory Data Analysis (EDA): Understand data distribution, relationships, and visualizations.
  5. Feature Engineering & Selection: Create new features and remove unnecessary ones.
  6. Model Training, Evaluation, and Selection: Train models and tune hyperparameters for better performance.
  7. Model Deployment: Convert trained models into software for real-world use.
  8. Testing: Perform beta testing, optimize performance, and retrain if needed.

Tensor in Machine Learning

  • Tensors are data structures used to store numerical data.
  • They can be 0D, 1D, 2D, 3D, or ND tensors.

Types of Tensors

  • 0D Tensor (Scalar): A single number.
  • 1D Tensor (Vector): A single row of numbers.
  • 2D Tensor (Matrix): A table-like structure with rows and columns.
  • 3D Tensor: A collection of 2D matrices.
  • ND Tensor: Higher-dimensional tensor for complex data processing.
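
A quick sketch with NumPy showing tensors of different dimensionality (NumPy arrays are one common way to represent tensors; frameworks such as TensorFlow and PyTorch have their own tensor types):

```python
# Tensors of increasing dimensionality, represented as NumPy arrays
import numpy as np

scalar = np.array(5)                          # 0D tensor (scalar)
vector = np.array([1, 2, 3])                  # 1D tensor (vector)
matrix = np.array([[1, 2], [3, 4]])           # 2D tensor (matrix)
cube   = np.array([[[1], [2]], [[3], [4]]])   # 3D tensor: a stack of 2D matrices

for t in (scalar, vector, matrix, cube):
    print(t.ndim, t.shape)
```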
