Harsh Mishra

Guide to MongoDB Aggregation Framework

Comprehensive Guide to MongoDB Aggregation Framework

MongoDB’s Aggregation Framework processes and transforms the documents stored in a collection. It supports filtering, grouping, sorting, and joining data, which makes it well suited to tasks like creating reports, performing data analysis, and reshaping data in MongoDB.

This article is a guide to the MongoDB Aggregation Framework, explaining the key stages, their syntax, and worked examples. We will walk through each aggregation stage, how it works, and, for most examples, what the output looks like.


1. What is Aggregation in MongoDB?

Aggregation in MongoDB refers to the process of transforming and combining data from multiple documents into a meaningful result. It plays a role similar to GROUP BY in SQL, but a pipeline can also filter, reshape, sort, and join documents in the same operation.

MongoDB offers a variety of methods for performing aggregation, but the aggregation pipeline is the most common and powerful method. The aggregation pipeline processes data by passing it through multiple stages, each performing specific operations on the data.


2. The Aggregation Pipeline

The aggregation pipeline consists of a sequence of stages, each transforming the data in some way. The data flows through these stages sequentially, with each stage receiving the output of the previous stage.

Basic Syntax:

db.collection.aggregate([
    { stage1 },
    { stage2 },
    { stage3 },
    ...
])

Each stage is enclosed in {}, and multiple stages are passed in an array. The data is processed step by step, with each stage operating on the results of the previous one.
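
To make the stage-by-stage flow concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an analogy: `runPipeline`, `stages`, and the sample `users` array are hypothetical names invented for this illustration, and MongoDB itself runs pipelines on the server rather than like this.

```javascript
// Hypothetical sample data standing in for a `users` collection.
const users = [
  { _id: 1, name: "Alice", age: 35 },
  { _id: 2, name: "Bob", age: 40 },
  { _id: 3, name: "Carol", age: 28 },
];

// Each "stage" is a function from an array of documents to a new array.
const stages = [
  (docs) => docs.filter((d) => d.age > 30),          // plays the role of $match
  (docs) => [...docs].sort((a, b) => b.age - a.age), // plays the role of $sort
  (docs) => docs.map((d) => ({ name: d.name })),     // plays the role of $project
];

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stageFns) {
  return stageFns.reduce((current, stage) => stage(current), docs);
}

const result = runPipeline(users, stages);
console.log(result); // [ { name: 'Bob' }, { name: 'Alice' } ]
```

Reordering the functions changes the result, which is the same reason the order of stages matters in a real pipeline.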

Key Stages in the Aggregation Pipeline

  • $match: Filters the data based on conditions (similar to SQL WHERE).
  • $group: Groups documents based on a field and performs aggregation operations (similar to SQL GROUP BY).
  • $project: Reshapes the data by selecting and renaming fields.
  • $sort: Sorts the documents in the specified order.
  • $limit: Limits the number of documents passed to the next stage.
  • $skip: Skips a specified number of documents.
  • $lookup: Performs a left outer join between two collections.

3. Aggregation Stages Explained

3.1 $match Stage

The $match stage filters documents based on a specified condition. It is equivalent to the WHERE clause in SQL.

Syntax:

{
  $match: { field: value }
}

Example:

Filter documents where age is greater than 30:

db.users.aggregate([
  { $match: { age: { $gt: 30 } } }
])

Output:

[
  { "_id": 1, "name": "Alice", "age": 35 },
  { "_id": 2, "name": "Bob", "age": 40 }
]
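
$match uses the same query syntax as find(), so several conditions can be combined in one stage. The sketch below assumes the users documents also carry a `city` field, which is not part of the original example. The `filter` call is just an in-memory illustration of which documents would pass.

```javascript
// Pipeline as it could be passed to db.users.aggregate(...).
// The `city` field is an assumption made for this example.
const pipeline = [
  {
    $match: {
      age: { $gt: 30, $lte: 40 },          // 30 < age <= 40
      city: { $in: ["Paris", "Berlin"] },  // city is one of the listed values
    },
  },
];

// Hypothetical sample documents.
const users = [
  { _id: 1, name: "Alice", age: 35, city: "Paris" },
  { _id: 2, name: "Bob", age: 40, city: "London" },
  { _id: 3, name: "Carol", age: 28, city: "Berlin" },
  { _id: 4, name: "Dave", age: 38, city: "Berlin" },
];

// In-memory equivalent of the conditions above, for illustration only.
const cities = pipeline[0].$match.city.$in;
const matched = users.filter(
  (u) => u.age > 30 && u.age <= 40 && cities.includes(u.city)
);

console.log(matched.map((u) => u.name)); // [ 'Alice', 'Dave' ]
```

Placing $match as early as possible in a pipeline is generally a good habit: later stages see fewer documents, and a $match at the very start can use indexes.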

3.2 $group Stage

The $group stage groups documents by the value of the _id expression and computes one result per group using accumulator operators such as $sum, $avg, $min, and $max.

Syntax:

{
  $group: {
    _id: <expression>,
    field1: { <operator>: <expression> },
    field2: { <operator>: <expression> }
  }
}

Example:

Group users by their age and calculate the average salary for each age group:

db.users.aggregate([
  { 
    $group: {
      _id: "$age", // Group by age
      averageSalary: { $avg: "$salary" }
    }
  }
])

Output (the order of the groups is not guaranteed unless a $sort stage follows):

[
  { "_id": 25, "averageSalary": 5000 },
  { "_id": 30, "averageSalary": 6000 },
  { "_id": 35, "averageSalary": 7000 }
]
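
A $group stage can hold several accumulators at once. The sketch below adds a count with `{ $sum: 1 }` next to the average. The `userCount` field name and the sample data are assumptions for this example, and the `Map`-based loop is only an in-memory analogy of what the stage computes.

```javascript
// Pipeline as it could be passed to db.users.aggregate(...).
const pipeline = [
  {
    $group: {
      _id: "$age",                         // one output document per distinct age
      averageSalary: { $avg: "$salary" },
      userCount: { $sum: 1 },              // adds 1 per document, i.e. counts the group
    },
  },
];

// Hypothetical sample documents.
const users = [
  { name: "Alice", age: 25, salary: 4000 },
  { name: "Bob", age: 25, salary: 6000 },
  { name: "Carol", age: 30, salary: 6000 },
];

// In-memory analogy: accumulate a running total and count per age.
const buckets = new Map();
for (const u of users) {
  const bucket = buckets.get(u.age) ?? { total: 0, count: 0 };
  bucket.total += u.salary;
  bucket.count += 1;
  buckets.set(u.age, bucket);
}

const grouped = [...buckets].map(([age, b]) => ({
  _id: age,
  averageSalary: b.total / b.count,
  userCount: b.count,
}));

console.log(grouped);
// [ { _id: 25, averageSalary: 5000, userCount: 2 },
//   { _id: 30, averageSalary: 6000, userCount: 1 } ]
```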

3.3 $project Stage

The $project stage reshapes each document by selecting and/or renaming fields, adding new fields, or excluding fields. It is similar to the SELECT clause in SQL.

Syntax:

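{
  $project: {
    field1: 1,              // 1 includes the field
    _id: 0,                 // 0 excludes it; when including fields, only _id may be excluded
    newField: <expression>  // computed field
  }
}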

Example:

Project only the name and age fields, and create a new field ageInMonths:

db.users.aggregate([
  { 
    $project: {
      name: 1,
      age: 1,
      ageInMonths: { $multiply: ["$age", 12] }
    }
  }
])

Output:

[
  { "_id": 1, "name": "Alice", "age": 25, "ageInMonths": 300 },
  { "_id": 2, "name": "Bob", "age": 30, "ageInMonths": 360 }
]
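
The _id field is included by default in a $project stage, so it has to be suppressed explicitly when it is not wanted. A small sketch with made-up sample data, where the `map` call mimics the projection in memory:

```javascript
// Pipeline as it could be passed to db.users.aggregate(...).
const pipeline = [
  {
    $project: {
      _id: 0, // suppress _id, which would otherwise be included
      name: 1,
      age: 1,
      ageInMonths: { $multiply: ["$age", 12] },
    },
  },
];

// Hypothetical sample documents; `salary` is dropped by the projection.
const users = [
  { _id: 1, name: "Alice", age: 25, salary: 5000 },
  { _id: 2, name: "Bob", age: 30, salary: 6000 },
];

// In-memory analogy: keep two fields and compute a third.
const projected = users.map(({ name, age }) => ({
  name,
  age,
  ageInMonths: age * 12,
}));

console.log(projected);
// [ { name: 'Alice', age: 25, ageInMonths: 300 },
//   { name: 'Bob', age: 30, ageInMonths: 360 } ]
```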

3.4 $sort Stage

The $sort stage orders the documents in ascending or descending order based on a field or fields.

Syntax:

{
  $sort: { field: 1 }  // 1 for ascending, -1 for descending
}

Example:

Sort users by age in descending order:

db.users.aggregate([
  { $sort: { age: -1 } }
])

Output:

[
  { "_id": 2, "name": "Bob", "age": 30 },
  { "_id": 1, "name": "Alice", "age": 25 }
]
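
When $sort lists several fields, later fields act as tie-breakers for earlier ones. The sketch below sorts by age descending and then by name ascending. The sample data is invented, and the comparator is an in-memory analogy that compares names with plain `<` and `>`, which is only an approximation of how MongoDB orders strings by default.

```javascript
// Pipeline as it could be passed to db.users.aggregate(...).
const pipeline = [{ $sort: { age: -1, name: 1 } }];

// Hypothetical sample documents; two users share the same age.
const users = [
  { _id: 1, name: "Bob", age: 30 },
  { _id: 2, name: "Alice", age: 25 },
  { _id: 3, name: "Aaron", age: 30 },
];

// Compare two strings in ascending order.
const compareNames = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Age descending first; name ascending only when the ages are equal.
const sorted = [...users].sort(
  (a, b) => b.age - a.age || compareNames(a.name, b.name)
);

console.log(sorted.map((u) => u.name)); // [ 'Aaron', 'Bob', 'Alice' ]
```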

3.5 $limit Stage

The $limit stage limits the number of documents passed to the next stage in the pipeline. This is useful for pagination and for restricting the number of results, usually after a $sort stage, because without one the order of the documents is not defined.

Syntax:

{
  $limit: <number>
}

Example:

Limit the result to 5 documents:

db.users.aggregate([
  { $limit: 5 }
])
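
A common use of $limit is a "top N" query, where it follows a $sort so that the documents it keeps are well defined. A minimal sketch with invented sample data, where `slice` stands in for $limit:

```javascript
// Pipeline as it could be passed to db.users.aggregate(...):
// sort by age descending, then keep only the first two documents.
const pipeline = [{ $sort: { age: -1 } }, { $limit: 2 }];

// Hypothetical sample documents.
const users = [
  { name: "Alice", age: 25 },
  { name: "Bob", age: 30 },
  { name: "Carol", age: 41 },
  { name: "Dave", age: 38 },
];

// In-memory analogy of the two stages.
const limit = pipeline[1].$limit;
const oldest = [...users].sort((a, b) => b.age - a.age).slice(0, limit);

console.log(oldest.map((u) => u.name)); // [ 'Carol', 'Dave' ]
```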

3.6 $skip Stage

The $skip stage skips a specified number of documents and passes the remaining documents to the next stage in the pipeline. This is useful for pagination purposes.

Syntax:

{
  $skip: <number>
}

Example:

Skip the first 5 documents:

db.users.aggregate([
  { $skip: 5 }
])
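
For pagination, $skip and $limit are typically combined after a $sort. The helper below is a hypothetical sketch, not part of MongoDB: `pageStages` builds the three stages for a 1-based page number, and `pageOf` shows the same arithmetic on an in-memory array.

```javascript
// Hypothetical helper: builds the stages for one page of results.
// `page` is 1-based; sorting on _id gives a stable, repeatable order.
function pageStages(page, pageSize) {
  return [
    { $sort: { _id: 1 } },
    { $skip: (page - 1) * pageSize }, // drop the documents of earlier pages
    { $limit: pageSize },             // keep one page worth of documents
  ];
}

// In-memory analogy: $skip followed by $limit corresponds to Array.prototype.slice.
function pageOf(docs, page, pageSize) {
  const start = (page - 1) * pageSize;
  return [...docs].sort((a, b) => a._id - b._id).slice(start, start + pageSize);
}

// Twelve hypothetical documents with _id 1..12.
const docs = Array.from({ length: 12 }, (_, i) => ({ _id: i + 1 }));
const page2 = pageOf(docs, 2, 5);

console.log(pageStages(2, 5)[1]);    // { '$skip': 5 }
console.log(page2.map((d) => d._id)); // [ 6, 7, 8, 9, 10 ]
```

In mongosh this could be used as `db.users.aggregate(pageStages(2, 5))`. The order matters: placing $limit before $skip would first cut the result down to one page and then skip past it.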

3.7 $lookup Stage (Join)

The $lookup stage is used to perform a left outer join between two collections. It is similar to SQL JOIN operations and allows you to combine data from multiple collections.

Syntax:

{
  $lookup: {
    from: "other_collection",  // The collection to join
    localField: "field_in_local_collection",  // Field from the local collection
    foreignField: "field_in_foreign_collection",  // Field from the foreign collection
    as: "output_field"  // The name of the field to store the results
  }
}

Example:

Join users with orders:

db.users.aggregate([
  {
    $lookup: {
      from: "orders",  // The collection to join
      localField: "order_id",  // Field in `users` collection
      foreignField: "_id",  // Field in `orders` collection
      as: "order_details"  // Name of the new field in the output
    }
  }
])

Output:

[
  {
    "_id": 1,
    "name": "Alice",
    "order_id": 101,
    "order_details": [
      { "_id": 101, "product": "Laptop" }
    ]
  },
  {
    "_id": 2,
    "name": "Bob",
    "order_id": 102,
    "order_details": [{ "_id": 102, "product": "Phone" }]
  }
]
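
Because $lookup is a left outer join, a document from the local collection is kept even when nothing matches, and its output field is then an empty array. The sketch below illustrates that behaviour in memory. The `lookup` function and the third user are hypothetical, and the sketch ignores edge cases of the real stage, such as array-valued or missing fields.

```javascript
// Hypothetical sample data standing in for the two collections.
const users = [
  { _id: 1, name: "Alice", order_id: 101 },
  { _id: 2, name: "Bob", order_id: 102 },
  { _id: 3, name: "Carol", order_id: 999 }, // no matching order
];
const orders = [
  { _id: 101, product: "Laptop" },
  { _id: 102, product: "Phone" },
];

// In-memory analogy of an equality-match $lookup:
// every local document is kept, and `as` holds all matching foreign documents.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(users, orders, "order_id", "_id", "order_details");

console.log(joined[0].order_details); // [ { _id: 101, product: 'Laptop' } ]
console.log(joined[2].order_details); // []
```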

Conclusion

MongoDB's Aggregation Framework provides a powerful and flexible way to manipulate and process data. By combining aggregation stages like $match, $group, $project, $sort, $limit, $skip, and $lookup, you can build data processing pipelines that produce a wide variety of results.

Through the aggregation pipeline, you can filter, transform, group, and join data, providing a rich set of tools to extract meaningful insights from your MongoDB collections.

This guide covered the essential stages and operations in MongoDB’s aggregation framework, providing examples and explanations of how to use them effectively. Whether you're performing simple queries or complex data transformations, the aggregation framework is a core component of MongoDB that can help you achieve your goals efficiently.
