Introduction
The Saga pattern is a microservices architectural approach that coordinates a series of local transactions across different services through a sequence of events or commands. It aims to maintain data consistency without relying on a monolithic, long-running transaction. Instead, each microservice handles its own local database changes and publishes events to trigger subsequent steps.
Saga Theory
Why Avoid Two-Phase Commit (2PC)?
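In a distributed environment, 2PC needs a central coordinator and holds locks on every participant until all of them vote to commit. A slow or unavailable service therefore blocks the others, and many managed services (such as Amazon SQS and DynamoDB) cannot take part in an external 2PC protocol at all. The Saga pattern avoids this by breaking a single large transaction into a series of smaller local transactions. Each service: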
- Performs a local transaction.
- Publishes an event (or sends a message).
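- If a later step fails, the saga runs compensating transactions that semantically undo the steps that already committed (for example, refunding a charge), because those commits can no longer be rolled back.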
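The steps above can be sketched as a small, in-memory saga runner. This is a minimal illustration and not a production implementation. The run_saga helper and the step names are hypothetical. A real saga would also persist its progress and retry compensations that fail themselves.

```python
from typing import Callable, List, Tuple

# A saga step pairs a local action with the compensation that semantically undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse order.

    Returns a log of what happened, for illustration.
    """
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, action, compensate))
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # Undo only the steps that actually committed, newest first.
            for done_name, _, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated:{done_name}")
            break
    return log

def out_of_stock() -> None:
    raise RuntimeError("out of stock")

log = run_saga([
    ("reserve_order", lambda: None, lambda: None),
    ("charge_payment", lambda: None, lambda: None),
    ("ship", out_of_stock, lambda: None),
])
print(log)
```

Compensations run newest first because a later step may depend on an earlier one. For example, the payment is refunded before the order it belongs to is canceled.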
Two Main Approaches
- Choreography: Each service produces and listens to domain events from other services.
- Orchestration: A centralized controller (the Orchestrator) calls each service in turn and manages rollbacks if something fails.
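Either approach keeps data consistent across services without a global lock or a single transaction manager. The trade-off is that consistency is eventual: intermediate states, such as a reserved but not yet paid order, are briefly visible.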
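To make the choreography approach concrete, here is a minimal in-memory sketch. Each service subscribes to the events it cares about and publishes its own, with no central coordinator. The publish/subscribe helpers and the event names are hypothetical stand-ins for SQS/SNS. Delivery here is synchronous, unlike a real queue.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# In-memory stand-in for SQS/SNS; event names are hypothetical.
subscriptions: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
published: List[str] = []

def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscriptions[event_type].append(handler)

def publish(event_type: str, event: dict) -> None:
    published.append(event_type)
    for handler in subscriptions[event_type]:
        handler(event)

# Payment Service: reacts to OrderReserved, emits success or failure.
def payment_service(event: dict) -> None:
    if event["amount"] <= event["balance"]:
        publish("PaymentCompleted", event)
    else:
        publish("PaymentFailed", event)

# Shipment/Inventory Service: reacts to PaymentCompleted.
def shipment_service(event: dict) -> None:
    publish("ShipmentConfirmed", event)

# Order Reservation Service: compensates when payment fails.
def order_on_payment_failed(event: dict) -> None:
    publish("OrderCanceled", event)

subscribe("OrderReserved", payment_service)
subscribe("PaymentCompleted", shipment_service)
subscribe("PaymentFailed", order_on_payment_failed)

publish("OrderReserved", {"order_id": "o-1", "amount": 50, "balance": 20})
print(published)  # ['OrderReserved', 'PaymentFailed', 'OrderCanceled']
```

In the orchestration approach, the same decisions would live in one controller instead of being spread across subscribers.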
Why Use the Saga Pattern?
- Data Consistency: Keep each service’s database in sync with the overall business process.
- Scalability: Each microservice can be scaled independently.
- Fault Tolerance: Isolate failure and roll back or compensate where necessary.
- Loose Coupling: Services interact via events or commands, reducing direct dependencies.
Example Setup
Consider a typical e-commerce workflow:
- Order Reservation Service (Service 1)
- Payment Service (Service 2)
- Shipment/Inventory Service (Service 3)
Each service uses:
- AWS Lambda for compute.
- Amazon SQS for inter-service communication.
- Its own database (e.g., DynamoDB or PostgreSQL).
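When SQS triggers a Python Lambda function, the function receives a batch of records. Each record has a messageId and a string body. The sketch below assumes the ReportBatchItemFailures setting is enabled on the event source mapping. With that setting, returning batchItemFailures makes SQS retry only the failed messages rather than the whole batch. process_payment is a hypothetical placeholder for the service's business logic.

```python
import json

def process_payment(message: dict) -> None:
    """Hypothetical business logic; raises on failure."""
    if message.get("amount", 0) <= 0:
        raise ValueError("invalid amount")

def handler(event, context):
    """Lambda entry point for an SQS-triggered function."""
    failures = []
    for record in event["Records"]:
        try:
            process_payment(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so SQS makes it visible again;
            # with a redrive policy it eventually moves to a DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"order_id": "o-1", "amount": 25})},
    {"messageId": "m2", "body": json.dumps({"order_id": "o-2", "amount": 0})},
]}
print(handler(event, None))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
```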
DynamoDB vs. PostgreSQL
For illustration:
- Order Reservation Service uses DynamoDB.
- Payment Service uses RDS (PostgreSQL).
- Shipment/Inventory Service also uses RDS (PostgreSQL).
Flow Overview
1. Customer Request
- The customer triggers an API call (via API Gateway) to place an order.
2. Order Reservation Service (DynamoDB)
- Triggered by the API Gateway call (runs on AWS Lambda).
- Persists the order record to its DynamoDB table.
- Sends an SQS message to the Payment Service.
- Returns a “reservation pending” response to the customer.
3. Payment Service (PostgreSQL)
- Triggered by the SQS message.
- Attempts to process payment (updates a PostgreSQL DB).
- On success, sends an SQS message to Shipment/Inventory Service.
- On failure, sends a rollback request to the Order Reservation Service.
4. Shipment/Inventory Service (PostgreSQL)
- Triggered by the SQS message from Payment.
- Confirms shipping and updates inventory (PostgreSQL).
- On success, notifies Order Reservation Service of completion.
- On failure, triggers compensating actions: Payment is reversed, Order Reservation is updated to “Canceled.”
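Every hop in this flow is an SQS message. A shared message envelope with a correlation id lets each service tie a message back to the right order's saga. The following is a sketch under assumptions: the SagaMessage fields, step names, and status values are hypothetical, and JSON is used because an SQS message body is a string.

```python
import json
import uuid
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SagaMessage:
    saga_id: str     # correlates every message belonging to one order's saga
    order_id: str
    step: str        # e.g. "PAYMENT", "SHIPMENT" (hypothetical names)
    status: str      # e.g. "REQUESTED", "SUCCEEDED", "FAILED"
    message_id: str  # unique per message; used for de-duplication

    def to_body(self) -> str:
        """Serialize for use as an SQS message body."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_body(body: str) -> "SagaMessage":
        return SagaMessage(**json.loads(body))

def next_message(msg: SagaMessage, step: str, status: str) -> SagaMessage:
    """Build the follow-up message, keeping the same saga_id."""
    return SagaMessage(msg.saga_id, msg.order_id, step, status, str(uuid.uuid4()))

start = SagaMessage(str(uuid.uuid4()), "o-1", "PAYMENT", "REQUESTED", str(uuid.uuid4()))
reply = next_message(start, "SHIPMENT", "REQUESTED")
print(SagaMessage.from_body(reply.to_body()) == reply)  # True
```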
Handling Transactions and Rollbacks
DynamoDB (Order Reservation)
- Local Transactions: DynamoDB's TransactWriteItems API groups multiple Put, Update, Delete, and ConditionCheck actions into one all-or-nothing operation.
- Rollback Strategy:
- Use a subsequent transactional write (or delete) to revert the item.
- Use conditional checks so that you only roll back items that are still in a pending state.
- Idempotency can be managed using item versioning or conditional expressions.
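The rollback strategy above can be expressed as TransactWriteItems parameters. A conditional update moves the order from PENDING to CANCELED only if the status and version still match. The code only builds the request, and the boto3 call is left as a comment. The table name, key, and attribute names are hypothetical. If the condition fails, DynamoDB cancels the transaction with a TransactionCanceledException (reason ConditionalCheckFailed). The service can then treat the rollback as already handled.

```python
import hashlib

def build_cancel_request(order_id: str, expected_version: int, saga_id: str) -> dict:
    """Build TransactWriteItems parameters that move an order from PENDING to CANCELED.

    The condition makes the write fail (and change nothing) if the order is no
    longer PENDING or was modified concurrently.
    """
    # ClientRequestToken is limited to 36 characters; hash the saga id to fit.
    token = hashlib.sha256(f"cancel:{saga_id}".encode()).hexdigest()[:32]
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": "Orders",  # hypothetical table name
                    "Key": {"order_id": {"S": order_id}},
                    # "status" is a DynamoDB reserved word, hence the #s placeholder.
                    "UpdateExpression": "SET #s = :canceled, #v = #v + :one",
                    "ConditionExpression": "#s = :pending AND #v = :expected",
                    "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
                    "ExpressionAttributeValues": {
                        ":canceled": {"S": "CANCELED"},
                        ":pending": {"S": "PENDING"},
                        ":expected": {"N": str(expected_version)},
                        ":one": {"N": "1"},
                    },
                }
            }
        ],
        # Same token on a retry lets DynamoDB treat the call as idempotent.
        "ClientRequestToken": token,
    }

request = build_cancel_request("o-1", expected_version=3, saga_id="saga-123")
# With AWS credentials configured, you would send it with:
#   boto3.client("dynamodb").transact_write_items(**request)
print(request["TransactItems"][0]["Update"]["ConditionExpression"])
```

The version counter provides the item versioning mentioned above. A retried rollback that sees an already incremented version fails its condition instead of overwriting newer data.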
PostgreSQL (Payment, Shipment/Inventory)
- Local Transactions: PostgreSQL provides standard SQL transactions with BEGIN, COMMIT, and ROLLBACK.
- Rollback Strategy:
- If a failure occurs while the transaction is still open, issue ROLLBACK.
- If the transaction has already committed (e.g., after a credit card charge), ROLLBACK no longer applies, so a compensating transaction (such as a refund) is needed.
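The two rollback cases can be illustrated with Python's built-in sqlite3 module. It stands in for PostgreSQL here only so the example runs anywhere; a real Payment Service would use a PostgreSQL driver. The payments table and its columns are hypothetical. An uncommitted charge is rolled back automatically. A committed charge is undone by a separate compensating transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Payment Service's PostgreSQL DB
conn.execute("CREATE TABLE payments (order_id TEXT PRIMARY KEY, amount INTEGER, status TEXT)")

def charge(order_id: str, amount: int, fail_midway: bool = False) -> None:
    # `with conn` commits on success and rolls back if an exception escapes,
    # mirroring BEGIN ... COMMIT / ROLLBACK.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?, 'CHARGED')", (order_id, amount))
        if fail_midway:
            raise RuntimeError("provider timeout")

def refund(order_id: str) -> None:
    # Compensating transaction: the charge already committed, so it cannot be
    # rolled back; record a new state instead. The status check makes repeats harmless.
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'REFUNDED' WHERE order_id = ? AND status = 'CHARGED'",
            (order_id,),
        )

try:
    charge("o-1", 50, fail_midway=True)  # open transaction: rolled back
except RuntimeError:
    pass
charge("o-2", 30)                        # committed
refund("o-2")                            # compensated
rows = conn.execute("SELECT order_id, status FROM payments ORDER BY order_id").fetchall()
print(rows)  # [('o-2', 'REFUNDED')]
```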
Detailed Scenarios
1. All Services Succeed
- Customer places an order (API Gateway → Lambda).
- Order Reservation writes to DynamoDB.
- Payment processes the payment with a PostgreSQL local transaction.
- Shipment/Inventory updates inventory in its PostgreSQL transaction.
- Order Reservation finalizes the order after receiving a success message.
2. Failure in Payment Service
- Order Reservation completes successfully and sends a message to Payment.
- Payment fails (e.g., insufficient funds).
- Payment either rolls back the open transaction or issues a compensating refund.
- Payment notifies Order Reservation.
- Order Reservation updates DynamoDB to mark the order as “Canceled.”
3. Failure in Shipment/Inventory
- Order Reservation and Payment succeed.
- Shipment/Inventory fails (DB error, out of stock, etc.).
- Shipment/Inventory rolls back the transaction if still open.
- Notifies Payment (for refund if payment was already captured) and Order Reservation (to cancel the order).
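Scenarios 1 to 3 follow one rule. The failing step rolls back its own open transaction, and every step that committed before it is compensated in reverse order. Here is a minimal sketch of that rule for the three services above. The step names and compensation labels are illustrative only.

```python
from typing import List, Optional, Tuple

# Saga steps in execution order, each paired with the compensation that
# undoes it once committed (labels are illustrative).
SAGA_STEPS: List[Tuple[str, str]] = [
    ("order_reservation", "mark order CANCELED"),
    ("payment", "refund payment"),
    ("shipment_inventory", "release inventory"),
]

def compensations_for(failed_step: Optional[str]) -> List[str]:
    """Return the compensations to run, newest first, when failed_step fails.

    The failing step rolls back its own open transaction locally, so only the
    steps that committed before it need compensating.
    """
    if failed_step is None:  # scenario 1: everything succeeded
        return []
    names = [name for name, _ in SAGA_STEPS]
    index = names.index(failed_step)  # raises ValueError for an unknown step
    return [comp for _, comp in reversed(SAGA_STEPS[:index])]

print(compensations_for("payment"))             # ['mark order CANCELED']
print(compensations_for("shipment_inventory"))  # ['refund payment', 'mark order CANCELED']
```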
4. Other Potential Failures
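- Timeouts: If Shipment/Inventory never responds, Order Reservation should not wait indefinitely. It can treat the saga as failed after a deadline and trigger compensation (or retry the step).
- Partial Failures: The payment provider might charge the card while the Payment Service fails to record the charge in its DB. Retries must be idempotent so the customer is not charged twice.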
- Dead Letter Queues (DLQs): Repeatedly failing messages can land in a DLQ for manual review.
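One way to implement the timeout strategy is a periodic check, for example a scheduled Lambda, that finds sagas stuck in PENDING past a deadline. In this sketch, the 15-minute deadline, the status values, and the dictionary standing in for the saga table are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

SAGA_TIMEOUT = timedelta(minutes=15)  # hypothetical deadline; tune per workflow

def find_timed_out(sagas: Dict[str, dict], now: datetime) -> List[str]:
    """Return ids of sagas still PENDING after the deadline.

    These should be compensated (or retried) rather than left waiting forever.
    """
    return sorted(
        saga_id
        for saga_id, saga in sagas.items()
        if saga["status"] == "PENDING" and now - saga["started_at"] > SAGA_TIMEOUT
    )

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sagas = {
    "s-1": {"status": "PENDING", "started_at": now - timedelta(minutes=30)},
    "s-2": {"status": "PENDING", "started_at": now - timedelta(minutes=5)},
    "s-3": {"status": "CONFIRMED", "started_at": now - timedelta(hours=2)},
}
print(find_timed_out(sagas, now))  # ['s-1']
```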
Error Handling and Compensation
- Local Rollbacks: Each service can undo its changes if notified of an upstream failure.
- DynamoDB: Use a follow-up transactional write or delete.
- PostgreSQL: Use ROLLBACK or a compensating UPDATE/DELETE if already committed.
- Orchestration Logic: In the Orchestration approach, the orchestrator (e.g., the Order Reservation service) can track the overall saga state, sending compensating commands when needed.
- Choreography: Each service publishes domain events, including failure events, that the others consume to trigger their own next step or compensation.
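In the orchestration approach, the saga state the orchestrator tracks can be modeled as an explicit state machine. This sketch assumes Order Reservation acts as the orchestrator. The state names and the command names (ShipOrder, ConfirmOrder, RefundPayment, CancelOrder) are hypothetical. A real orchestrator would persist this state, for example in its DynamoDB table, rather than keep it in memory.

```python
from typing import Dict, List, Set

# Allowed saga states and transitions (hypothetical names).
TRANSITIONS: Dict[str, Set[str]] = {
    "PENDING": {"PAYMENT_DONE", "CANCELING"},
    "PAYMENT_DONE": {"CONFIRMED", "CANCELING"},
    "CANCELING": {"CANCELED"},
    "CONFIRMED": set(),
    "CANCELED": set(),
}

class SagaState:
    def __init__(self, saga_id: str) -> None:
        self.saga_id = saga_id
        self.state = "PENDING"

    def transition(self, new_state: str) -> None:
        # Reject out-of-order or duplicate replies instead of corrupting state.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

def on_reply(saga: SagaState, step: str, succeeded: bool) -> List[str]:
    """Update saga state from a service reply; return the commands to send next."""
    if step not in ("payment", "shipment"):
        raise ValueError(f"unknown step {step!r}")
    if succeeded and step == "payment":
        saga.transition("PAYMENT_DONE")
        return ["ShipOrder"]
    if succeeded and step == "shipment":
        saga.transition("CONFIRMED")
        return ["ConfirmOrder"]
    # A step failed: compensate whatever already committed.
    commands = ["RefundPayment"] if saga.state == "PAYMENT_DONE" else []
    saga.transition("CANCELING")
    return commands + ["CancelOrder"]

saga = SagaState("saga-1")
print(on_reply(saga, "payment", True))    # ['ShipOrder']
print(on_reply(saga, "shipment", False))  # ['RefundPayment', 'CancelOrder']
print(saga.state)                         # CANCELING
```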
Best Practices
- Idempotency: Ensure repeated messages or API calls don’t create duplicate transactions.
- Monitoring & Alerting: Use Amazon CloudWatch metrics (errors, queue size, latencies).
- Security: Employ least-privilege IAM roles for each microservice.
- Automated Tests: Integration, partial failure, and timeout tests are crucial for confidence.
- DynamoDB Transactions: Use TransactWriteItems when several items must change atomically.
- PostgreSQL ACID: Leverage strong consistency within each service boundary.
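Amazon SQS standard queues deliver messages at least once, so the same message can arrive twice. A minimal idempotent consumer records each message id with its result and skips the side effect on repeats. The in-memory dict is an assumption standing in for a durable store. In production, the record and the side effect should be committed together (for example, in the same PostgreSQL transaction). Otherwise, a crash between them can still cause a duplicate.

```python
from typing import Callable, Dict

class IdempotentConsumer:
    """Processes each message id once and returns the cached result on repeats."""

    def __init__(self, handler: Callable[[dict], str]) -> None:
        self.handler = handler
        self.results: Dict[str, str] = {}  # stand-in for a durable store

    def consume(self, message_id: str, payload: dict) -> str:
        if message_id in self.results:
            return self.results[message_id]  # duplicate delivery: no side effects
        result = self.handler(payload)
        self.results[message_id] = result
        return result

charges = []

def charge(payload: dict) -> str:
    charges.append(payload["amount"])
    return f"charged {payload['amount']}"

consumer = IdempotentConsumer(charge)
consumer.consume("m-1", {"amount": 25})
consumer.consume("m-1", {"amount": 25})  # redelivered message
print(charges)  # [25]
```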
Conclusion
The Saga pattern is a powerful strategy for orchestrating and maintaining data consistency in distributed microservices. By structuring each step as a local transaction and employing events or an orchestrator, you avoid the pitfalls of large, complex global transactions. Whether you use DynamoDB transactions or PostgreSQL’s ACID properties, the key lies in designing robust compensating mechanisms for failures. With careful planning, including proper rollbacks, timeouts, and idempotent messages, the Saga pattern can reliably handle complex workflows such as e-commerce order processing.
Thanks for reading! If you enjoyed this guide, feel free to drop a comment or share how you implement Sagas in your own applications.