DEV Community

Cover image for Transactions in Microservices: Part 3 - SAGA Pattern with Orchestration and Temporal.io.
Federico Bevione
Federico Bevione

Posted on

Transactions in Microservices: Part 3 - SAGA Pattern with Orchestration and Temporal.io.

In the second article of this series, we explored the Choreography approach to distributed transactions using RabbitMQ. Now, let’s shift our focus to the Orchestration approach, where a central orchestrator manages the entire workflow.

To make this practical, we’ll implement the same healthcare workflow but this time with Temporal.io, a robust orchestration platform for microservices. Each service will have its own logic, but the workflow coordination will be managed centrally.

What is SAGA Orchestration?

Orchestration centralizes the coordination of a distributed workflow. Instead of services emitting and consuming events autonomously, the orchestrator invokes service-specific logic and tracks progress.

Key Benefits:

  • Centralized Workflow Management: All logic resides in the orchestrator, making workflows easier to visualize and debug.
  • Stateful Workflows: The orchestrator manages state, enabling retries, timeouts, and compensations.
  • Error Handling: Built-in mechanisms for compensation and retries reduce boilerplate code.

Challenges:

  • Single Point of Failure: The orchestrator’s availability is critical.
  • Complexity in Orchestrator Logic: Centralizing workflow management can lead to complex orchestrator code.
  • Scalability: The orchestrator may become a bottleneck for high-throughput workflows.

To tackle these challenges, we need a reliable ally—one that can rise to the occasion and handle it all without adding complexity to our lives. Temporal, I choose you!


Who is Temporal?

Temporal is a powerful platform designed to handle complex, long-running workflows with ease. It offers several advantages over traditional approaches to distributed systems and state management.

Before diving into Temporal, it's worth briefly comparing it with other workflow orchestration solutions like AWS Step Functions and Simple Workflow Service (SWF). These tools also provide workflow management, but they have different trade-offs in terms of flexibility, scalability, and cloud dependency.

Comparing Temporal, AWS Step Functions, and SWF

Feature Temporal.io AWS Step Functions AWS SWF
Cloud Agnostic ✅ Yes ❌ No (AWS only) ❌ No (AWS only)
Developer Flexibility ✅ High (code-first workflows) ⚖️ Medium (declarative JSON-based) 🔽 Low (requires AWS SDK)
Scalability ✅ High (self-hosted or managed) ✅ High (AWS-managed) ⚖️ Medium (AWS-managed)
State Management ✅ Durable & Event-driven ⚖️ Managed state, less flexible ⚖️ Managed state but complex
Use Case Suitability 🚀 Microservices, multi-cloud, high-scale 🔹 AWS-native apps, quick automation 🔹 Legacy workflows, low complexity

AWS Step Functions and SWF are easier starting points for teams deeply integrated into AWS. Step Functions, in particular, offer a quick way to define workflows using a declarative JSON structure, making it an appealing option for those who need simple workflow coordination within AWS services.

Bridging the Gap: Why Not Just AWS Step Functions?

For teams deeply integrated into AWS, AWS Step Functions and/or SWF might seem like the default choice. However, as workflows grow in complexity, their limitations become evident:

  • Limited flexibility: Workflows are defined declaratively, making complex logic harder to maintain.

  • Tied to AWS: Step Functions are AWS-only, which can be restrictive for multi-cloud or hybrid environments.

  • State management constraints: While AWS manages state, it lacks the fine-grained control offered by Temporal.

While Step Functions and SWF work well for simple automation and orchestration within AWS, Temporal is a more powerful, long-term solution for microservices workflows.

Why Temporal for SAGA Orchestration?

The SAGA pattern requires a robust workflow orchestrator that can handle long-running processes, retries, and compensation logic. Temporal excels in this area because:

  • Code-first approach: Unlike Step Functions' JSON-based workflows, Temporal lets developers write workflows in code, making it easier to test, debug, and maintain.
  • Cloud independence: It works across different cloud environments, avoiding vendor lock-in.
  • Built-in durability: It provides out-of-the-box state persistence, ensuring long-running workflows remain reliable without extra effort.
  • Scalability & fault tolerance: Temporal’s architecture is built to scale horizontally and ensures workflows resume seamlessly even after failures.

Key Benefits:

  1. Stateful Workflow Management: Temporal handles workflow state persistence automatically, eliminating the need for external databases or custom solutions.
  2. Automatic Retries and Timeouts: Built-in support for retry policies and timeouts simplifies error handling and ensures reliability without writing additional code.
  3. Simplified Compensation: Temporal provides first-class support for SAGA patterns, making it easy to define and execute compensation logic when workflows fail.
  4. Language-Specific SDKs: Write workflows and activities in your preferred language, using Temporal's SDKs for Go, Java, Node.js, or Python.
  5. Scalability and Resilience: Temporal is designed to scale horizontally, handling thousands of workflows and activities concurrently with minimal configuration.
  6. Event-Driven Architecture: Temporal supports event-driven workflows, enabling seamless integration with other services and systems.
  7. Developer Productivity: Temporal abstracts away complexities like task scheduling, retries, and state management, allowing developers to focus on business logic.

Why Not Traditional Solutions?

  • Manual State Management: Traditional approaches often require maintaining workflow state in a database, which can be error-prone and hard to scale.
  • Complex Error Handling: Without Temporal, developers must implement custom retry logic and compensation mechanisms, increasing code complexity.
  • Tight Coupling: Temporal decouples workflow coordination from business logic, making your system more modular and maintainable.

In short, Temporal simplifies distributed application development by providing a robust, scalable, and developer-friendly framework for managing workflows and activities.

For more details, visit the Temporal documentation.


Practical Example: Healthcare Workflow

We’ll revisit the same healthcare workflow:

  1. Patient Service: Verifies patient details and insurance coverage.
  2. Scheduler Service: Schedules the procedure.
  3. Inventory Service: Reserves medical supplies.
  4. Billing Service: Processes billing.

The workflow will include:

  • Compensation Logic: In case of failures, services will undo their operations.
  • Retries: Temporal will retry failed operations automatically based on configured policies.

Setting Up Temporal.io

Deploying the Temporal stack can be challenging due to its requirement for multiple interconnected services. For production-ready environments, I strongly recommend leveraging Temporal Cloud, which simplifies management and ensures reliability by offloading the complexities of running the Temporal service. Additionally, Temporal Cloud offers a very competitive pricing model, making it accessible for projects of various scales.

However, if you enjoy tackling infrastructure challenges, the self-hosted guide provides detailed instructions for setting up Temporal on your own.

For this demo, we’ll keep things simple by using a minimal deployment with Docker Compose.

Start Temporal using Docker Compose:

   git clone https://github.com/fedricobevione/saga_tutorial.git
   docker compose --project-directory ./orchestration/compose/ -f orchestration/compose/docker-compose.yml up
Enter fullscreen mode Exit fullscreen mode

Implementation: Workflow and Activities

What is a Workflow?

A workflow in Temporal is a function that defines the coordination logic for a series of tasks or activities. It represents the overall business process, specifying the order in which activities are executed, how they interact, and how failures are handled.

Key Characteristics:
  1. Deterministic: Workflows must be deterministic, meaning their execution results should not depend on external factors like system time or random values. This ensures that workflows can be replayed consistently.
  2. Long-Running: Workflows can run for extended periods (e.g., days, months, or years) because Temporal automatically handles state persistence.
  3. Fault-Tolerant: Temporal workflows are resilient to failures. If a worker crashes or the system restarts, the workflow resumes from the last persisted state.
  4. Event-Driven: Workflows can respond to external signals and events, enabling real-time interaction with external systems.
  5. Scalable: Temporal workflows can scale horizontally to handle high loads and multiple concurrent executions.
How Workflows Work:
  • Workflows define the coordination logic, specifying how and when activities are executed.
  • Temporal ensures that workflows are durable, persisting their state after each decision task.
  • Developers implement workflows using Temporal's SDKs. For example:

What is an Activity?

An activity in Temporal is a function that performs a specific unit of work, typically involving external systems or resources. Activities are invoked by workflows and are responsible for the actual execution of business logic, such as interacting with APIs, databases, or file systems.

Key Characteristics:
  1. External Work: Activities perform tasks that may involve external systems or I/O operations.
  2. Retryable: Temporal handles automatic retries for failed activities based on configurable retry policies.
  3. Stateless: Activities are stateless and do not maintain persistent state between executions. This ensures they can be retried or executed on any available worker.
  4. Timeouts: Activities have configurable timeouts, ensuring that long-running or stuck tasks do not block the workflow indefinitely.
  5. Language-Specific SDKs: Activities are implemented using Temporal's SDKs (e.g., Go, Java, Node.js, Python).
How Activities Work:
  • Activities are invoked by workflows as part of the workflow execution.
  • Temporal schedules activity tasks and assigns them to workers for execution.
  • Upon completion, the activity's result is sent back to the workflow, which continues execution.

Define the Workflow

The orchestrator will define the sequence of activities.

// workflow.go
package workflow

import (
    "time"

    "github.com/thegoodapi/saga_tutorial/orchestration/activities"
    "go.temporal.io/sdk/temporal"
    "go.temporal.io/sdk/workflow"
    "go.uber.org/multierr"
)

func HealthcareWorkflow(ctx workflow.Context) error {
    logger := workflow.GetLogger(ctx)
    logger.Info("Starting Healthcare Workflow")

    // Define retry options for activities
    retryPolicy := &temporal.RetryPolicy{
        InitialInterval:    time.Second,
        BackoffCoefficient: 2.0,
        MaximumInterval:    time.Minute,
        MaximumAttempts:    5,
    }
    ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: time.Minute,
        RetryPolicy:         retryPolicy,
    })

    // Step 1: Verify Patient
    logger.Info("Verifying Patient")
    err := workflow.ExecuteActivity(ctx, activities.VerifyPatientActivity).Get(ctx, nil)
    if err != nil {
        logger.Error("Patient verification failed", "Error", err)
        return err
    }

    // compensation for patient verification
    defer func() {
        if err != nil {
            errCompensation := workflow.ExecuteActivity(ctx, activities.NotifyProcedureScheduleCancellationActivity).Get(ctx, nil)
            err = multierr.Append(err, errCompensation)
        }
    }()

    // Step 2: Schedule Procedure
    logger.Info("Scheduling Procedure")
    err = workflow.ExecuteActivity(ctx, activities.ScheduleProcedureActivity).Get(ctx, nil)
    if err != nil {
        logger.Error("Procedure scheduling failed", "Error", err)
        return err
    }

    // compensation for procedure scheduling
    defer func() {
        if err != nil {
            errCompensation := workflow.ExecuteActivity(ctx, activities.CancelProcedureScheduleActivity).Get(ctx, nil)
            err = multierr.Append(err, errCompensation)
        }
    }()

    // Step 3: Reserve Supplies
    logger.Info("Reserving Supplies")
    err = workflow.ExecuteActivity(ctx, activities.ReserveSuppliesActivity).Get(ctx, nil)
    if err != nil {
        logger.Error("Supply reservation failed", "Error", err)
        return err
    }

    // compensation for supply reservation
    defer func() {
        if err != nil {
            errCompensation := workflow.ExecuteActivity(ctx, activities.ReleaseReservedSuppliesActivity).Get(ctx, nil)
            err = multierr.Append(err, errCompensation)
        }
    }()

    // Step 4: Process Billing
    logger.Info("Processing Billing")
    err = workflow.ExecuteActivity(ctx, activities.ProcessBillingActivity).Get(ctx, nil)
    if err != nil {
        logger.Error("Billing failed", "Error", err)
        return err
    }

    // compensation for billing
    defer func() {
        if err != nil {
            errCompensation := workflow.ExecuteActivity(ctx, activities.ProcessBillingCompensationActivity).Get(ctx, nil)
            err = multierr.Append(err, errCompensation)
        }
    }()

    logger.Info("Healthcare Workflow completed successfully")
    return nil
}

Enter fullscreen mode Exit fullscreen mode

Define Activities

Each activity encapsulates the logic for a service.

Verify Patient Activity
// patient.go
func VerifyPatientActivity() error {
    // Simulate logic
    fmt.Println("Step 1: Verifying patient...")
    return nil
}

func NotifyProcedureScheduleCancellationActivity() error {
    // Simulate logic
    fmt.Println("Compensation 1: Notify patient of verification failure.")
    return nil
}
Enter fullscreen mode Exit fullscreen mode
Schedule Procedure Activity
// scheduler.go
func ScheduleProcedureActivity() error {
    // Simulate scheduling logic
    fmt.Println("Step 2: Scheduling procedure...")
    return nil // or return an error to simulate failure
}

func CancelProcedureScheduleActivity() error {
    // Simulate cancellation logic
    fmt.Println("Compensation 2: Cancel procedure schedule.")
    return nil
}
Enter fullscreen mode Exit fullscreen mode
Reserve Supplies Activity
// inventory.go
func ReserveSuppliesActivity() error {
    // Simulate scheduling logic
    fmt.Println("Step 3: Reserving medical supplies...")
    return nil // or return an error to simulate failure
}

func ReleaseReservedSuppliesActivity() error {
    // Simulate cancellation logic
    fmt.Println("Compensation 3: Release reserved supplies.")
    return nil
}

Enter fullscreen mode Exit fullscreen mode
Process Billing Activity
// billing.go
func ReserveSuppliesActivity() error {
    // Simulate scheduling logic
    fmt.Println("Step 3: Reserving medical supplies...")
    return nil // or return an error to simulate failure
}

func ReleaseReservedSuppliesActivity() error {
    // Simulate cancellation logic
    fmt.Println("Compensation 3: Release reserved supplies.")
    return nil
}

Enter fullscreen mode Exit fullscreen mode

Create a worker

A Temporal worker is a process that executes workflows and activities in a Temporal application. It serves as the backbone for running the business logic defined in your workflows and activities.

Key Responsibilities:

  1. Executing Workflows: A worker processes the workflow code, including managing workflow state, handling signals, and executing timers.
  2. Running Activities: It performs activities, which are the core units of work that interact with external systems (e.g., databases, APIs, or services).
  3. Retry Logic: Workers handle retries for failed activities or workflows based on the retry policies defined in your code.
  4. Scalability: Multiple workers can be deployed to distribute the load of workflows and activities across different machines or instances.

How It Works:

  • Workers communicate with the Temporal service, which coordinates task scheduling and state management.
  • When a workflow or activity task is scheduled, the Temporal service assigns it to an available worker.
  • Workers listen for tasks, execute them, and report the results back to the Temporal service.

Key Features:

  • Fault Tolerance: Workers can crash or go offline without affecting workflow execution, as the Temporal service persists state and ensures tasks are reassigned to available workers.
  • Dynamic Scaling: New worker processes can be added to handle increased load dynamically.

In short, Temporal workers are essential for executing and scaling Temporal workflows and activities, providing the foundation for distributed and reliable task execution.

For more details, visit the Temporal documentation.

package main

import (
    "log"

    "github.com/thegoodapi/saga_tutorial/orchestration/activities"
    "github.com/thegoodapi/saga_tutorial/orchestration/workflow"
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
)

func main() {
    // Create the client object just once per process
    c, err := client.Dial(client.Options{})
    if err != nil {
        log.Fatalln("unable to create Temporal client", err)
    }
    defer c.Close()
    // This worker hosts both Workflow and Activity functions
    w := worker.New(c, "healthcare-queue", worker.Options{})

    w.RegisterWorkflow(workflow.HealthcareWorkflow)
    w.RegisterActivity(activities.VerifyPatientActivity)
    w.RegisterActivity(activities.NotifyProcedureScheduleCancellationActivity)
    w.RegisterActivity(activities.ScheduleProcedureActivity)
    w.RegisterActivity(activities.CancelProcedureScheduleActivity)
    w.RegisterActivity(activities.ReserveSuppliesActivity)
    w.RegisterActivity(activities.ReleaseReservedSuppliesActivity)
    w.RegisterActivity(activities.ProcessBillingActivity)
    w.RegisterActivity(activities.ProcessBillingCompensationActivity)
    // Start listening to the Task Queue
    err = w.Run(worker.InterruptCh())
    if err != nil {
        log.Fatalln("unable to start Worker", err)
    }
}
Enter fullscreen mode Exit fullscreen mode

Create a starter

What is a Starter?

A starter in Temporal is a client application or process responsible for initiating a workflow execution. It acts as the entry point to Temporal workflows, sending a signal to the Temporal service to begin processing the workflow logic.

Key Characteristics:
  1. Workflow Trigger: The starter initiates a workflow by specifying its type, parameters, and execution options.
  2. Decoupled Invocation: A starter can be a standalone process or integrated into your application, enabling external systems or user actions to trigger workflows.
  3. Stateless: The starter itself does not manage the workflow’s state; it delegates this responsibility to the Temporal service.
  4. Configurable Options: Starters can set workflow-specific configurations like timeouts, IDs, and retry policies.
How a Starter Works:
  • The starter communicates with the Temporal service via Temporal’s client SDK.
  • It specifies the workflow type, input arguments, and optional execution parameters.
  • Once initiated, the Temporal service schedules the workflow, assigns it to a worker, and manages its lifecycle.
package main

import (
    "context"
    "log"

    "github.com/thegoodapi/saga_tutorial/orchestration/workflow"
    "go.temporal.io/sdk/client"
)

func main() {
    // Create the client object just once per process
    c, err := client.Dial(client.Options{})
    if err != nil {
        log.Fatalln("unable to create Temporal client", err)
    }
    defer c.Close()
    options := client.StartWorkflowOptions{
        ID:        "my-id",
        TaskQueue: "healthcare-queue",
    }

    ctx := context.Background()

    // Start the workflow
    we, err := c.ExecuteWorkflow(ctx, options, workflow.HealthcareWorkflow)
    if err != nil {
        log.Fatalln("error starting TransferMoney workflow", err)
    }

    // wait for workflow completion
    err = we.Get(ctx, nil)
    if err != nil {
        log.Fatalln("workflow error", err)
    }
}
Enter fullscreen mode Exit fullscreen mode

Running the Workflow

  1. Start Temporal Worker:
   go run orchestration/worker/worker.go
Enter fullscreen mode Exit fullscreen mode
  1. Start the Workflow:
   go run orchestration/start/start.go
Enter fullscreen mode Exit fullscreen mode
  1. Observe Output:
    Logs from the worker and Temporal Web UI will show the workflow execution progress.

  2. The magnificent UI:
    With your docker compose running, navigate to http://localhost:8080, you'll find a very usefull ui where you can easily debug your workflow executions

Temporal UI

Temporal UI - steps


Key Takeaways

  • Centralized Control: Temporal.io simplifies workflow coordination and state management.
  • Automatic Retries: Configurable retry policies reduce failure handling overhead.
  • Compensation Logic: Temporal makes it easy to implement and manage compensations.

What’s Next?

In the next article, we’ll compare Choreography and Orchestration approaches, diving into their strengths, weaknesses, and real-world use cases. We’ll help you understand when to choose one over the other based on your system's requirements, such as scalability, fault tolerance, and ease of debugging.

In future articles of this serie, we’ll also cover some key observability aspects to ensure you can effectively monitor and debug distributed workflows. This will include:

  • Logging: How to implement structured logging to trace workflow and activity execution.
  • Metrics: Using tools like Prometheus to track execution times, task queues, and retry rates.
  • Tracing: Leveraging distributed tracing (e.g., OpenTelemetry) to visualize workflow execution across services.

These insights will provide you with the knowledge to build more robust, scalable, and maintainable distributed systems.

Stay tuned for Part 4, where we bring it all together!

Check out the full repository for this series here. Have questions or feedback? Let’s discuss in the comments!

Top comments (0)