Federico Bevione

Transactions in Microservices: Part 2 - SAGA Pattern with Choreography

In the first article of this series, we introduced the SAGA pattern and demonstrated how a minimal Orchestration can manage distributed transactions with a central orchestrator.

Let’s get real! This time, we’ll dive into the Choreography approach, where services coordinate workflows by autonomously emitting and consuming events.

To make this practical, we’ll implement a multi-service healthcare workflow using Go and RabbitMQ. Each service will have its own main.go, making it easy to scale, test, and run independently.

What is SAGA Choreography?

Choreography relies on decentralized communication. Each service listens for events and triggers subsequent steps by emitting new events. There’s no central orchestrator; the flow emerges from the interactions of individual services.

Key Benefits:

  • Decoupled Services: Each service operates independently.
  • Scalability: Event-driven systems handle high loads efficiently.
  • Flexibility: Adding new services doesn’t require changing the workflow logic.

Challenges:

  • Debugging Complexity: Tracking events across multiple services can be tricky. (I'll write an article dedicated to this topic, stay tuned!)
  • Infrastructure Setup: Services require a robust message broker (e.g., RabbitMQ) to connect all the dots.
  • Event Storms: Poorly designed workflows can overwhelm the system with events.

Practical Example: Healthcare Workflow

Let’s revisit our healthcare workflow from the first article:

  1. Patient Service: Verifies patient details and insurance coverage.
  2. Scheduler Service: Schedules the procedure.
  3. Inventory Service: Reserves medical supplies.
  4. Billing Service: Processes billing.

Each service will:

  • Listen for specific events using RabbitMQ.
  • Emit new events to trigger subsequent steps.

Setting Up RabbitMQ with Docker

We’ll use RabbitMQ as the event queue. Run it locally using Docker:

docker run --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:4.0.5-management

Access the RabbitMQ management interface at http://localhost:15672 (username: guest, password: guest).

Exchanges, Queues, and Bindings Setup

We need to configure RabbitMQ to accommodate our events. Here’s an example init.go file for setting up the RabbitMQ infrastructure:

package main

import (
    "log"

    "github.com/rabbitmq/amqp091-go"
)

func main() {
    conn, err := amqp091.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        log.Fatalf("Failed to connect to RabbitMQ: %v", err)
    }
    defer conn.Close()

    ch, err := conn.Channel()
    if err != nil {
        log.Fatalf("Failed to open a channel: %v", err)
    }
    defer ch.Close()

    err = ch.ExchangeDeclare("events", "direct", true, false, false, false, nil)
    if err != nil {
        log.Fatalf("Failed to declare an exchange: %v", err)
    }

    // Declare and bind one queue per event; the queue name and routing key
    // both match the event name.
    events := []string{
        "PatientVerified", "ProcedureScheduled", "SuppliesReserved",
        "BillingSuccessful", "BillingFailed", "ReservedSuppliesReleased",
        "ProcedureScheduleCancelled",
    }
    for _, event := range events {
        _, err = ch.QueueDeclare(event, true, false, false, false, nil)
        if err != nil {
            log.Fatalf("Failed to declare queue %q: %v", event, err)
        }

        err = ch.QueueBind(event, event, "events", false, nil)
        if err != nil {
            log.Fatalf("Failed to bind queue %q: %v", event, err)
        }
    }
}

Full code here!

Note: In a production setting, you might want to manage this setup using a GitOps approach (e.g., with Terraform) or let each service handle its own queues dynamically.

Implementation: Service Files

Each service will have its own main.go. We’ll also include compensation actions for handling failures gracefully.

1. Patient Service

This service verifies patient details and emits a PatientVerified event. It also compensates by notifying the patient if a downstream failure occurs.

// patient/main.go
package main

import (
    "fmt"
    "log"

    "github.com/rabbitmq/amqp091-go"
    "github.com/thegoodapi/saga_tutorial/choreography/common"
)

func main() {
    conn, err := amqp091.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        log.Fatalf("Failed to connect to RabbitMQ: %v", err)
    }
    defer conn.Close()

    ch, err := conn.Channel()
    if err != nil {
        log.Fatalf("Failed to open a channel: %v", err)
    }
    defer ch.Close()

    go func() {
        fmt.Println("[PatientService] Waiting for events...")
        msgs, err := common.ConsumeEvent(ch, "ProcedureScheduleCancelled")
        if err != nil {
            log.Fatalf("Failed to consume event: %v", err)
        }

        for range msgs {
            fmt.Println("[PatientService] Processing event: ProcedureScheduleCancelled")
            if err := notifyProcedureScheduleCancellation(); err != nil {
                log.Fatalf("Failed to notify patient: %v", err)
            }
        }
    }()

    common.PublishEvent(ch, "events", "PatientVerified", "Patient details verified")
    fmt.Println("[PatientService] Event published: PatientVerified")

    select {}
}

func notifyProcedureScheduleCancellation() error {
    fmt.Println("Compensation: Notify patient of procedure cancellation.")
    return nil
}

2. Scheduler Service

This service listens for PatientVerified and emits ProcedureScheduled. It compensates by canceling the procedure if a downstream failure occurs.

// scheduler/main.go
package main

import (
    "fmt"
    "log"

    "github.com/rabbitmq/amqp091-go"
    "github.com/thegoodapi/saga_tutorial/choreography/common"
)

func main() {
    conn, err := amqp091.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        log.Fatalf("Failed to connect to RabbitMQ: %v", err)
    }
    defer conn.Close()

    ch, err := conn.Channel()
    if err != nil {
        log.Fatalf("Failed to open a channel: %v", err)
    }
    defer ch.Close()

    go func() {
        fmt.Println("[SchedulerService] Waiting for events...")
        msgs, err := common.ConsumeEvent(ch, "PatientVerified")
        if err != nil {
            log.Fatalf("Failed to consume event: %v", err)
        }

        for range msgs {
            fmt.Println("[SchedulerService] Processing event: PatientVerified")
            if err := scheduleProcedure(); err != nil {
                common.PublishEvent(ch, "events", "ProcedureScheduleFailed", "Failed to schedule procedure")
                fmt.Println("[SchedulerService] Compensation triggered: ProcedureScheduleFailed")
            } else {
                common.PublishEvent(ch, "events", "ProcedureScheduled", "Procedure scheduled successfully")
                fmt.Println("[SchedulerService] Event published: ProcedureScheduled")
            }
        }
    }()

    select {}
}

func scheduleProcedure() error {
    fmt.Println("Step 2: Scheduling procedure...")
    return nil // or simulate a failure
}

Additional Services

Include Inventory Service and Billing Service implementations, following the same structure as above. Each service listens for the previous event and emits the next one, ensuring compensation logic is in place for failures.

Full code here!


Running the Workflow

Start RabbitMQ:

   docker run --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:4.0.5-management

Run Each Service:
Open separate terminals and run:

   # one-time script to set up RabbitMQ
   go run choreography/init/main.go
   # services
   go run choreography/billing/main.go
   go run choreography/inventory/main.go
   go run choreography/scheduler/main.go
   go run choreography/patient/main.go

Observe Output:
Each service processes events in sequence, logging the workflow progress.

What happened?

Let's break it down!

First of all, for the purposes of this article, we are not implementing SuppliesReserveFailed and ProcedureScheduleFailed, to avoid useless complexity.

We are implementing the following events.

Steps (or transactions):

  • T1: (init): PatientVerified
  • T2: ProcedureScheduled
  • T3: SuppliesReserved
  • T4: BillingSuccessful

Compensations:

  • C4: BillingFailed
  • C3: ReservedSuppliesReleased
  • C2: ProcedureScheduleCancelled
  • C1: NotifyFailureToUser (not implemented)

We follow this implementation diagram:

[Diagram: high-level implementation flow]

This diagram represents a common approach to documenting choreography. However, I find it somewhat difficult to understand and a bit frustrating, particularly for those who are not familiar with the implementation or the pattern.

Let's break it down!

[Diagram: detailed implementation flow]

The diagram above is far more verbose, but it breaks down each step, making it easier to understand what's going on.

In a nutshell:

  1. Patient service verifies patient details successfully
  2. Patient service emits PatientVerified
  3. Scheduler service consumes PatientVerified
  4. Scheduler service schedules the appointment successfully
  5. Scheduler service emits ProcedureScheduled
  6. Inventory service consumes ProcedureScheduled
  7. Inventory service reserves the supplies successfully
  8. Inventory service emits SuppliesReserved
  9. Billing service consumes SuppliesReserved
  10. Billing service fails to charge the customer and starts the compensation
  11. Billing service emits BillingFailed
  12. Inventory service consumes BillingFailed
  13. Inventory service releases the supplies reserved in step 7
  14. Inventory service emits ReservedSuppliesReleased
  15. Scheduler service consumes ReservedSuppliesReleased
  16. Scheduler service deletes the appointment scheduled in step 4
  17. Scheduler service emits ProcedureScheduleCancelled
  18. Patient service consumes ProcedureScheduleCancelled
  19. Patient service notifies the customer of the error

Note that we are not implementing failures for steps 1, 4, and 7 for the sake of brevity; however, the approach would be the same. Each of these failures would trigger a rollback of the preceding steps.


Observability

Observability is essential for debugging and monitoring distributed systems. Implementing logs, metrics, and traces ensures that developers can understand system behavior and diagnose issues efficiently.

Logging

  • Use structured logging (e.g., JSON format) to capture events and metadata.
  • Include correlation IDs in logs to trace workflows across services.

Metrics

  • Monitor queue sizes and event processing times.
  • Use tools like Prometheus to collect and visualize metrics.

Tracing

  • Implement distributed tracing (e.g., with OpenTelemetry) to track events across services.
  • Annotate spans with relevant data (e.g., event names, timestamps) for better insights.

We'll dive into observability in choreography later in this series, stay tuned!


Key Takeaways

  • Decentralized Control: Choreography enables autonomous collaboration.
  • Event-Driven Simplicity: RabbitMQ simplifies message exchange.
  • Scalable Architecture: Adding new services is seamless.
  • Choreography can be very overwhelming at first, but as always: practice makes you better!

Stay tuned for the next article, where we’ll explore Orchestration!

Check out the full repository for this series here. Let’s discuss in the comments!
