1. Introduction
Sagas are enterprise integration patterns used in event-sourcing architectures. They are useful when resilience, load capacity, and performance are important. Telecom companies, for example, use this pattern extensively to pass data around while staying compliant with SLAs (Service Level Agreements) on metrics such as the duration of a request, the number of simultaneous requests supported, and how responsive and resilient the application is. The pattern was first presented to the public in 1987 by Hector Garcia-Molina and Kenneth Salem. It was conceived to solve a problem related to LLTs (Long-Lived Transactions): transactions designed to be A.C.I.D. (Atomic, Consistent, Isolated, Durable) but taking an extended amount of time. For transactions that consumed a lot of resources, this meant LLTs could last very long, even days or weeks, which presented both a latency problem and excessive resource consumption. Simply splitting such a transaction would break Isolation and Atomicity, and the risk of losing Consistency would also be high. To turn LLTs into manageable transactions and release resources as soon as possible, Garcia-Molina and Salem designed a system, with two variants, with a defined plan to perform specific operations in reaction to failure. In an LLT, this would be the rollback. In Sagas, these are intelligent processes capable of rolling back or bringing the system to the desired state should any of the sub-processes fail. These sub-processes are still transactions, and they remain strictly compliant with the A.C.I.D. paradigm. We will explore the example I’ve created on GitHub.
2. Case
We want to store all the news from a particular news feed in our database. Then we want to allow users to comment on the news, and we want to control all the error flows we may encounter. We also want to shorten response times as much as possible, provide High Availability and resiliency, and make sure that users get their comments through as fast as possible.
3. Project layout
For this project, we are going to use the Eventuate framework combined with the Spring framework:
In the schema, we can see two important main sections. The first is composed of the Fetcher and the Mock Feed. These two provide raw data to our Saga architecture. As a default run, I’ve set the Fetcher to run every minute for a maximum of 30 seconds or until it gets 100 news messages. This means that it will either complete the 30 seconds with fewer than 100 messages or finish earlier with a total of 100 messages. Since this is not an online social media platform or a real news feed, you will always get 100 messages in this example. It is not the goal of this article to demonstrate how this works in depth, but it is important to get an idea that behind all of this there are three running threads: one is responsible for checking a queue for incoming messages, the second for making requests to the mock news feed, and the third for stopping the whole process once 30 seconds have elapsed.
Once we have picked up our data, we can continue exploring sagas. The fetcher process will continue running indefinitely. This is the second part of the diagram.
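Since the fetcher internals are out of scope here, the following is only a minimal sketch of how such a three-thread cycle could be wired. None of these names (NewsMessage, FetcherSketch, fetchPage) come from the project; this is an illustration under my own assumptions:
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical message type, for illustration only
data class NewsMessage(val id: Long, val text: String)

class FetcherSketch(private val fetchPage: () -> List<NewsMessage>) {

    fun runOnce(maxMessages: Int = 100, timeoutSeconds: Long = 30): List<NewsMessage> {
        val queue = ArrayBlockingQueue<NewsMessage>(1000)
        val collected = mutableListOf<NewsMessage>()
        val running = AtomicBoolean(true)
        val pool = Executors.newFixedThreadPool(2)
        // Thread 1: keeps checking the queue for incoming messages until the limit is reached
        pool.submit {
            while (running.get() && collected.size < maxMessages) {
                queue.poll(100, TimeUnit.MILLISECONDS)?.let { collected += it }
            }
            running.set(false)
        }
        // Thread 2: keeps making requests to the (mock) news feed
        pool.submit {
            while (running.get()) {
                fetchPage().forEach { queue.offer(it) }
            }
        }
        // Thread 3 role (played here by the caller): stop everything once the timeout elapses
        pool.shutdown()
        if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
            running.set(false)
            pool.shutdownNow()
        }
        return collected.toList()
    }
}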
Our saga implementations will consume payloads like the one in this example:
{
"idPage": 1,
"pageComment": "I love this",
"idAuthor": 2,
"authorComment": "This is my favourite author",
"idMessage": 3,
"messageComment": "I agree",
"authorRequestId": 123,
"pageRequestId": 456,
"messageRequestId": 789
}
What’s important to know about this payload are idPage, idAuthor, and idMessage. These IDs are not programmed to be foreign keys, so when comments are sent to be attached to pages, authors, and messages, they should match existing data. If they don’t, the comment will be registered as not available. There is also a hierarchy: Page -> Author -> Message. For example, if the Author does not exist, the Message comment will not be recorded, but the Page and Author comments will, and they will be marked as not available. A case like this does not really exist in reality, of course. It is just a made-up scenario to show how a Saga can be used to our benefit.
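In code, the rule just described could look roughly like the sketch below. Everything here is hypothetical (the project implements this across saga participants); only the Page -> Author -> Message branching follows the behaviour above, under my reading that each recorded comment carries an "available" flag:
// Hypothetical types: only the hierarchy branching mirrors the text above
data class CommentRecord(val text: String, val available: Boolean)

fun applyHierarchy(
    idPage: Long, pageComment: String,
    idAuthor: Long, authorComment: String,
    idMessage: Long, messageComment: String,
    pageExists: (Long) -> Boolean,
    authorExists: (Long) -> Boolean,
    messageExists: (Long) -> Boolean,
): List<CommentRecord> {
    val records = mutableListOf<CommentRecord>()
    // The page comment is always recorded; it is flagged when no matching page exists
    records += CommentRecord(pageComment, pageExists(idPage))
    // The author comment is recorded too, flagged when the author is unknown
    val authorOk = authorExists(idAuthor)
    records += CommentRecord(authorComment, authorOk)
    // A missing author cuts the chain: the message comment is not recorded at all
    if (authorOk) {
        records += CommentRecord(messageComment, messageExists(idMessage))
    }
    return records
}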
4. Saga in practice
Let’s focus in detail on the goal of this article. We want to see how sagas can work for us. Sagas are also a way to decouple the client request from the actual processing. The client makes a POST request, and the Saga will make sure that the request gets to the database. If, however, something should fail, then the Saga must have the intelligence to perform rollbacks or other predictable actions.
From the outside, we can see that the decoupling is provided via a streaming engine. In our case, we will use Kafka, but for streaming purposes we can use whatever mechanism we want; it is not mandatory to use Kafka. Please check the eventuate.io website to find out more about other compatible mechanisms. A good way to visualize what happens behind the curtain is to look at this sequence diagram.
In this diagram, we can see that when we make a request using either of the two Saga types described, our request first goes to a database. It gets persisted, and the only way to continue is to ship it to a stream; in our example, a Kafka stream. The eventuate team has created a CDC (Change Data Capture) service that does exactly this. However, for the purposes of this example, it proved quite complicated to manage, which is why I created my own mocked CDC version. Essentially, it picks up the data from a table called message and sends the non-published rows exactly as they are into the Kafka streams. We’ll see later in this article how this works in detail. Once the CDC picks up the messages and ships them into Kafka, our Saga code will pick them up in another thread and continue executing the Saga. At this point, our user has already received a 200 OK, meaning that the message is being handled. Finally, if we check the database for comments, we will see results according to what was sent: maybe comments marked as not available, or maybe comments completely and correctly handled.
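The actual endpoints are in the repository; to make the flow concrete, here is a minimal sketch, under my own assumptions, of what such an entry point could look like. CommentSagaStarter and its persist method are hypothetical names; the point is only that the endpoint persists the payload and returns immediately, leaving the heavy lifting to the CDC/Kafka loop:
import org.springframework.http.ResponseEntity
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

// Hypothetical abstraction over "persist the payload and let the saga begin"
interface CommentSagaStarter {
    fun persist(comments: NewsCastComments)
}

@RestController
@RequestMapping("/api/saga")
class SagaEntryPointSketch(private val sagaStarter: CommentSagaStarter) {

    @PostMapping("/orchestration")
    fun createComments(@RequestBody comments: NewsCastComments): ResponseEntity<Unit> {
        // Persisting is all that happens synchronously; the CDC process later picks
        // the row up from the message table and ships it to Kafka in another thread
        sagaStarter.persist(comments)
        // The caller receives 200 OK long before the saga has actually finished
        return ResponseEntity.ok().build()
    }
}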
4.1. Eventuate CDC Service
The implementation of the CDC service is nothing more than an implementation of a Kafka client. For that, we create a KafkaProducerFactory:
import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.Producer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.LongSerializer
import org.apache.kafka.common.serialization.StringSerializer

class KafkaProducerFactory {
    companion object {
        // Creates a plain Apache Kafka producer with Long keys and String (JSON) values
        fun createProducer(brokers: String): Producer<Long?, String?> {
            val props = Properties()
            props[ProducerConfig.BOOTSTRAP_SERVERS_CONFIG] = brokers
            props[ProducerConfig.CLIENT_ID_CONFIG] = CdcConstants.CLIENT_ID
            props[ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG] = LongSerializer::class.java.name
            props[ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG] = StringSerializer::class.java.name
            return KafkaProducer(props)
        }
    }
}
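As a quick usage illustration (not from the project), a producer built by this factory can be exercised like this; the broker address and topic name are placeholders:
import org.apache.kafka.clients.producer.ProducerRecord

// Minimal usage sketch: send one record through a producer built by the factory
fun main() {
    val producer = KafkaProducerFactory.createProducer("localhost:9092")
    val record = ProducerRecord<Long?, String?>("newscast-demo-topic", """{"hello":"saga"}""")
    // send() is asynchronous; get() blocks until the broker acknowledges
    producer.send(record).get()
    producer.close()
}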
The message content and shape really depend on what is being registered in the database. For the Eventuate Saga implementation, we first need to create a database. The scripts for this database are available on their website, in many locations. I’ve summed them all up here:
-- from:
-- https://github.com/eventuate-tram/eventuate-tram-sagas/blob/master/postgres/tram-saga-schema.sql
CREATE SCHEMA IF NOT EXISTS eventuate;
DROP TABLE IF EXISTS eventuate.saga_instance_participants;
DROP TABLE IF EXISTS eventuate.saga_instance;
DROP TABLE IF EXISTS eventuate.saga_lock_table;
DROP TABLE IF EXISTS eventuate.saga_stash_table;
DROP TABLE IF EXISTS eventuate.message;
DROP TABLE IF EXISTS eventuate.received_messages;
DROP TABLE IF EXISTS eventuate.cdc_monitoring;
CREATE TABLE eventuate.saga_instance_participants
(
saga_type VARCHAR(255) NOT NULL,
saga_id VARCHAR(100) NOT NULL,
destination VARCHAR(100) NOT NULL,
resource VARCHAR(100) NOT NULL,
PRIMARY KEY (saga_type, saga_id, destination, resource)
);
CREATE TABLE eventuate.saga_instance
(
saga_type VARCHAR(255) NOT NULL,
saga_id VARCHAR(100) NOT NULL,
state_name VARCHAR(100) NOT NULL,
last_request_id VARCHAR(100),
end_state BOOLEAN,
compensating BOOLEAN,
saga_data_type VARCHAR(1000) NOT NULL,
saga_data_json VARCHAR(1000) NOT NULL,
PRIMARY KEY (saga_type, saga_id)
);
CREATE TABLE eventuate.saga_lock_table
(
target VARCHAR(100) PRIMARY KEY,
saga_type VARCHAR(255) NOT NULL,
saga_id VARCHAR(100) NOT NULL
);
CREATE TABLE eventuate.saga_stash_table
(
message_id VARCHAR(100) PRIMARY KEY,
target VARCHAR(100) NOT NULL,
saga_type VARCHAR(255) NOT NULL,
saga_id VARCHAR(100) NOT NULL,
message_headers VARCHAR(1000) NOT NULL,
message_payload VARCHAR(1000) NOT NULL
);
-- from
-- https://github.com/eventuate-tram/eventuate-tram-core/blob/master/eventuate-tram-in-memory/src/main/resources/eventuate-tram-embedded-schema.sql
CREATE TABLE eventuate.message
(
ID VARCHAR(1000) PRIMARY KEY,
DESTINATION VARCHAR(1000) NOT NULL,
HEADERS VARCHAR(1000) NOT NULL,
PAYLOAD VARCHAR(1000) NOT NULL,
CREATION_TIME BIGINT,
PUBLISHED BIGINT
);
CREATE TABLE eventuate.received_messages
(
CONSUMER_ID VARCHAR(1000),
MESSAGE_ID VARCHAR(1000),
CREATION_TIME BIGINT,
PRIMARY KEY (CONSUMER_ID, MESSAGE_ID)
);
CREATE TABLE eventuate.cdc_monitoring
(
reader_id VARCHAR(1000) PRIMARY KEY,
last_time BIGINT
);
If we take a good look at the message table, we can see all the important fields for the CDC payload to Kafka. We need the ID, which is used internally by the eventuate framework, the headers, and the payload, and we use the published column to determine whether the message has already been sent to Kafka: I chose 0 for not sent and 1 for sent. This way, we can easily create our Kafka client using the Apache Kafka libraries:
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.kafka.clients.producer.ProducerRecord
import org.springframework.beans.factory.annotation.Value
import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.scheduling.annotation.EnableScheduling
import org.springframework.scheduling.annotation.Scheduled

@SpringBootApplication
@EnableScheduling
open class CdcProcessLauncher(
    private val messageRepository: MessageRepository,
    @Value("\${org.jesperancinha.newscast.host.kafka.brokers}")
    private val brokers: String
) {
    private val producer = KafkaProducerFactory.createProducer(brokers)

    // Every 5 seconds, publish all not-yet-published rows from the message table
    @Scheduled(cron = "0/5 * * ? * *")
    fun fetchAndPublish() {
        messageRepository.findAllByPublishedIs(0).forEach {
            val objectMapper = ObjectMapper()
            val headers = objectMapper.readTree(it.headers)
            val command = KafkaCommand(it.payload, headers)
            val commandPayload = objectMapper.writeValueAsString(command)
            // The eventuate destination stored with the message becomes the Kafka topic
            val record = ProducerRecord<Long?, String?>(it.destination, commandPayload)
            producer.send(record).get()
            // Mark the row as published (1) so it is not sent again
            messageRepository.save(it.copy(published = 1))
            println("Sent: $commandPayload")
        }
    }

    companion object {
        @JvmStatic
        fun main(args: Array<String>) {
            SpringApplication.run(CdcProcessLauncher::class.java, *args)
        }
    }
}
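The launcher relies on a MessageRepository and a Message entity that are not shown above. A plausible shape for them, following the eventuate.message table from the schema, might be the following sketch; the real project mapping may differ:
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.Table

// Assumed JPA mapping of the eventuate.message table shown in the schema above
// (javax.persistence, assuming a Spring Boot 2.x era project)
@Entity
@Table(name = "message", schema = "eventuate")
data class Message(
    @Id
    val id: String = "",
    val destination: String = "",
    val headers: String = "",
    val payload: String = "",
    val creationTime: Long? = null,
    val published: Long = 0,
)

interface MessageRepository : JpaRepository<Message, String> {
    // Spring Data derives the query: all rows whose published column equals the argument
    fun findAllByPublishedIs(published: Long): List<Message>
}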
4.2. Saga Choreography
A Saga choreography is very much dependent on events and event handlers. There is usually no single defined structure for how the code is supposed to intervene:
In the diagram above, we see that we have different events, all of which wrap the same type. This is NewsCastComments:
data class NewsCastComments(
    val idPage: Long? = null,
    val pageComment: String? = null,
    val idAuthor: Long? = null,
    val authorComment: String? = null,
    val idMessage: Long? = null,
    val messageComment: String? = null,
    var authorRequestId: Long? = null,
    var pageRequestId: Long? = null,
    var messageRequestId: Long? = null
)
If we want our chain to act as a Saga, it needs to share the same payload. Think of this as a recipe where you create an ingredient and let it flow through: it will never leave the recipe, it can be modified, but it will be there until the end. The whole schema above can be simplified into the following code:
import io.eventuate.tram.events.publisher.DomainEventPublisher
import io.eventuate.tram.events.subscriber.DomainEventHandlers
import io.eventuate.tram.events.subscriber.DomainEventHandlersBuilder
import mu.KotlinLogging

class NewsCastEventConsumer(
    private val domainEventPublisher: DomainEventPublisher,
    private val newsCasePageCommentService: NewsCastPageCommentService,
    private val newsCastAuthorCommentService: NewsCastAuthorCommentService,
    private val newsCastMessageCommentService: NewsCastMessageCommentService,
    private val pageService: PageService,
    private val authorService: AuthorService,
    private val messageService: MessageService,
) {
    private val logger = KotlinLogging.logger {}

    // Registers one handler per event type; together they form the choreography
    fun domainEventHandlers(): DomainEventHandlers {
        return DomainEventHandlersBuilder
            .forAggregateType("org.jesperancinha.newscast.saga.data.NewsCastComments")
            .onEvent(NewsCastEvent::class.java, ::handleCreateNewsCastCommentEvent)
            .onEvent(NewsCastPageCommentEvent::class.java, ::handleCreatePageCommentEvent)
            .onEvent(NewsCastPageRejectCommentEvent::class.java, ::handleRejectPageCommentEvent)
            .onEvent(NewsCastAuthorCommentEvent::class.java, ::handleCreateAuthorCommentEvent)
            .onEvent(NewsCastAuthorRejectCommentEvent::class.java, ::handleRejectAuthorCommentEvent)
            .onEvent(NewsCastMessageCommentEvent::class.java, ::handleCreateMessageCommentEvent)
            .onEvent(NewsCastMessageRejectCommentEvent::class.java, ::handleRejectMessageCommentEvent)
            .onEvent(NewsCastDoneEvent::class.java, ::handleDone)
            .build()
    }
    ...
}
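The handlers themselves are elided above. As a sketch of what one of them could look like, here is a possible handleCreatePageCommentEvent. The event's payload accessor and the service call are my assumptions; the publish call is the regular eventuate-tram DomainEventPublisher API, and emitting the next event is what chains the choreography forward:
import io.eventuate.tram.events.subscriber.DomainEventEnvelope

// This would sit inside NewsCastEventConsumer; the payload accessor and
// the service call are assumptions, the publish(...) call is the eventuate-tram API
private fun handleCreatePageCommentEvent(envelope: DomainEventEnvelope<NewsCastPageCommentEvent>) {
    val comments = envelope.event.newsCastComments
    // Local step: persist the page comment (it gets flagged if the page does not exist)
    newsCasePageCommentService.createComment(comments)
    // Choreography: publishing the next event is what moves the saga forward
    domainEventPublisher.publish(
        "org.jesperancinha.newscast.saga.data.NewsCastComments",
        envelope.aggregateId,
        listOf(NewsCastAuthorCommentEvent(comments))
    )
}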
4.3. Saga Orchestration
A Saga orchestration has a very different shape, although conceptually it is quite similar to a Saga choreography. In the choreography case, all events and handlers have to be carefully choreographed with each other; that is essentially where the name comes from. Each handler needs to know which event to send in each circumstance. In a Saga orchestration, by contrast, there is a plan forward and a plan backward, and there is no complicated way to define rollbacks.
In the example above, we see that as we move forward in processing our data using different participants, we go through different handlers. Here they are also triggered, but by commands instead of events; a command is just another name for something that does almost the same thing.
import io.eventuate.tram.commands.consumer.CommandWithDestination
import io.eventuate.tram.sagas.simpledsl.CommandWithDestinationBuilder.send
import io.eventuate.tram.sagas.simpledsl.SimpleSaga
import mu.KotlinLogging

class CreateCommentSaga : SimpleSaga<NewsCastComments> {
    private val logger = KotlinLogging.logger {}

    // The forward plan and its compensations, step by step
    private val sagaDefinition = this.step()
        .invokeLocal(this::startSaga)
        .step()
        .invokeParticipant(this::recordPageComment)
        .onReply(PageComment::class.java, this::savedPageComment)
        .withCompensation(this::rejectPageComment)
        .onReply(PageComment::class.java, this::rejectedPageComment)
        .step()
        .invokeParticipant(this::recordAuthorComment)
        .onReply(AuthorComment::class.java, this::savedAuthorComment)
        .withCompensation(this::rejectAuthorComment)
        .onReply(AuthorComment::class.java, this::rejectedAuthorComment)
        .step()
        .invokeParticipant(this::recordMessageComment)
        .onReply(MessageComment::class.java, this::savedMessageComment)
        .withCompensation(this::rejectMessageComment)
        .onReply(MessageComment::class.java, this::rejectedMessageComment)
        .step()
        .invokeLocal(this::done)
        .build()

    private fun startSaga(newsCastComments: NewsCastComments) = logger.info("Saga has started: $newsCastComments")

    // Each participant call is a command sent to a channel, carrying the shared payload
    private fun recordPageComment(newsCastComments: NewsCastComments): CommandWithDestination =
        send(NewsCastPageCommand(
            idPage = newsCastComments.idPage,
            requestId = newsCastComments.pageRequestId,
            comment = newsCastComments.pageComment
        )).to("pageChannel").build()
    ...
}
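The remaining members are elided. To give an idea of their shape, here is a sketch of a reply handler, a compensation step, and the definition accessor that SimpleSaga requires. NewsCastPageRejectCommand and its fields are assumptions; the DSL shapes follow the builder used above:
import io.eventuate.tram.commands.consumer.CommandWithDestination
import io.eventuate.tram.sagas.orchestration.SagaDefinition
import io.eventuate.tram.sagas.simpledsl.CommandWithDestinationBuilder.send

// These would sit inside CreateCommentSaga; command name and fields are assumptions
private fun savedPageComment(newsCastComments: NewsCastComments, pageComment: PageComment) {
    logger.info("Page comment saved: $pageComment")
}

// Compensation: send the corresponding "reject" command backwards through the same channel
private fun rejectPageComment(newsCastComments: NewsCastComments): CommandWithDestination =
    send(NewsCastPageRejectCommand(requestId = newsCastComments.pageRequestId))
        .to("pageChannel")
        .build()

// SimpleSaga exposes its definition through this accessor
override fun getSagaDefinition(): SagaDefinition<NewsCastComments> = sagaDefinition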
5. Running the example
In order to run this example and test how everything works, please run:
make docker-clean-build-start
This command will take a while: it builds the whole project, prepares the binaries for the Docker images, and starts all the necessary containers. For quick reference, this is the docker-compose file used:
networks:
newscast:
services:
news_cast_postgres:
hostname: news_cast_postgres
container_name: news_cast_postgres
command: -c 'max_connections=400' -c 'shared_buffers=100MB'
build:
context: ./docker-files/docker-psql/.
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=admin
- POSTGRES_MULTIPLE_DATABASES=ncexplorer,eventuate
networks:
- newscast
deploy:
resources:
limits:
memory: 200M
reservations:
memory: 200M
healthcheck:
test: [ "CMD", "pg_isready", "-U", "postgres" ]
interval: 30s
timeout: 30s
retries: 10
start_period: 0s
news_cast_kafka:
hostname: news_cast_kafka
container_name: news_cast_kafka
build:
context: ./docker-files/kafka/.
deploy:
resources:
limits:
memory: 1000M
reservations:
memory: 1000M
networks:
- newscast
depends_on:
news_cast_postgres:
condition: service_healthy
news_cast_mock:
hostname: news_cast_mock
container_name: news_cast_mock
build:
context: news-cast-mock/.
restart: on-failure
networks:
- newscast
deploy:
resources:
limits:
memory: 400M
reservations:
memory: 400M
depends_on:
news_cast_postgres:
condition: service_healthy
news_cast_cdc:
hostname: news_cast_cdc
container_name: news_cast_cdc
build:
context: news-cast-explorer-cdc/.
restart: on-failure
deploy:
resources:
limits:
memory: 300M
reservations:
memory: 300M
networks:
- newscast
depends_on:
news_cast_postgres:
condition: service_healthy
news_cast_fetcher:
hostname: news_cast_fetcher
container_name: news_cast_fetcher
build:
context: news-cast-explorer-fetcher/.
deploy:
resources:
limits:
memory: 200M
reservations:
memory: 200M
networks:
- newscast
depends_on:
news_cast_postgres:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "--silent", "http:/127.0.0.1:8080/api/newscast/fetcher/actuator"]
interval: 5s
timeout: 240s
retries: 60
news_cast_choreography:
hostname: news_cast_choreography
container_name: news_cast_choreography
build:
context: news-cast-explorer-saga-choreography/.
restart: on-failure
deploy:
resources:
limits:
memory: 300M
reservations:
memory: 300M
networks:
- newscast
depends_on:
news_cast_postgres:
condition: service_healthy
news_cast_orchestration:
hostname: news_cast_orchestration
container_name: news_cast_orchestration
build:
context: news-cast-explorer-saga-orchestration/.
restart: on-failure
deploy:
resources:
limits:
memory: 300M
reservations:
memory: 300M
networks:
- newscast
depends_on:
news_cast_postgres:
condition: service_healthy
news_cast_fe:
hostname: news_cast_fe
container_name: news_cast_fe
build:
context: docker-files/nginx/.
restart: on-failure
deploy:
resources:
limits:
memory: 300M
reservations:
memory: 300M
networks:
- newscast
depends_on:
news_cast_fetcher:
condition: service_healthy
Once everything has started, please go to http://localhost:9000. You will find a page like this:
Once you’ve done this, you can test the two saga types:
- Orchestration — Port 8082:
curl -X POST http://localhost:8082/api/saga/orchestration -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 3, "messageComment": "I agree", "authorRequestId":123,"pageRequestId":456,"messageRequestId":789 }'
- Choreography — Port 8083:
curl -X POST http://localhost:8083/api/saga/choreography -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 3, "messageComment": "I agree", "authorRequestId":123,"pageRequestId":456,"messageRequestId":789 }'
Check your PostgreSQL database on port 5432, open the eventuate database, and look at the eventuate and public schemas for changes in the tables. Namely, we want to look at the message, saga_instance, and received_messages tables in the eventuate schema and at all the comment tables in the public schema. Let’s try different ID combinations and see what happens.
Just to provide an example, I will now send a request that I know will make the whole Saga fail:
curl -X POST http://localhost:8082/api/saga/orchestration -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 999999, "messageComment": "I agree", "authorRequestId":200,"pageRequestId":500,"messageRequestId":800 }'
The reason for this is that I’m sending a message with ID 999999. With the current ID generation system, it would take an extended amount of time to reach this ID, so I’m very sure that no message with this ID exists.
For Orchestration, we’ll get:
And finally, we can see what happens to the comment tables when the flow is correct and when it’s not:
We can also do the same for the choreography tables. The results are very similar, so I leave that for you to try.
6. Conclusion
As we have seen in this example, Sagas are a great way to manage transactions. They work in a decoupled fashion, and their sub-processes follow A.C.I.D. principles. They provide a solution for LLT latency and degrading performance.
We have seen advantages in both situations:
For Choreography, we see that there is no explicit workflow, which in turn may result in reduced performance overhead. There is also no need for an extra framework to support it. It is an event-driven form of implementation, which allows for a well-known way to implement loose coupling between the different system elements.
For Orchestration, we see that we can prevent process complexity, given that it is command-driven rather than event-driven. In practical terms, this means that orchestration follows a built-in error-handling workflow. It gives better visibility of what is being done. Because it forces us to follow a particular standard, which is already tested and proven, it also prevents too many custom variations in the code, which are usually error-prone.
I have placed all the source code of this application on GitHub.
I hope that you have enjoyed this article as much as I enjoyed writing it. I tried to keep it small and concise, and I left many small details out.
I’d love to hear your thoughts on it, so please leave your comments below.
Thanks in advance for your help, and thank you for reading!
References
- Eventuate.IO
- Saga: How to implement complex business transactions without two phase commit
- Managing data consistency in a microservice architecture using Sagas - Implementing an orchestration-based saga
- Managing data consistency in a microservice architecture using Sagas - Implementing a choreography-based saga
- Choreography pattern with Springboot
- Spock Framework Reference Documentation
- Interaction Based Testing with Spock
- Bash tips: Colors and formatting (ANSI/VT100 Control sequences)
- JUnit 5 Parameter Resolution Example