Discovering the need for stream metadata.
One of the best uses of Event Sourcing is to manage behaviors between entities, aka relationships. This is where all the interesting things happen. For example, a product catalog is just a collection of data. It doesn't do anyone any good until those products are sold, shipped, restocked, etc. All of these processes occur between people or companies. They occur in relationships.
Relationships with Event Sourcing
These relationships need to be reflected somewhere in the event stream. In our domain of safety training, the main activity we track is training, which involves a student and a course. We use a Training ID (a UUID) to identify each training process. If we were creating relational tables, we would include foreign keys to Student ID and Course ID to track the entities involved. How do we express this in an event store?
In starter event
In our first effort, we simply included reference IDs (Student ID and Course ID) in the starter event of our training process stream. For example, the starter event for a training process might be `RegistrationCreated`. Storing the reference IDs in the starter event supports executing workflows and running queries. (The command and query APIs, in CQRS parlance.) Both of these use a "replay" process that can keep the reference IDs from the first event for later use.
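Here is a minimal sketch of the idea in TypeScript. The event shapes and field names are illustrative, not our actual schema.

```typescript
// Illustrative event shapes, not our actual schema.
interface RegistrationCreated {
  type: "RegistrationCreated";
  studentId: string; // reference ID, stored once in the starter event
  courseId: string;  // reference ID, stored once in the starter event
}

interface TrainingCompleted {
  type: "TrainingCompleted";
  completedOn: string;
}

type TrainingEvent = RegistrationCreated | TrainingCompleted;

interface TrainingState {
  studentId?: string;
  courseId?: string;
  completed: boolean;
}

// Replay folds over the stream; the reference IDs from the first
// event are kept in state for later use by commands and queries.
function replay(events: TrainingEvent[]): TrainingState {
  let state: TrainingState = { completed: false };
  for (const event of events) {
    switch (event.type) {
      case "RegistrationCreated":
        state = { ...state, studentId: event.studentId, courseId: event.courseId };
        break;
      case "TrainingCompleted":
        state = { ...state, completed: true };
        break;
    }
  }
  return state;
}
```

This worked out great for our first event sourced application. But then we built an app for a more complex domain.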
In key events
We rearchitected one of our internal apps for the cloud. This time the domain is auditing. More specifically, verifying that a vendor (contractor, supplier, etc.) meets requirements before they are allowed to do work for a company.
We started out placing the reference IDs only in the starter event. But this strategy was problematic for some kinds of event listeners (like process managers) and for building more complicated data models. In these cases, the "starting" event for the listener or data model is different from the starting event for the stream. We ended up solving this problem by simply repeating the same reference IDs on these key events, as sketched below. The minor downside is storing duplicate data.
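Sketched in TypeScript, with hypothetical event names from the auditing domain:

```typescript
// Hypothetical event names. The stream's starter event carries the
// reference IDs, and later "key" events (where a process manager or
// data model starts listening) repeat them.
interface AuditRequested {
  type: "AuditRequested";
  vendorId: string;   // reference IDs on the starter event...
  companyId: string;
}

interface AuditOpened {
  type: "AuditOpened";
  vendorId: string;   // ...repeated on a key event, so a listener that
  companyId: string;  // starts here still sees them (duplicate data)
  openedOn: string;
}
```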
As an alternative, we considered loading the reference IDs from an existing data model. But we decided this was a net loss. It would have created multiple degrees of coupling (server, code) between subsystems, and it would have added complication to the infrastructure while benefiting relatively few cases.
Stream metadata
We ran into this again recently. Only this time I realized that these reference IDs (so far) are always static for a given stream. So they should be attached to the stream itself rather than to specific events. They should be stream metadata. When the stream is "started", the reference IDs are saved in stream metadata instead of on the initial event. When an event is loaded from the event store, the stream metadata can optionally be loaded too. So the reference IDs are available to listeners, regardless of which events they care about. No more repeating the same IDs in multiple events.
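In code, the shape might look something like this. This is a hypothetical event store interface I'm sketching to show the idea, not Event Store's actual API or my Postgres implementation.

```typescript
// Hypothetical event store interface, for illustration only.
interface StreamMetadata {
  vendorId: string;
  companyId: string;
}

interface StoredEvent {
  type: string;
  data: unknown;
}

interface EventStore {
  // Reference IDs are saved as stream metadata when the stream is started.
  createStream(streamId: string, metadata: StreamMetadata): Promise<void>;

  append(streamId: string, events: StoredEvent[]): Promise<void>;

  // Any listener can optionally load the metadata alongside the events,
  // regardless of which events it cares about.
  read(
    streamId: string,
    options?: { includeMetadata?: boolean }
  ): Promise<{ events: StoredEvent[]; metadata?: StreamMetadata }>;
}
```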
Prior art
I had seen this concept of stream metadata in my research on Event Store (Greg Young's open source product). But at the time I wasn't sure why I needed it. I have been burned before because I used tools incorrectly when I didn't understand them. So I did not include it in my Postgres event store implementation. In many ways I feel like I am just rediscovering things that Greg Young figured out years ago, then documenting them in a way that my past self might understand. 😄