Background
At Convoy we started out using MongoDB as our primary data store, and it served us well for a long time. However, as our data grew more complex, we realised a relational database was a better fit for our use case. We detail our reasons for switching to PostgreSQL here. But there was one minor (well, not so minor) problem: the subscriptions filter engine relied heavily on MongoDB's native queries, and switching meant we had to implement that logic ourselves in Go, as there was no readily available library to support it.
The filter engine determines whether an event payload matches a given filter[1]. If it matches, the event is routed to the subscription's endpoint; otherwise, the event is dropped.
It does this in two steps:
Flatten the event payload.
Compare the flattened payload to the pre-flattened[2] filter.
For the rest of this article, we'll focus on the flatten library itself.
Let’s Dig In
The flatten library uses depth-first search (DFS) to visit every entry in the JSON map and writes each value into the result map under a key built by joining its parent keys with dots. The only catch is that the library "steps over" operators like $and, $in, $exists, etc.
For example:
{
  "$and": [
    {
      "$or": [
        {
          "person": {
            "age": {
              "$in": [10, 11, 12]
            }
          }
        },
        {
          "places": {
            "temperatures": 39.9
          }
        }
      ]
    },
    {
      "city": "london"
    }
  ]
}
Becomes:
{
  "$and": [
    {
      "$or": [
        {
          "person.age": {
            "$in": [10, 11, 12]
          }
        },
        {
          "places.temperatures": 39.9
        }
      ]
    },
    {
      "city": "london"
    }
  ]
}
Analysing the Initial Implementation
Flatten initially used the recursive version of DFS: for every map or array entry in the input, the function would call itself (see the full implementation here). While this approach works fine for small payloads, it leads to high memory and CPU consumption on larger ones. This becomes particularly problematic when processing event payloads against multiple filters, as seen in BenchmarkFlattenLargeJson with a 17KB GitHub pull request webhook payload.
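For context, here is a minimal sketch of what a recursive DFS flatten looks like. It is a simplified illustration (arrays and operator handling like $and/$or are omitted), not Convoy's exact implementation, but it shows where the allocations discussed below come from:

    package main

    import "fmt"

    // M is the flattened result type used throughout the article.
    type M map[string]interface{}

    // flattenRecursive is an illustrative recursive DFS flatten. Each nested map
    // triggers another call, another temporary result map, and a copy of that
    // map's entries into the caller's result when the call returns.
    func flattenRecursive(prefix string, nested interface{}) M {
        result := M{}

        m, ok := nested.(map[string]interface{})
        if !ok {
            // Leaf value: store it under the accumulated dotted key.
            result[prefix] = nested
            return result
        }

        for k, v := range m {
            key := k
            if prefix != "" {
                key = fmt.Sprintf("%s.%s", prefix, k) // a fresh allocation on every key
            }
            // Recurse, then copy the child's result map into ours.
            for ck, cv := range flattenRecursive(key, v) {
                result[ck] = cv
            }
        }
        return result
    }

    func main() {
        payload := map[string]interface{}{
            "person": map[string]interface{}{"age": 10},
            "city":   "london",
        }
        fmt.Println(flattenRecursive("", payload)) // map[city:london person.age:10]
    }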
The approach was riddled with several compounding problems:
Heap memory allocation: Every recursive call pushes a new stack frame, and while stack space itself is "cheap", each call also triggers another round of heap memory allocations inside the function body.
New key string concatenation: New map entry keys were being created with:
fmt.Sprintf("%s.%s", prefix, newKey)
While this is fine for a handful of small strings, doing it repeatedly for potentially large strings is disastrous for memory management. fmt.Sprintf has to do a lot of checks and type assertions to figure out where to place the given values in the format string, which consumes a lot of CPU time and memory (see the sketch after this list).
Result map copies: When the recursion terminates and the stack starts to unwind, every entry in the result map returned by an inner call is copied into the result map of its caller.
Result map resizing: A less obvious problem is map capacity during the copying that happens as the stack unwinds. Because the result maps are not pre-allocated with enough space to hold all expected keys, a sufficiently large input causes the runtime to resize the result map multiple times. This is an expensive operation handled by runtime.growWork_faststr & runtime.evacuate_faststr[3].
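To illustrate the string concatenation and map sizing points, here is a small self-contained sketch; the helper names are made up for illustration and are not part of the library:

    package main

    import (
        "fmt"
        "strings"
    )

    // buildKeySprintf builds a dotted key the way the old implementation did:
    // fmt.Sprintf parses the format string and inspects its arguments on every call.
    func buildKeySprintf(prefix, key string) string {
        return fmt.Sprintf("%s.%s", prefix, key)
    }

    // buildKeyBuilder builds the same key with strings.Builder, allocating the
    // final string exactly once.
    func buildKeyBuilder(prefix, key string) string {
        var b strings.Builder
        b.Grow(len(prefix) + 1 + len(key))
        b.WriteString(prefix)
        b.WriteByte('.')
        b.WriteString(key)
        return b.String()
    }

    func main() {
        fmt.Println(buildKeySprintf("person", "age")) // person.age
        fmt.Println(buildKeyBuilder("person", "age")) // person.age

        // Pre-sizing the result map avoids repeated growth by the runtime;
        // 64 here is an arbitrary illustrative capacity.
        result := make(map[string]interface{}, 64)
        result["person.age"] = 30
        _ = result
    }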
How about a for loop instead?
To address the inefficiency of the recursion, we opted to use a custom stack (stackFrame) to keep track of map entries iteratively.
In this approach, we keep a slice of stackFrames, which serves as our self-managed call stack. You can find the full implementation here.
func flatten(prefix string, nested interface{}) (M, error) {
    var stack []stackFrame
    stack = append(stack, stackFrame{prefix, nested})
    // ... pop and process frames until the stack is empty (full implementation linked above, sketch below) ...
}
This implementation removes the repeated heap memory allocations caused by recursion. A strings.Builder is now used to build new result map keys efficiently. With a for loop, we also avoid the result map copies caused by stack unwinding: there is only one result map, and all entries are stored in it directly.
However, the problem of map resizing still exists. Is it possible to make a pass over the input and determine how many entries the result map will need? Actually, yes. For a sufficiently large input, this pass is cheaper than the resizing operations it avoids. The countKeys function does exactly that, letting us allocate a large enough result map once.
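Putting these pieces together, here is a self-contained sketch of the iterative approach. The names flatten, stackFrame, countKeys, and M follow the article, but the body is a simplified illustration (arrays and operator handling are omitted), not the library's exact code:

    package main

    import (
        "fmt"
        "strings"
    )

    // M is the flattened result type.
    type M map[string]interface{}

    // stackFrame holds one pending (prefix, value) pair on the self-managed stack.
    type stackFrame struct {
        prefix string
        value  interface{}
    }

    // countKeys walks the input once and counts how many leaf entries the result
    // map will hold, so it can be allocated once with enough capacity.
    func countKeys(nested interface{}) int {
        count := 0
        stack := []interface{}{nested}
        for len(stack) > 0 {
            v := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if m, ok := v.(map[string]interface{}); ok {
                for _, child := range m {
                    stack = append(stack, child)
                }
                continue
            }
            count++
        }
        return count
    }

    // flatten is an iterative DFS over the payload: no recursion, a single
    // pre-sized result map, and keys built with strings.Builder.
    func flatten(prefix string, nested interface{}) (M, error) {
        result := make(M, countKeys(nested))
        stack := []stackFrame{{prefix, nested}}

        for len(stack) > 0 {
            // Pop the top frame.
            frame := stack[len(stack)-1]
            stack = stack[:len(stack)-1]

            m, ok := frame.value.(map[string]interface{})
            if !ok {
                // Leaf value: store it under the accumulated dotted key.
                result[frame.prefix] = frame.value
                continue
            }
            for k, v := range m {
                key := k
                if frame.prefix != "" {
                    var b strings.Builder
                    b.Grow(len(frame.prefix) + 1 + len(k))
                    b.WriteString(frame.prefix)
                    b.WriteByte('.')
                    b.WriteString(k)
                    key = b.String()
                }
                stack = append(stack, stackFrame{key, v})
            }
        }
        return result, nil
    }

    func main() {
        payload := map[string]interface{}{
            "person": map[string]interface{}{"age": 10},
            "city":   "london",
        }
        flat, _ := flatten("", payload)
        fmt.Println(flat) // map[city:london person.age:10]
    }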
Profiling the new implementation showed a nearly 70% improvement in both CPU time and memory allocations, as the side-by-side profile comparison shows.
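To reproduce this kind of measurement, a benchmark along these lines can be written with Go's testing package. This is a sketch, not the repository's exact benchmark: the package name and fixture path are assumed, and it calls the flatten function described above:

    package flatten

    import (
        "encoding/json"
        "os"
        "testing"
    )

    // BenchmarkFlattenLargeJson measures flattening a large webhook payload.
    // The fixture path is hypothetical; the real benchmark in the repository
    // uses its own test data.
    func BenchmarkFlattenLargeJson(b *testing.B) {
        raw, err := os.ReadFile("testdata/github_pull_request.json")
        if err != nil {
            b.Fatal(err)
        }

        var payload map[string]interface{}
        if err := json.Unmarshal(raw, &payload); err != nil {
            b.Fatal(err)
        }

        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            if _, err := flatten("", payload); err != nil {
                b.Fatal(err)
            }
        }
    }

Running it with go test -bench BenchmarkFlattenLargeJson -benchmem reports both time per operation and allocations per operation.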
Conclusion
Our initial implementation, which used recursive DFS to flatten payloads, was good enough when only a small number of filters had to be processed. But at scale it was too slow and costly, creating a huge bottleneck for our customers. We revised the implementation, replaced the recursion with an iterative approach built on a custom stack, and saw huge performance gains.
[1] — Please note: both event payloads & filters are JSON data unmarshalled into map[string]interface{}.
[2] — The filter is flattened at the point of creation. This avoids the costly operation of flattening it for every event.
[3] — This operation is opaque to Go applications; it is managed entirely by the runtime.
P.S.: sorry about the blurry screenshots; not sure what's wrong with dev.to image uploads. You can find clearer images in the Medium post.