Kevin Burns


Data Constraints: From Imperative to Declarative

There once was a client who wanted to port a Lambda from TypeScript to Go. This was a Lambda with a capital L: thousands of lines of code spread across dozens of files, many of them automatically generated database models copy/pasted from other repositories. The initiative was sound: consolidate the business logic into a single application and, in doing so, unify a number of often inconsistent data structure definitions.

You see, the database was MongoDB, which meant that the data model was whatever the developer decided it was at the moment any given document was inserted by any particular application. Some call this a “schemaless” database, but I prefer the term “schema-on-read”.
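To make that concrete, here is a minimal sketch of what schema-on-read means in practice, with a hypothetical `orders` collection and made-up field names; nothing stops two applications from writing incompatible shapes to the same collection:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const orders = client.db("app").collection("orders");

// Application A writes the amount as a number under `total`.
await orders.insertOne({ userId: "u1", total: 49.99 });

// Application B writes it as a string under `amount`, plus extra fields.
await orders.insertOne({ userId: "u2", amount: "49.99", currency: "USD" });

// Both inserts succeed. The "schema" exists only in the heads of whichever
// readers interpret these documents later.
```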

Although MongoDB supports JSON Schema validation as a functional equivalent to a database schema, this project used none. This is not surprising given that MongoDB’s marketing strategy consists primarily of incessantly trumpeting the advantages of unstructured data to unwitting JavaScript developers in the most obnoxious way possible.
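For contrast, opting in to validation looks roughly like this with the Node driver. The fields here are the same hypothetical ones as above, not the project’s actual schema:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();

// Attach a JSON Schema validator at collection creation time so that
// non-conforming writes are rejected on insert rather than discovered on read.
await client.db("app").createCollection("orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["userId", "total"],
      properties: {
        userId: { bsonType: "string" },
        total: { bsonType: "number", minimum: 0 },
      },
    },
  },
  validationAction: "error",
});
```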

Relational algebra is better than drugs

I successfully avoided MongoDB for my entire career until one day came a client with an interesting set of problems. Despite my aversion to MongoDB on principle, the strongest advice I could give this client would be never to have used it in the first place, which is sometimes not an option given the unreliable state of present-day time travel technology.

In order to provide meaningful advice to clients on how to use MongoDB well and prevent existing implementations from degrading, I would need expert knowledge of MongoDB, and that could only come from spending months neck-deep in a real-world system with truckloads of dynamic, unstructured data feeding the needs of real-world customers.

I accepted the challenge and took on many tasks solving complex, elusive problems often leading to the discovery of undefined behavior. We will focus on one such task:

The lambda migration

This particular lambda wasn’t ready yet. The lambda was known to have bugs and edge cases and the company’s leadership (CEO, CPO, CTO) would need to look at it from every angle to ensure that the business logic was consistent with their expectations before it could be ported to Go. Leadership worked together intensively for a week until they produced a version of the lambda that they felt met the needs of their customers.

As a contractor, it was my job to port the logic from TypeScript to Go. I examined the logic. Studied it judiciously. Then I began porting it to Go in such a way that I could run an integration test suite against both my new version in Go and the existing lambda in TypeScript to ensure that the output was identical in all conceivable scenarios and produced identical effects on the database.
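The harness was simple in shape. A hedged sketch, where `invokeTs`, `invokeGo`, and `resetDb` are hypothetical stand-ins for the real plumbing:

```typescript
import assert from "node:assert/strict";

// Each implementation runs against a freshly reset database; then both the
// returned payload and the resulting documents are compared.
type Invoke = (input: unknown) => Promise<{ response: unknown; docs: unknown[] }>;

async function compareImplementations(
  scenarios: unknown[],
  invokeTs: Invoke, // existing TypeScript lambda
  invokeGo: Invoke, // new Go port
  resetDb: () => Promise<void>,
): Promise<void> {
  for (const scenario of scenarios) {
    await resetDb();
    const ts = await invokeTs(scenario);

    await resetDb();
    const go = await invokeGo(scenario);

    assert.deepEqual(go.response, ts.response); // identical output...
    assert.deepEqual(go.docs, ts.docs); // ...and identical database effects
  }
}
```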

Once I was satisfied that the battery of tests proved that my implementation matched the intended logic completely, I developed a growing suspicion: the most critical area of the logic was so internally consistent and complete that it might be reducible to a set of unique constraints. When I raised this with the client, I received the following response, verbatim, which I will treasure with all my heart until the day that I die:

“Our business logic is too complex to be expressed as database constraints.”

RIP Isaac Hayes

Naturally, I took this as a challenge. After a few hours of learning how indexes work in MongoDB, I produced a pair of compound unique indexes (one sparse and one full) that would impose the exact same constraints enforced by the business logic on all new and existing data. Extending my existing test suite to support a third side-by-side implementation quickly proved that the pair of indexes did in fact produce behavior identical to the 50 or so lines of business logic under all conceivable scenarios.
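The client’s actual key sets are theirs to keep, but the shape of the solution looks something like this, with hypothetical field names:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const subs = client.db("app").collection("subscriptions");

// Full compound unique index: at most one document per (userId, planId),
// including documents where the fields are missing (they index as null).
await subs.createIndex({ userId: 1, planId: 1 }, { unique: true });

// Sparse compound unique index: documents that carry none of the indexed
// fields are excluded from the index entirely, so they never collide on null.
await subs.createIndex(
  { userId: 1, couponId: 1 },
  { unique: true, sparse: true },
);
```

Together, the two indexes state facts about the collection rather than procedures: at most one document per full key, and at most one per optional key when that key is present.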

Leadership was stunned. I had successfully transformed a piece of critical business logic from imperative code into declarative data constraints. Management no longer had to reference a spreadsheet with dozens of combinatorial expected outcomes. A description of the expected behavior could be reduced to two statements in plain English.

However, collections in MongoDB are not the tidy containers of data you expect coming from a decade of experience with schema-on-write databases like MySQL and Postgres. Schemaless collections in MongoDB are more like that bag of zucchini you left in the refrigerator when you went on vacation: fine when you bought it, but by the time you return it has changed color and begun to leak.

When the migration was run to add the unique indexes to the collection, it failed. It failed because the collection contained data that didn’t match the constraints, inserted at a time when the lambda’s logic was different. The errant data would need to be altered or deleted before the constraints could be applied.

Analysis of this data showed that it drifted from the constraints in a number of different ways. Hundreds of records were reduced to a handful of distinct departures from the expected schema. Each type of data problem was laid out in a spreadsheet with its first occurrence, its last occurrence, the number of occurrences, and a column to track what ought to be done about it (DELETE, FIX, or TBD), along with relevant notes.
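Producing that spreadsheet is itself a declarative exercise. A sketch using the same hypothetical fields as above: group on the would-be unique key, and any group with more than one member is a violation, annotated with its first occurrence, last occurrence, and count:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const subs = client.db("app").collection("subscriptions");

// Group on the would-be unique key; any group with more than one member
// is a constraint violation the index build will choke on.
const duplicates = await subs
  .aggregate([
    {
      $group: {
        _id: { userId: "$userId", planId: "$planId" },
        count: { $sum: 1 },
        firstSeen: { $min: "$createdAt" },
        lastSeen: { $max: "$createdAt" },
      },
    },
    { $match: { count: { $gt: 1 } } },
    { $sort: { count: -1 } },
  ])
  .toArray();

console.log(duplicates);
```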

Most of the errant rows could be discarded as complete trash with no discernible value. In some cases the problems could be fixed with a migration. However, there was also a class of problems where the data was incorrect in a way that could be corrected, except that doing so would produce unexpected behavior for end users. There were enough instances of data that would adversely affect customers if fixed to present a significant challenge to the business. How would they communicate the change in behavior of their system to their customers? Would they need to issue refunds to their customers? Would their customers need to issue refunds to their affected customers?

Had the imperative logic not been converted to declarative data constraints, these cases would have gone unnoticed. Issues like this matter because, left unaddressed, they often lead to an engineering culture in which it is generally understood (and too often accepted) that older customers are more likely to encounter defects than newer ones, which can damage customer retention.

This pattern is interesting because it reflects the history of MongoDB itself, when users left in droves after the industry learned of the misleading durability and consistency guarantees behind its impressive benchmarketing results. MongoDB has come a long way since then, and it is now a lot better than it was at not losing data. So if there is hope for MongoDB, I believe there is hope for humanity.

Conclusion

Modeling business logic as declarative constraints from the outset can save loads of time and significantly improve the quality of your product. If you can push the line between imperative logic and declarative data constraints, you are likely to impress your peers by discovering unknown unknowns.
