Nested validation in .NET

In this blog's opening post, I discuss the problem of validating nested Data Transfer Objects (DTOs) in modern .NET. Nesting simply means that the root object can reference other DTOs, which in turn can reference others, and so on, potentially forming a cyclic graph of unknown size. Each node in the graph has its data properties validated against a fairly typical rule set: nullability, range, length, regular expressions, etc.
For DTO types, let's adopt the following conventions:

It may have DataAnnotation attributes, including custom ones.
It may implement IValidatableObject.
It should avoid third-party dependencies if possible.

You may have guessed that the graph is the tricky part. Indeed, the built-in System.ComponentModel.DataAnnotations.Validator does not validate nested objects by design, and this has been the default behaviour for decades. But the fix is trivial, right? Just implement some kind of graph traversal with cycle detection! Well, yes and no. In this post, I compare popular third-party libraries that support nested validation. Spoiler: there is a big performance difference even among robust, production-ready solutions.
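To see the limitation for yourself, here is a small sketch with hypothetical Customer and Address types. It runs Validator.TryValidateObject with validateAllProperties: true on a root whose nested object is invalid. The [Required] attribute on the nested property only checks that the reference is not null; the nested object's own attributes are never evaluated.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical DTOs, used only to demonstrate the built-in behaviour.
public class Address
{
    [Required]
    public string? City { get; set; }
}

public class Customer
{
    [Required]
    public string? Name { get; set; }

    // [Required] only checks that the reference is not null;
    // Address's own attributes are not evaluated when validating Customer.
    [Required]
    public Address? Address { get; set; }
}

public static class BuiltInValidation
{
    // Runs the built-in validator on a single object, without recursion.
    public static List<ValidationResult> Validate(object instance)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(instance, new ValidationContext(instance), results,
            validateAllProperties: true);
        return results;
    }
}
```

Validating new Customer { Name = "Alice", Address = new Address() } reports no errors, while validating that same Address directly reports the missing City.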

There are many ways to define validation rules in .NET, each with its own advantages and disadvantages. For example:

Attributes: explicit, useful for OpenAPI document generation.
IValidatableObject: more flexible yet still self-contained.
External: a jack of all trades. It leaves DTOs clean and provides maximum flexibility (FluentValidation is the best-known example of this approach).
Manual validation: the most naive approach, with inline if statements and no declared validation rules at all. It gives unbeatable performance at the cost of scalability, and it cannot handle a graph of unknown size or topology. Later in the article it serves as the benchmark baseline.

To finish this long intro and save everyone's time, let me highlight what is not covered in this article:

ASP.NET Model Validation. Although it comes with full support for DataAnnotations attributes, it is an inseparable part of a large and complex framework that deals with both server-side web apps and Web APIs, ModelState, backward compatibility, etc.: a topic that undoubtedly deserves its own article.
IOptions<T> validation. Ironically, with the arrival of [ValidateObjectMembers] and [ValidateEnumeratedItems] in .NET 8, OptionsBuilder<TOptions> now supports validation of nested options. As a result, there are now at least three different validation algorithms shipped with ASP.NET.

What is validation?

Let's say we're processing a user's registration email address. What should we check?

The address should be in the correct format. This is validation.
The address domain should not be on our blacklist. This is a business rule.
The address should be unique in our database. This is a business rule.

What is the difference? Validation is a pure function: it is deterministic (same input, same output) and has no side effects. That's why looking up a domain in a blacklist is not validation: such lists change over time, so the same address may pass today and fail tomorrow. A good rule of thumb for mere enterprise developers like me:

Validation: self-contained (we only need the data from the DTO itself)
Business rule: anything that touches mutable data (database, API, file system etc.)

And my advice is: don't mix them up. Validate your input before the control flow even reaches your business domain, just like ASP.NET does with model binding. Regardless of the application architecture, in many cases you want to fail fast on invalid or malicious input and avoid needlessly allocating your scoped and transient services. Then there is testing: covering pure functions with tests is trivial, or at least far easier than mocking a database and a couple of APIs for an all-in-one validator. Put some effort into the quality of the data coming into your domain, and you'll get clearer and more concise domain logic.
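A minimal sketch of that split for the email example, assuming a hypothetical IEmailRules abstraction for the mutable checks (blacklist, uniqueness). The format check is a pure function built on the BCL's EmailAddressAttribute, and it runs first, so malformed input never reaches the business rules.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Diagnostics.CodeAnalysis;
using System.Threading.Tasks;

// Validation: a pure function of the input. Same input, same output, no I/O.
public static class EmailValidation
{
    private static readonly EmailAddressAttribute Format = new();

    // EmailAddressAttribute treats null as valid, so reject null/blank explicitly.
    public static bool IsWellFormed([NotNullWhen(true)] string? email) =>
        !string.IsNullOrWhiteSpace(email) && Format.IsValid(email);
}

// Business rules: depend on mutable external state (hypothetical abstraction).
public interface IEmailRules
{
    Task<bool> IsDomainBlacklistedAsync(string email);
    Task<bool> IsTakenAsync(string email);
}

public static class Registration
{
    // Validate first: malformed input never touches the business rules.
    public static async Task<string> RegisterAsync(string? email, IEmailRules rules)
    {
        if (!EmailValidation.IsWellFormed(email)) return "invalid";
        if (await rules.IsDomainBlacklistedAsync(email)) return "blacklisted";
        if (await rules.IsTakenAsync(email)) return "taken";
        return "ok";
    }
}
```

Only EmailValidation needs plain unit tests; the IEmailRules part is where mocks (or integration tests) come in.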

For a deeper dive, read Mark Seemann's post "Validation and business rules", which discusses the topic in great detail. Now a few words about the libraries under consideration, and then we can finally get on with the benchmarking.

DataAnnotationsValidator

Our first contender is the DataAnnotationsValidator.NETCore package. It is long dead and has performance issues, so I strongly advise against using it. However, it illustrates well the idea behind many home-made solutions:

Reflection to read metadata.
Recursive depth-first search for traversing a graph.
A hash set for cycle detection.
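Those three ingredients fit into a few dozen lines. The sketch below is not the package's actual code, just an illustration of the technique under simple assumptions: the hypothetical NestedValidator runs the built-in Validator on each node, then uses reflection to recurse depth-first into complex properties and collection items, while a reference-equality HashSet skips nodes it has already visited. Leaf types are a hard-coded, non-exhaustive list, and there is deliberately no metadata caching.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Reflection;

// Hypothetical self-referencing DTO, handy for exercising cycle detection.
public class Node
{
    [Required]
    public string? Name { get; set; }

    public Node? Next { get; set; }

    public List<Node> Items { get; set; } = new();
}

public static class NestedValidator
{
    public static bool TryValidate(object root, List<ValidationResult> results)
    {
        // Compare by reference: two equal-looking DTOs are still distinct nodes.
        var visited = new HashSet<object>(ReferenceEqualityComparer.Instance);
        Visit(root, "", results, visited);
        return results.Count == 0;
    }

    private static void Visit(object node, string path, List<ValidationResult> results, HashSet<object> visited)
    {
        // Cycle detection: a node that was already visited is skipped.
        if (!visited.Add(node)) return;

        // Validate the node itself (attributes, then IValidatableObject).
        var nodeResults = new List<ValidationResult>();
        Validator.TryValidateObject(node, new ValidationContext(node), nodeResults, validateAllProperties: true);
        foreach (var r in nodeResults)
        {
            var members = new List<string>();
            foreach (var m in r.MemberNames) members.Add(Join(path, m));
            results.Add(new ValidationResult(r.ErrorMessage, members));
        }

        // Reflection on every visit: no metadata caching in this sketch.
        foreach (var prop in node.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0) continue;
            var value = prop.GetValue(node);
            if (value is null || IsLeaf(value.GetType())) continue;

            var propPath = Join(path, prop.Name);
            if (value is IEnumerable items)
            {
                // Depth-first into each element of a collection.
                int i = 0;
                foreach (var item in items)
                {
                    if (item is not null && !IsLeaf(item.GetType()))
                        Visit(item, $"{propPath}[{i}]", results, visited);
                    i++;
                }
            }
            else
            {
                Visit(value, propPath, results, visited);
            }
        }
    }

    private static string Join(string path, string member) =>
        path.Length == 0 ? member : path + "." + member;

    // Primitives, strings, enums and common value types are leaves, not nodes.
    private static bool IsLeaf(Type type) =>
        type.IsPrimitive || type.IsEnum || type == typeof(string) || type == typeof(decimal) ||
        type == typeof(DateTime) || type == typeof(DateTimeOffset) || type == typeof(TimeSpan) ||
        type == typeof(Guid);
}
```

Note that the visited set also means a node shared by two parents is validated only once, which is usually what you want for a DAG as well as for a cycle.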

MiniValidation

Alive and well designed, MiniValidation offers a smooth experience with nested validation. It implements a similar depth-first search over the DTO graph but adds metadata caching to the mix, resulting in much better performance.
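MiniValidation's internals are not reproduced here. As a sketch of the general idea only, assuming a ConcurrentDictionary keyed by Type, the hypothetical TypeMetadataCache below pays the reflection cost once per type and turns every later lookup into a dictionary hit. Order and Contact are made-up DTOs for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical DTOs for the usage example.
public class Contact
{
    public string? Email { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public string? Note { get; set; }
    public Contact? Contact { get; set; }
    public List<Contact> Recipients { get; set; } = new();
}

public static class TypeMetadataCache
{
    private static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache = new();

    // Reflection runs once per type; later calls are a dictionary lookup.
    // Under a race the factory may run twice, which is harmless here.
    public static PropertyInfo[] GetNavigableProperties(Type type) =>
        Cache.GetOrAdd(type, static t =>
        {
            var result = new List<PropertyInfo>();
            foreach (var p in t.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                // Keep readable, non-indexer properties that may hold nested DTOs.
                if (p.CanRead && p.GetIndexParameters().Length == 0 && !IsLeaf(p.PropertyType))
                    result.Add(p);
            }
            return result.ToArray();
        });

    private static bool IsLeaf(Type type)
    {
        var t = Nullable.GetUnderlyingType(type) ?? type;
        return t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal) ||
               t == typeof(DateTime) || t == typeof(DateTimeOffset) || t == typeof(TimeSpan) ||
               t == typeof(Guid);
    }
}
```

For Order, only Contact and Recipients are returned, and the second call hands back the very same cached array. A real library would also cache the validation attributes per property, not just the property list.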

FluentValidation

FluentValidation is undoubtedly the most popular third-party validation library for .NET. It is a robust choice if you need clean POCOs or multiple validators per type. However, its performance may surprise you.

Benchmark: DataAnnotations and FluentValidation

The first benchmark validates a fairly typical DTO marked with DataAnnotations attributes, containing both a single nested object and a collection of them (each of which is expected to be validated):

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class Parent
{
    [Range(1, 9999)]
    public int Id { get; set; }

    [Required(AllowEmptyStrings = false)]
    [StringLength(12, MinimumLength = 12)]
    public string? Name { get; set; }

    [Required]
    public Child? Child { get; set; }

    [Required]
    public List<Child> Children { get; init; } = new(0);
}

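public class Child : IChild // IChild: an interface from the benchmark code, not shown here
{
    [Required]
    public DateTime? ChildCreatedAt { get; set; }

    [AllowedValues(true)] // AllowedValuesAttribute requires .NET 8 or later
    public bool ChildFlag { get; set; }
}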

Of course, FluentValidation ignores these attributes, so its validators are declared separately, repeating the same rules:

using FluentValidation;

public class ParentValidator : AbstractValidator<Parent>
{
    public ParentValidator()
    {
        RuleFor(x => x.Id).InclusiveBetween(1, 9999);
        RuleFor(x => x.Name).NotEmpty().Length(min: 12, max: 12);
        RuleFor(x => x.Child).NotNull().SetValidator(new ChildValidator());
        RuleForEach(x => x.Children).NotNull().SetValidator(new ChildValidator());
    }
}

public class ChildValidator : AbstractValidator<Child>
{
    public ChildValidator()
    {
        RuleFor(x => x.ChildCreatedAt).NotNull();
        RuleFor(x => x.ChildFlag).Equal(true);
    }
}

Finally, the Manual benchmark uses explicit if checks and serves as the baseline. Each benchmark runs against the same Parent collection. Here are the results for different collection sizes (Alloc Ratio is relative to Manual):

Method	Size	Mean	Allocated	Alloc Ratio
Manual	100	3 μs	34 KB	1
MiniValidation	100	162 μs	427 KB	12
DataAnnotationsValidator	100	302 μs	614 KB	17
FluentValidation	100	314 μs	946 KB	27
Manual	1000	33 μs	343 KB	1
MiniValidation	1000	1586 μs	4260 KB	12
DataAnnotationsValidator	1000	3084 μs	6150 KB	17
FluentValidation	1000	3300 μs	9586 KB	27
Manual	10000	342 μs	3437 KB	1
MiniValidation	10000	16237 μs	42619 KB	12
DataAnnotationsValidator	10000	31223 μs	61480 KB	17
FluentValidation	10000	32364 μs	95911 KB	27

Well, DataAnnotationsValidator is predictably bad, but FluentValidation... is even worse in both time and memory! At first I thought there was a bug (there wasn't). Then I did my best to find FluentValidation settings that might improve its performance (there weren't any, except "fail fast", see below). The ranking stays the same at every collection size.
But look at MiniValidation! The same algorithm, optimised for performance, gives an impressive 2x speedup over DataAnnotationsValidator.
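For a rough idea of what the Manual baseline involves, here is a sketch of hand-written checks mirroring the Parent/Child rules. It is not the repository's code: the types are redeclared without attributes so the block compiles on its own, and the error strings (property paths) are made up. Notice that the nesting depth is hard-coded in ManualValidator, which is exactly why this approach cannot handle a graph of unknown topology.

```csharp
using System;
using System.Collections.Generic;

// Redeclared without attributes so this block compiles on its own.
public class Child
{
    public DateTime? ChildCreatedAt { get; set; }
    public bool ChildFlag { get; set; }
}

public class Parent
{
    public int Id { get; set; }
    public string? Name { get; set; }
    public Child? Child { get; set; }
    public List<Child> Children { get; init; } = new(0);
}

// Hand-written checks: no reflection, no metadata, no traversal algorithm.
public static class ManualValidator
{
    public static List<string> Validate(Parent p)
    {
        var errors = new List<string>();

        if (p.Id < 1 || p.Id > 9999) errors.Add("Id");                  // [Range(1, 9999)]
        if (string.IsNullOrWhiteSpace(p.Name)) errors.Add("Name");       // [Required(AllowEmptyStrings = false)]
        else if (p.Name.Length != 12) errors.Add("Name");                // [StringLength(12, MinimumLength = 12)]

        if (p.Child is null) errors.Add("Child");
        else ValidateChild(p.Child, "Child", errors);

        if (p.Children is null) errors.Add("Children");
        else
        {
            for (int i = 0; i < p.Children.Count; i++)
            {
                var c = p.Children[i];
                if (c is null) errors.Add($"Children[{i}]");
                else ValidateChild(c, $"Children[{i}]", errors);
            }
        }

        return errors;
    }

    private static void ValidateChild(Child c, string path, List<string> errors)
    {
        if (c.ChildCreatedAt is null) errors.Add(path + ".ChildCreatedAt"); // [Required]
        if (!c.ChildFlag) errors.Add(path + ".ChildFlag");                  // [AllowedValues(true)]
    }
}
```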

Benchmark: IValidatableObject

As you probably know, IValidatableObject is an alternative to explicit DataAnnotations attributes, with all the validation logic encapsulated within DTOs. This benchmark uses the same validation rules, implemented in the Validate method instead, so it's all about traversing a graph and calling Validate at each node. FluentValidation is not on the list this time.

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class ChildValidatableObject : IValidatableObject
{
    public DateTime? ChildCreatedAt { get; set; }
    public bool ChildFlag { get; set; }

    public IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
    {
        if (ChildCreatedAt == null)
        {
            yield return new ValidationResult("foo error message #2", 
                new[] { nameof(ChildCreatedAt) });
        }

        if (ChildFlag == false)
        {
            yield return new ValidationResult("foo error message #3", 
                new[] { nameof(ChildFlag) });
        }
    }
}

Method	Size	Mean	Allocated	Alloc Ratio
'Manual with IVO.Validate call'	100	21 μs	109 KB	1.00
'MiniValidation + IVO'	100	59 μs	199 KB	1.82
'DataAnnotationsValidator + IVO'	100	151 μs	442 KB	4.04
'Manual with IVO.Validate call'	1000	206 μs	1093 KB	1.00
'MiniValidation + IVO'	1000	565 μs	1992 KB	1.82
'DataAnnotationsValidator + IVO'	1000	1511 μs	4421 KB	4.04
'Manual with IVO.Validate call'	10000	2141 μs	10937 KB	1.00
'MiniValidation + IVO'	10000	6608 μs	19921 KB	1.82
'DataAnnotationsValidator + IVO'	10000	16254 μs	44219 KB	4.04

Again, MiniValidation wins by an even larger margin. Now let's merge the results and look at the overall performance (values rounded for readability):

Method	Size	Mean	Allocated
Manual	10000	342 μs	3437 KB
Manual with IVO.Validate call	10000	2141 μs	10937 KB
MiniValidation + IVO	10000	6608 μs	19921 KB
MiniValidation	10000	16237 μs	42619 KB
DataAnnotationsValidator + IVO	10000	16254 μs	44219 KB
DataAnnotationsValidator	10000	31223 μs	61480 KB
FluentValidation	10000	32364 μs	95912 KB

You may notice that MiniValidation + IValidatableObject gives the best results of all the third-party libraries.
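One detail worth knowing if you mix both styles on the same DTO: the built-in Validator checks property attributes first, then class-level attributes, and calls IValidatableObject.Validate only if all of those passed. Calling Validate directly, as a manual baseline would, skips the attributes entirely. The sketch below shows both paths on a hypothetical Booking DTO; IvoDemo is likewise a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical DTO mixing an attribute rule with an IValidatableObject rule.
public class Booking : IValidatableObject
{
    [Required]
    public string? Guest { get; set; }

    public DateTime From { get; set; }
    public DateTime To { get; set; }

    public IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
    {
        if (To <= From)
            yield return new ValidationResult("To must be after From", new[] { nameof(To) });
    }
}

public static class IvoDemo
{
    // Via the built-in Validator: attributes first, Validate() only if they all pass.
    public static int CountViaValidator(Booking booking)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(booking, new ValidationContext(booking), results,
            validateAllProperties: true);
        return results.Count;
    }

    // Direct call: only the rules inside Validate(), no attribute checks at all.
    public static int CountViaDirectCall(Booking booking)
    {
        int count = 0;
        foreach (var result in booking.Validate(new ValidationContext(booking))) count++;
        return count;
    }
}
```

With Guest missing and an empty date range, the Validator path reports only the [Required] error, while the direct call reports only the date error; once Guest is set, the Validator path reaches Validate() and reports the date error too.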

Benchmark: Fail fast

Still, FluentValidation has a feature the other competitors lack: CascadeMode.Stop. It's flexible and can be set at different levels (rule, class, global):

using FluentValidation;

public class FailfastChildValidator : AbstractValidator<Child>
{
    public FailfastChildValidator()
    {
        // Stop executing the remaining rules after the first failing rule.
        ClassLevelCascadeMode = CascadeMode.Stop;
        // All the rules are declared as usual
        // ...
    }
}

Method	Size	Mean	Allocated
FluentValidation + Fail Fast	10000	9012 μs	38556 KB
FluentValidation	10000	32364 μs	95911 KB

Of course, the fail-fast version is much faster. Most of the time I prefer the full validation report, but fail-fast is an option worth mentioning when talking about performance.
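For comparison, the built-in Validator offers a much more limited form of fail fast for a single object: Validator.ValidateObject stops at the first failure and throws it as a ValidationException, whereas TryValidateObject collects every error. It does not recurse into nested objects, and it reports the error through an exception, which has its own cost, so treat this sketch (with a hypothetical Account DTO and FailFastDemo helper) as an illustration of the idea rather than an equivalent of CascadeMode.Stop.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical DTO with two independent rules.
public class Account
{
    [Range(1, 9999)]
    public int Id { get; set; }

    [Required]
    public string? Name { get; set; }
}

public static class FailFastDemo
{
    // Full report: every failing property is collected.
    public static List<ValidationResult> AllErrors(object dto)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(dto, new ValidationContext(dto), results,
            validateAllProperties: true);
        return results;
    }

    // Fail fast: ValidateObject stops at the first failure and throws it.
    public static ValidationResult? FirstError(object dto)
    {
        try
        {
            Validator.ValidateObject(dto, new ValidationContext(dto), validateAllProperties: true);
            return null;
        }
        catch (ValidationException ex)
        {
            return ex.ValidationResult;
        }
    }
}
```

Which of the two errors comes first depends on the order in which properties are discovered, so callers should not rely on it.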

Summary

In this post I discussed the problem of validating nested objects in .NET. Since the built-in DataAnnotations validator doesn't traverse complex properties, we have to rely on third-party libraries for this. I explained the difference between validation and business rules, and why this is important.

As for the benchmark results:

The MiniValidation library shows the best overall performance.
FluentValidation, despite its popularity, is roughly 2x slower than MiniValidation in these benchmarks. There are some faster alternatives, such as Validot, but I would rather leave the burden of benchmarking to its maintainers.

And don't get me wrong. If you want to decouple your rules from DTOs and get a simple, stable and production-tested solution, just take FluentValidation: in many cases its performance difference is negligible. If you need self-describing DTOs, stick with MiniValidation. And for performance-critical code, inline your checks where possible.

The obvious next step in the development of general-purpose validation libraries is, of course, the adoption of ~~ChatGPT~~ source generators. A validation generator such as this one could potentially eliminate the performance gap between general-purpose validation libraries and inlined validation. In fact, all the necessary technology already ships with .NET, so stay tuned for news!

All the code from the article is available on GitHub: https://github.com/ilya-chumakov/PaperSource.DtoGraphValidation.