Dmitry Zakharov

Posted on Feb 21

JavaScript schema library from the Future 🧬

#schema #typescript #rescript #opensource

ReScript Schema: the fastest parser in the entire JavaScript ecosystem, with a focus on small bundle size and top-notch DX.

Why haven't you heard of it, then, and why should you learn about it now? I started developing the library three years ago, and today it has reached a point other libraries have yet to achieve. I'll prove it in this article, but before we start, I'd like to answer a few questions you might already have.

What's a parser?

One of the most basic applications of ReScript Schema is parsing: accepting unknown JavaScript data, validating it, and returning the result as your desired type. There are dozens of such libraries, and the most popular ones are Zod, Valibot, Runtypes, ArkType, Typia, Superstruct, Effect Schema, and more. Validation libraries like Ajv, Yup, and others are also closely related, even though they work slightly differently.
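To make "parsing" concrete, here is a hand-written parser for a small film shape, roughly the work a schema library automates for you. It is a plain TypeScript sketch with no library involved; the `Film` type and the `parseFilm` name are illustrative assumptions. It takes `unknown`, checks every field, and either returns a typed value or throws.

```typescript
type Rating = "G" | "PG" | "PG13" | "R";

interface Film {
  id: number;
  title: string;
  tags: string[];
  rating: Rating;
}

const RATINGS: readonly string[] = ["G", "PG", "PG13", "R"];

// Type guard for the rating union.
function isRating(value: unknown): value is Rating {
  return typeof value === "string" && RATINGS.includes(value);
}

// Accept unknown data, validate it, and return a value of the Film type.
function parseFilm(input: unknown): Film {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("Expected an object");
  }
  const { id, title, tags, rating } = input as Record<string, unknown>;
  if (typeof id !== "number" || Number.isNaN(id)) {
    throw new TypeError('Expected number at ["id"]');
  }
  if (typeof title !== "string") {
    throw new TypeError('Expected string at ["title"]');
  }
  if (!Array.isArray(tags) || !tags.every((tag) => typeof tag === "string")) {
    throw new TypeError('Expected string[] at ["tags"]');
  }
  if (!isRating(rating)) {
    throw new TypeError('Expected "G" | "PG" | "PG13" | "R" at ["rating"]');
  }
  // Build a fresh object so unknown extra keys are not passed through.
  return { id, title, tags, rating };
}
```

Writing this by hand for every type is tedious and easy to get out of sync with the type itself, which is exactly the problem schema libraries solve.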

Is ReScript Schema faster than all of them?

Yes. It's ~100 times faster than Zod and on par with Typia or ArkType (see the benchmark). But besides validation, you often want to transform the data coming into your system, and here ReScript Schema outperforms every other solution in the JavaScript ecosystem.

What's ReScript? Isn't the library for JavaScript/TypeScript?

ReScript is a robustly typed language that compiles to efficient and human-readable JavaScript. And yes, ReScript Schema is written in ReScript, but it also has a really nice JavaScript API with TS types. You don't need to install or run any compiler; npm i rescript-schema is all you need.
This means ReScript Schema supports three languages: JavaScript, TypeScript, and ReScript. This is especially nice when you mix TypeScript and ReScript in a single codebase 👌

Are there trade-offs?

Yes. To maximize DX, performance, and bundle size while keeping the library fully runtime (no build step or code generation), I've decided to use eval under the hood. You don't need to worry about unsafe code execution, but the library won't work in some environments that forbid eval, such as Cloudflare Workers. In 99% of cases, this isn't a concern. I just think it's my duty as the creator to tell you about the remaining 1% up front.

What's the plan?

I'm going to give an overview of the basic ReScript Schema API and mental model. Then we'll discuss what makes it stand out from millions of similar libraries (and it's not only performance). I'll also look at some more advanced use cases and discuss the ecosystem, performance, and how it compares with other libraries.

I hope you'll enjoy it. 😊

Follow me on X to learn more about programming stuff I'm cooking.

Parsing / Validating

Let's start with the most basic use case of ReScript Schema. By the way, if you don't know the difference between parsing (sometimes called decoding) and validating, here's a good article from Zod's docs. If you're curious about when and why you need to parse data in your application, let me know in the comments. I could write a whole article about it, but for now, I'll assume you're already familiar with the concept.

Let's finally take a look at the code. I'll go with the TypeScript example first, so it's more familiar for most readers. Everything starts with defining a schema of the data you expect:

import * as S from "rescript-schema";

const filmSchema = S.schema({
  id: S.number,
  title: S.string,
  tags: S.array(S.string),
  rating: S.union(["G", "PG", "PG13", "R"])
})

The schema here is like a type definition that exists at runtime. If you hover over filmSchema, you'll see the following type:

S.Schema<{
 id: number;
 title: string;
 tags: string[];
 rating: "G" | "PG" | "PG13" | "R";
}>

This is the Schema type inferred from the film object definition. I recommend extracting its output into a standalone type. This way, the schema is the source of truth, and the Film type always matches it:

type Film = S.Output<typeof filmSchema>
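The idea behind a helper like S.Output can be shown with a toy model: the schema value carries its output type as a type parameter, and a conditional type with `infer` recovers it. This is a simplified, hypothetical sketch; `Schema`, `OutputOf`, and `arrayOf` are illustrative names, not rescript-schema's real definitions.

```typescript
// A toy schema: a value that knows how to parse unknown input into Output.
interface Schema<Output> {
  parse(input: unknown): Output;
}

// Recover the Output type parameter from any Schema value.
type OutputOf<S> = S extends Schema<infer O> ? O : never;

const numberSchema: Schema<number> = {
  parse(input) {
    if (typeof input !== "number") throw new TypeError("Expected number");
    return input;
  },
};

// Combinators build bigger schemas, and their types compose automatically.
function arrayOf<T>(item: Schema<T>): Schema<T[]> {
  return {
    parse(input) {
      if (!Array.isArray(input)) throw new TypeError("Expected array");
      return input.map((value) => item.parse(value));
    },
  };
}

const scoresSchema = arrayOf(numberSchema);
// Scores is number[], derived from the schema rather than written by hand.
type Scores = OutputOf<typeof scoresSchema>;
const scores: Scores = scoresSchema.parse([1, 2, 3]);
```

Because the type is derived from the value, changing the schema changes the type, and the two can never drift apart.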

After we've defined our Film type using the schema, we can parse unknown data entering our application to guarantee that it matches what we expect:

S.parseOrThrow(
  {
    id: 1,
    title: "My first film",
    tags: ["Loved"],
    rating: "S",
  },
  filmSchema,
);
//? Throws RescriptSchemaError with message `Failed parsing at ["rating"]. Reason: Expected "G" | "PG" | "PG13" | "R", received "S"`

S.parseOrThrow(validDataWithUnknownType, filmSchema)
//? Returns value of the Film type


// If you don't want to throw, you can wrap the operations in S.safe and get S.Result as a return value
S.safe(() => S.parseOrThrow(data, filmSchema))

Done! We have valid data here 🙌
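For readers wondering what a wrapper like S.safe does conceptually, here is a minimal sketch: run a throwing operation and turn the outcome into a result object. The `safe` function and the exact `Result` shape below are assumptions for illustration, not a copy of the library's types.

```typescript
// Hypothetical result shape: a discriminated union on `success`.
type Result<T> =
  | { success: true; value: T }
  | { success: false; error: unknown };

// Run an operation that may throw and capture the outcome instead.
function safe<T>(operation: () => T): Result<T> {
  try {
    return { success: true, value: operation() };
  } catch (error) {
    return { success: false, error };
  }
}

// Example throwing parser used with the helper.
function parsePositive(input: unknown): number {
  if (typeof input !== "number" || !(input > 0)) {
    throw new RangeError("Expected a positive number");
  }
  return input;
}

const ok = safe(() => parsePositive(42));
const failed = safe(() => parsePositive(-1));
```

Checking `result.success` narrows the union, so TypeScript only lets you read `value` on success and `error` on failure.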

Some experienced users may have noticed that the API is similar to Valibot's, but with a unique flavor.

You can use S.schema for objects, tuples, and literals. For any kind of union, there's S.union; even for a discriminated union, the parser will run in the most optimized way. So far, I've only seen this kind of DX in ArkType.
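One common way to make a discriminated union fast is to read the tag field once and jump straight to the matching branch, instead of trying every member parser in turn and catching failures. The sketch below hand-writes that strategy for a hypothetical `Shape` union; it illustrates the general technique and is not the code ReScript Schema generates.

```typescript
// Hypothetical discriminated union: `kind` identifies the member.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; size: number };

// Read the discriminant once and dispatch, rather than trying each member.
function parseShape(input: unknown): Shape {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("Expected an object");
  }
  const obj = input as Record<string, unknown>;
  switch (obj.kind) {
    case "circle": {
      const radius = obj.radius;
      if (typeof radius !== "number") throw new TypeError('Expected number at ["radius"]');
      return { kind: "circle", radius };
    }
    case "square": {
      const size = obj.size;
      if (typeof size !== "number") throw new TypeError('Expected number at ["size"]');
      return { kind: "square", size };
    }
    default:
      throw new TypeError('Expected "circle" | "square" at ["kind"]');
  }
}
```

The cost stays roughly constant as the union grows, and error messages can point at the exact field that failed.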

Also, there are no annoying parentheses (schemas like S.string are plain values, not function calls), the parse function's name explicitly says it can throw, and thanks to the modular design, the library tree-shakes very well.

Package size

Since I mentioned tree-shaking, I'd like to say a few words about package size. Bundle size is an essential metric for a web application, and here's how ReScript Schema compares with other libraries:

	rescript-schema@9.2.2	Zod@3.24.1	Valibot@1.0.0-beta.14	ArkType@2.0.4
Total size (minified + gzipped)	12.7 kB	15.2 kB	12.3 kB	40.8 kB
Example size (minified + gzipped)	5.14 kB	14.5 kB	1.39 kB	40.7 kB

It's not as impressive as Valibot, but ReScript Schema is definitely doing well here. Among libraries with similar performance, all except ArkType use a code generation approach. This means your bundle starts small, but every new type adds more generated code, so the application size grows quickly.

Parsing using ReScript

Even though I want to make ReScript Schema popular among TS developers, ReScript developers are still the library's main user base, so I'll also include ReScript examples.

Compared to TypeScript, the type system in ReScript is much simpler; you literally can't do any type gymnastics in it. Combined with nominal typing, this makes it impossible to extract the film type from the schema (even though the schema can infer it). But ReScript has a built-in way to avoid boilerplate: you can use the ReScript Schema PPX to generate schemas for your types automatically. Just annotate them with the @schema attribute.

@schema
type rating =
 | @as("G") GeneralAudiences
 | @as("PG") ParentalGuidanceSuggested
 | @as("PG13") ParentalStronglyCautioned
 | @as("R") Restricted
@schema
type film = {
 id: float,
 title: string,
 tags: array<string>,
 rating: rating,
}

Does the rating type look scary? Don't worry, this is a ReScript Variant, which is a really nice way to describe any kind of union. You can also use @as to give the ratings better names while preserving the original short values at runtime.

Although PPX is nice, you can always code without it:

type rating =
 | @as("G") GeneralAudiences
 | @as("PG") ParentalGuidanceSuggested
 | @as("PG13") ParentalStronglyCautioned
 | @as("R") Restricted
type film = {
 id: float,
 title: string,
 tags: array<string>,
 rating: rating,
}

let filmSchema = S.schema(s => {
  id: s.matches(S.number),
  title: s.matches(S.string),
  tags: s.matches(S.array(S.string)),
  rating: s.matches(S.union([
    GeneralAudiences,
    ParentalGuidanceSuggested,
    ParentalStronglyCautioned,
    Restricted
  ]))
})

The TS API admittedly wins here, since we don't need to call s.matches to make the type system happy, but when it comes to parsing, ReScript takes the lead back with the Pipe Operator and Pattern Matching on exceptions:

{
  "id": 1,
  "title": "My first film",
  "tags": ["Loved"],
  "rating": "S",
}->S.parseOrThrow(filmSchema)
//? Throws RescriptSchemaError with message `Failed parsing at ["rating"]. Reason: Expected "G" | "PG" | "PG13" | "R", received "S"`

validDataWithUnknownType->S.parseOrThrow(filmSchema)
//? Returns value of the film type

// If you don't want to throw, you can match on the S.Raised exception and return the result type. There's no S.safe API like in TypeScript, since you can do better with the language itself!
switch data->S.parseOrThrow(filmSchema) {
| film => Ok(film)
| exception S.Raised(error) => Error(error)
}

Unique Features

After we covered the most basic use case, let's move on to the things that make ReScript Schema special 🔥

Changing shape and field names

Imagine working with a weird REST API with poorly named PascalCase fields, where data is randomly nested in objects or tuples. We can't change the backend, but we at least want to transform the data into a more convenient format for our application. In ReScript Schema, you can do this declaratively, which results in the most performant operation possible:

const filmSchema = S.object((s) => ({
  id: s.field("Id", S.number),
  title: s.nested("Meta").field("Title", S.string),
  tags: s.field("Tags_v2", S.array(S.string)),
  rating: s.field("Rating", S.schema([S.union(["G", "PG", "PG13", "R"])]))[0],
}));

S.parseOrThrow(
  {
    Id: 1,
    Meta: {
      Title: "My first film",
    },
    Tags_v2: ["Loved"],
    Rating: ["G"],
  },
  filmSchema
);
//? { id: 1, title: "My first film", tags: ["Loved"], rating: "G" }

Looks scary? Let's dive in. First of all, every schema has an Input and an Output type. Quite often, they are the same, and during parsing, the library only validates that the Input has the correct type and returns it immediately. However, there are ways to change the expected Output type, as we do in the example above. For comparison, let's look at how you'd usually achieve the same with other schema libraries:

const filmSchema = S.transform(
  S.schema({
    Id: S.number,
    Meta: {
      Title: S.string,
    },
    Tags_v2: S.array(S.string),
    Rating: S.schema([S.union(["G", "PG", "PG13", "R"])]),
  }),
  (input) => ({
    id: input.Id,
    title: input.Meta.Title,
    tags: input.Tags_v2,
    rating: input.Rating[0],
  })
);

This is still ReScript Schema, but we use S.transform to manually transform the Input type. You can find this kind of API in many other schema libraries. What's good about this example is that you can clearly see how we use the schema to declaratively describe what the data coming into our system looks like, and then transform it into something convenient to work with. In a way, the schema here is like a contract between the client and the server that returns the object in the response.

In the advanced S.object example, which I showed first, we combine a declarative description of the Input type with a transformation to the Output type. And this enables one more thing besides shorter code and a performance boost.

Reverse Parsing (aka serializing/encoding)

Encoding is present in many libraries in other languages, but it's not very common in the JS ecosystem. This is a big loss, because the ability to perform operations in the reverse direction is, in my opinion, the most powerful feature.

To be more specific: most other popular JavaScript schema libraries only let you parse Input into Output. In ReScript Schema, you can also parse Output back into Input using the same schema, or skip validation and perform only the conversion logic, since Output data usually doesn't need to be validated.
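Why does a declarative description make the reverse direction possible? If each output field records where it lives in the Input shape, the same description can be walked both ways. Below is a minimal sketch under that assumption; the `FieldSpec` format and the `parseWithFields` / `reverseWithFields` names are hypothetical and not rescript-schema's API, and the tuple-wrapped `Rating` field is left out for brevity.

```typescript
type Check = (value: unknown) => boolean;

// Hypothetical description: where a field lives in Input, and how to check it.
interface FieldSpec {
  path: readonly string[];
  check: Check;
}

const isNumber: Check = (value) => typeof value === "number" && !Number.isNaN(value);
const isString: Check = (value) => typeof value === "string";
const isStringArray: Check = (value) => Array.isArray(value) && value.every(isString);

const filmFields: Record<string, FieldSpec> = {
  id: { path: ["Id"], check: isNumber },
  title: { path: ["Meta", "Title"], check: isString },
  tags: { path: ["Tags_v2"], check: isStringArray },
};

// Input -> Output: follow each path, validate, and collect into a flat object.
function parseWithFields(fields: Record<string, FieldSpec>, input: unknown): Record<string, unknown> {
  const output: Record<string, unknown> = {};
  for (const [key, { path, check }] of Object.entries(fields)) {
    let current: unknown = input;
    for (const segment of path) {
      if (typeof current !== "object" || current === null) {
        throw new TypeError(`Expected an object before ${JSON.stringify(segment)}`);
      }
      current = (current as Record<string, unknown>)[segment];
    }
    if (!check(current)) throw new TypeError(`Invalid value at ${JSON.stringify(path)}`);
    output[key] = current;
  }
  return output;
}

// Output -> Input: walk the same paths, creating nested objects, no validation.
function reverseWithFields(fields: Record<string, FieldSpec>, output: Record<string, unknown>): Record<string, unknown> {
  const input: Record<string, unknown> = {};
  for (const [key, { path }] of Object.entries(fields)) {
    let target = input;
    for (const segment of path.slice(0, -1)) {
      const next = target[segment];
      if (typeof next !== "object" || next === null) target[segment] = {};
      target = target[segment] as Record<string, unknown>;
    }
    target[path[path.length - 1]] = output[key];
  }
  return input;
}
```

With a transform function like the S.transform example, this is not possible: an arbitrary callback can't be run backwards, while a description of paths can.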

Do you remember our filmSchema using S.object to rename fields? Let's say we want to send a POST request with the film entity, and the server also expects the weirdly cased data structure it initially sent to us. Here is how we deal with it:

// The same schema from above
const filmSchema = S.object((s) => ({
  id: s.field("Id", S.number),
  title: s.nested("Meta").field("Title", S.string),
  tags: s.field("Tags_v2", S.array(S.string)),
  rating: s.field("Rating", S.schema([S.union(["G", "PG", "PG13", "R"])]))[0],
}));

S.reverseConvertOrThrow({ id: 1, title: "My first film", tags: ["Loved"], rating: "G" }, filmSchema)
//? { Id: 1, Meta: { Title: "My first film" }, Tags_v2: ["Loved"], Rating: ["G"] }

Sweet, isn't it? And even though I want to talk more about performance a bit later, I can't resist sharing the code it evaluates under the hood:

(i) => {
  let v0 = i["tags"];
  return {
    Id: i["id"],
    Meta: { Title: i["title"] },
    Tags_v2: v0,
    Rating: [i["rating"]],
  };
};

I think most people would write slower code by hand 😅

Reverse

S.reverseConvertOrThrow is one of the reverse operations I use daily at work, but it's actually just a shorthand for S.reverse combined with S.convertOrThrow, which you can also use separately.

S.reverse is what allows you to take your Schema<Input, Output> and turn it into a Schema<Output, Input>.

It may sound dull, but compared to the commonly used parser/serializer or encoder/decoder approach, here you get an actual schema that you can use the same way as the original one, without any limitations.
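The key property is that reversing returns the same kind of object with its type parameters swapped, so anything that accepts the original also accepts the reversed one. Here is a toy sketch of that idea; `TwoWay` and `reverse` are hypothetical names, and a real schema also carries validation and runtime type information that this model omits.

```typescript
// Toy two-way value standing in for a schema with Input and Output types.
interface TwoWay<Input, Output> {
  convert(input: Input): Output;
  reverseConvert(output: Output): Input;
}

// Swap the directions and the type parameters; the result is again a TwoWay.
function reverse<I, O>(inner: TwoWay<I, O>): TwoWay<O, I> {
  return {
    convert: (value) => inner.reverseConvert(value),
    reverseConvert: (value) => inner.convert(value),
  };
}

// Example: "HH:MM" strings <-> minutes since midnight.
const timeCodec: TwoWay<string, number> = {
  convert: (text) => {
    const [hours, minutes] = text.split(":").map(Number);
    return hours * 60 + minutes;
  },
  reverseConvert: (total) =>
    `${String(Math.floor(total / 60)).padStart(2, "0")}:${String(total % 60).padStart(2, "0")}`,
};

const minutesToTime = reverse(timeCodec); // TwoWay<number, string>
```

Because `reverse` returns a `TwoWay` and not a bare function, it can be reversed again, composed, or passed anywhere the original was accepted.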

If you want, you can parse the output with or without data validation, generate JSON Schema, perform optimized comparison and hashing, or use the runtime data representation for any custom logic.

Because it knows both the Input and Output types at runtime, ReScript Schema also has a very powerful coercion API.

const schema = S.coerce(S.string, S.bigint)
S.parseOrThrow("123", schema) //? 123n
S.reverseConvertOrThrow(123n, schema) //? "123"

Pass S.coerce the schemas you want to coerce from and to, and ReScript Schema will figure out the rest.
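To see what such a coercion has to do in each direction, here is the string-to-bigint case written by hand. It's a sketch of the behavior shown above, not the library's implementation; `parseBigintFromString` and `bigintToString` are hypothetical names.

```typescript
// Only plain integer strings are accepted; BigInt() on its own would also
// accept inputs like "" (0n) or " 12 ", and throws SyntaxError on "1.5".
const INTEGER_PATTERN = /^-?\d+$/;

// Input (string) -> Output (bigint), with validation.
function parseBigintFromString(input: unknown): bigint {
  if (typeof input !== "string") throw new TypeError("Expected string");
  if (!INTEGER_PATTERN.test(input)) {
    throw new TypeError(`Expected an integer string, received ${JSON.stringify(input)}`);
  }
  return BigInt(input);
}

// Output (bigint) -> Input (string): the type system already guarantees a
// bigint here, so no runtime validation is needed.
function bigintToString(output: bigint): string {
  return output.toString();
}
```

The asymmetry is the point: parsing needs checks, while the reverse direction is a pure conversion, which is why knowing both types at runtime pays off.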

This hasn't been implemented yet, but the same API should also make it possible to build a 2x faster JSON.stringify(), like fast-json-stringify does, and maybe even faster 😎

100 Operations

If you want the best possible performance or the built-in operations don't cover your specific use case, you can use S.compile to create fine-tuned operation functions.

const operation = S.compile(S.string, "Any", "Assert", "Async");
//? (input: unknown) => Promise<void>

await operation("Hello world!");

In the example above, we've created an async assert operation, which is not available by default.

With this API, you can get 100 different operation combinations, each of which might make sense for a specific use case. This is like Valibot's parser function, but multiplied by 💯.
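The combinations come from independent choices, such as what the operation returns (parsed value or nothing, as in an assert) and whether it's sync or async. A minimal sketch of deriving two of those shapes from a single check follows; `toAssert` and `toAsyncAssert` are hypothetical helpers, not rescript-schema functions.

```typescript
type Check = (input: unknown) => boolean;

const isString: Check = (input) => typeof input === "string";

// Sync assert: returns nothing and throws on invalid input.
function toAssert(check: Check): (input: unknown) => void {
  return (input) => {
    if (!check(input)) throw new TypeError("Assertion failed");
  };
}

// Async assert: same check, but failures surface as a rejected Promise,
// matching the (input: unknown) => Promise<void> shape from the example above.
function toAsyncAssert(check: Check): (input: unknown) => Promise<void> {
  const assertSync = toAssert(check);
  return async (input) => {
    assertSync(input);
  };
}

const assertString = toAsyncAssert(isString);
```

Each extra dimension multiplies the number of possible operation shapes, which is how a handful of options adds up quickly.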

Performance Comparison

As I mentioned in the beginning, ReScript Schema is the fastest. Now I'll explain why 🔥

You can also use the big community benchmark to check for yourself. If you see Typia outperforming ReScript Schema, I have a take on that too 😁

The biggest advantage of ReScript Schema is its very clever library core, which uses eval to build the most optimized operations possible. I've already shown what the reverse conversion code looks like; here's the parse operation code for filmSchema:

(i) => {
  if (typeof i !== "object" || !i) {
    e[7](i);
  }
  let v0 = i["Id"],
    v1 = i["Meta"],
    v3 = i["Tags_v2"],
    v7 = i["Rating"];
  if (typeof v0 !== "number" || Number.isNaN(v0)) {
    e[0](v0);
  }
  if (typeof v1 !== "object" || !v1) {
    e[1](v1);
  }
  let v2 = v1["Title"];
  if (typeof v2 !== "string") {
    e[2](v2);
  }
  if (!Array.isArray(v3)) {
    e[3](v3);
  }
  for (let v4 = 0; v4 < v3.length; ++v4) {
    let v6 = v3[v4];
    try {
      if (typeof v6 !== "string") {
        e[4](v6);
      }
    } catch (v5) {
      if (v5 && v5.s === s) {
        v5.path = '["Tags_v2"]' + '["' + v4 + '"]' + v5.path;
      }
      throw v5;
    }
  }
  if (!Array.isArray(v7) || v7.length !== 1) {
    e[5](v7);
  }
  let v8 = v7["0"];
  if (v8 !== "G") {
    if (v8 !== "PG") {
      if (v8 !== "PG13") {
        if (v8 !== "R") {
          e[6](v8);
        }
      }
    }
  }
  return { id: v0, title: v2, tags: v3, rating: v8 };
};

Thanks to eval, we can eliminate function calls and inline all type validations using if statements. Also, knowing about the Output type at runtime allows us to perform transformations with zero wasteful object allocations, optimizing the operation for JavaScript engines.
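Here is a minimal sketch of the general technique: build the source of a validator as a string with one inlined `if` per field, then compile it once with `new Function` (a form of eval, which is also why runtimes that forbid code generation from strings can't use it). The field format and the `compileObjectValidator` name are hypothetical; the code ReScript Schema generates is far more complete.

```typescript
type FieldType = "string" | "number";
type Validator = (input: unknown) => Record<string, unknown>;

function compileObjectValidator(fields: Record<string, FieldType>): Validator {
  const lines: string[] = [
    'if (typeof i !== "object" || i === null) throw new TypeError("Expected an object");',
  ];
  const outputEntries: string[] = [];
  Object.keys(fields).forEach((key, index) => {
    const type = fields[key];
    const variable = `v${index}`;
    const quotedKey = JSON.stringify(key); // safe string literal for the key
    const quotedPath = JSON.stringify(JSON.stringify([key])); // e.g. "[\"id\"]"
    const nanCheck = type === "number" ? ` || Number.isNaN(${variable})` : "";
    lines.push(`let ${variable} = i[${quotedKey}];`);
    // One inlined check per field: no per-field function call at parse time.
    lines.push(
      `if (typeof ${variable} !== "${type}"${nanCheck}) throw new TypeError("Expected ${type} at " + ${quotedPath});`,
    );
    outputEntries.push(`${quotedKey}: ${variable}`);
  });
  // Build the output object in one literal, so no intermediate allocations.
  lines.push(`return { ${outputEntries.join(", ")} };`);
  return new Function("i", lines.join("\n")) as Validator;
}
```

The compilation cost is paid once per schema; every later call runs straight-line code that JavaScript engines optimize well.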

You probably think that calling eval itself is slow; I thought so too. However, it turned out to be faster than I expected. For example, creating a simple nested object schema and calling the parser once turned out to be 1.8 times faster with ReScript Schema (using eval) than with Zod. I put a lot of effort into making it as fast as possible, and I have to thank the ReScript language and the people behind it for allowing me to write very performant and safe code.

As for ArkType, it uses the same eval approach and has similar potential to ReScript Schema, but its evaluated code is not there yet. Currently, its operations are a little slower, and schema creation is significantly slower. But I can see it catching up to some extent in the future.

What other libraries will never be able to catch up on is the ability to reshape schemas declaratively, and this is why I say that ReScript Schema is faster than Typia. Also, Typia doesn't always generate the most optimized code, e.g., for optional fields, and it doesn't come with many built-in operations optimized for specific use cases. Still, it's an excellent library with Fast JSON Serialization and Protocol Buffer Encoding features, which I have yet to implement.

Ecosystem

When choosing a schema library for a project where performance is not a concern, the ecosystem is the most important factor to consider. Knowing the type representation at runtime lets you do millions of things with a schema, such as JSON Schema generation, describing database schemas, optimized comparison and hashing, encoding to Protocol Buffers, building forms, mocking data, communicating with AI, and much more.
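As a small illustration of what a runtime type representation enables, here is a sketch that turns a schema described as plain data into JSON Schema. The `SchemaNode` format is a hypothetical stand-in, not ReScript Schema's internal representation; the output uses standard JSON Schema keywords (`type`, `const`, `items`, `anyOf`, `properties`, `required`).

```typescript
// Hypothetical runtime representation of a schema as plain data.
type SchemaNode =
  | { type: "string" }
  | { type: "number" }
  | { type: "literal"; value: string }
  | { type: "array"; item: SchemaNode }
  | { type: "union"; anyOf: SchemaNode[] }
  | { type: "object"; fields: Record<string, SchemaNode> };

// Walk the representation and emit the matching JSON Schema keywords.
function toJsonSchema(node: SchemaNode): Record<string, unknown> {
  switch (node.type) {
    case "string":
    case "number":
      return { type: node.type };
    case "literal":
      return { const: node.value };
    case "array":
      return { type: "array", items: toJsonSchema(node.item) };
    case "union":
      return { anyOf: node.anyOf.map((member) => toJsonSchema(member)) };
    case "object":
      return {
        type: "object",
        properties: Object.fromEntries(
          Object.entries(node.fields).map(([key, value]) => [key, toJsonSchema(value)]),
        ),
        // Every field in this toy format is required.
        required: Object.keys(node.fields),
      };
  }
}
```

Form builders, mock generators, and database tooling follow the same pattern: walk the runtime description and emit something else.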

Zod is definitely the winner in terms of ecosystem. I counted 78 libraries integrating with Zod at the time of writing. There are even some where you provide a Zod schema, and they render a Vue page with a form prompting for the data. This is just too convenient not to use for prototyping.

But if you don't need something super specific, ReScript Schema has a decent ecosystem of its own, comparable to Valibot's and ArkType's. It actually has even higher potential thanks to the ability to adjust the shape and automatically reverse the schema. A good example of this is ReScript Rest, which offers the DX of tRPC while staying unopinionated like ts-rest. I've also built many powerful tools around ReScript Schema, but I have to admit that I haven't added TS support for them yet. Let me know if you find something interesting to use, and I'll do it asap 😁

Also, ReScript Schema supports Standard Schema, a common interface for TypeScript validation libraries. It was recently designed by the creators of Zod, Valibot, and ArkType and has already been integrated into many popular libraries. This means you can use ReScript Schema with tRPC, TanStack Form, TanStack Router, Hono, and 19+ more libraries at the time of writing.
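To show why a shared interface lets any library plug into tools like tRPC or TanStack Form, here is a simplified local copy of the Standard Schema v1 shape, written from my understanding of the spec (check the official spec before relying on details), plus a hand-written schema and a library-agnostic consumer. `stringSchema`, the `"example"` vendor name, and `validateWith` are hypothetical.

```typescript
// Simplified local copy of the Standard Schema v1 interface.
interface StandardIssue {
  readonly message: string;
  readonly path?: ReadonlyArray<PropertyKey>;
}

type StandardResult<Output> =
  | { readonly value: Output; readonly issues?: undefined }
  | { readonly issues: ReadonlyArray<StandardIssue> };

interface StandardSchemaV1<Input = unknown, Output = Input> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => StandardResult<Output> | Promise<StandardResult<Output>>;
    readonly types?: { readonly input: Input; readonly output: Output };
  };
}

// A hand-written schema exposing the interface (vendor name is hypothetical).
const stringSchema: StandardSchemaV1<string> = {
  "~standard": {
    version: 1,
    vendor: "example",
    validate: (value) =>
      typeof value === "string"
        ? { value }
        : { issues: [{ message: `Expected string, received ${typeof value}` }] },
  },
};

// A consumer that works with any Standard Schema, whatever library made it.
async function validateWith<Output>(schema: StandardSchemaV1<unknown, Output>, value: unknown): Promise<Output> {
  const result = await schema["~standard"].validate(value);
  if (result.issues) {
    throw new Error(result.issues.map((issue) => issue.message).join("; "));
  }
  return result.value;
}
```

The consumer never imports a specific schema library; it only calls `~standard.validate`, which is what makes the integrations listed above possible.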

Conclusion

As the title says, I wholeheartedly believe that ReScript Schema is the future of schema libraries. It offers great DX, performance, and bundle size, along with many innovative features. I've tried to cover all of them at a high level, and I hope I've managed to get you at least a little bit interested 👌

I'm not trying to persuade you to choose ReScript Schema for your next project, and I actually still recommend Zod when somebody asks me. But I'd definitely appreciate a star and a follow on X 🙏

Let's see how the future of schema libraries turns out. Maybe I'll rename ReScript Schema to something dope and become more popular than Zod? Cheers 😁
