This article is a repost of an ADR from Matanuska BASIC, my attempt to write a BASIC interpreter in TypeScript.
Context
In the language of Writing Interactive Compilers and Interpreters (WIC&I), the encoding language is the language in which the interpreter is written. This is in contrast to the source language, which is the language being implemented by the interpreter.
Practically speaking, encoding languages fall into two broad categories:
- A high level interpreted or managed-runtime language - for example, Python, Java, or TypeScript
- A lower level compiled language - for example, C, C++, or Rust
Trade-Offs
The advantages of a compiled language largely come down to performance. It can be difficult to get the same speed out of an interpreted language as you can from C. On the other hand, the higher level languages may be easier to work with.
As for my own skill set: my strongest language is TypeScript, followed by Python. I'm very weak in C, have little experience with C++, and am still learning Rust.
Writing an interpreter is a challenging problem - or collection of problems. Therefore, it pays to work in a language the author is comfortable with, so they may focus on the core problems.
On Rust
In fact, my first attempts at implementing a BASIC were in Rust. Rust is a modern, high level language compared to C. But I found myself struggling in a number of areas:
- Polymorphism in Rust can be challenging. While it supports generic traits, there are many limitations. For example, until recently Rust had poor support for async methods in traits. It also has some idiosyncratic requirements around known sizes of properties, and requires many hoops to have multiple dynamic types within a single collection.
- The parser library I was using, nom, had some challenges, particularly around writing parsers over types other than str and &[u8]. They aren't intractable to a Rust expert, but definitely require some skill.
- Rust is particularly pedantic around text encodings. This even shows up in how it treats paths.
- Errors are non-generic, and must be converted or wrapped into reified types. The thiserror and anyhow crates can help get started, but I found it difficult to prototype errors when I didn't know what I was looking for.
- Many of the techniques used by a bytecode compiler require "unsafe" features of Rust, an intermediate/advanced topic.
While Rust is a compelling choice for implementing an interpreter in general, I found it challenging to learn how to implement an interpreter and level up in Rust at the same time.
Extensions & Plugins
Another consideration is extensions. A scripting language is likely the most straightforward mechanism for extension. The alternative is generating dynlibs. This strategy is possible, but not as seamless. Moreover, it would require expertise in lower level mechanics on the part of extension developers.
Decision
The first version of Matanuska BASIC will be implemented in TypeScript and Node.js. This will allow me to focus on learning how an interpreter works. It will also ensure I have the expertise in the encoding language to effectively prototype concepts.
Future
In the future, as Matanuska BASIC matures, it is likely that I will consider rewriting it, in part or in whole, in a compiled language. The decision is out of scope of this ADR. However, I would still like to outline some of the considerations.
Chances are high that I will want to do this incrementally by leveraging Node's native addons. The standard addon language is C++. However, there are toolkits for Rust as well.
Between the two, Rust has some advantages:
- Many baked-in high level abstractions and data structures, including automatically resizing Vecs and Maps
- Great build and test tooling
- A fantastic library ecosystem
- Memory safety, unsafe features notwithstanding
- Unsafe features where necessary
On the other hand, C++ has its pros as well:
- It's the official extension mechanism, and therefore has the best support
- Supports builds with cmake.js, which I have some experience with and feel positively about
- Supports unsafe operations more easily or naturally than Rust
- Due to its similarity to C, it would be easier to reimplement abstractions from Crafting Interpreters
Testing is worth elaborating on. Rust has a built-in test framework. C++ does not have a de facto standard test framework, though it does have some options.
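As a rough illustration of what the built-in framework looks like, the sketch below places unit tests next to the code they exercise. The parse_line_number helper is a hypothetical example, assuming a BASIC line that begins with a line number; it is not part of Matanuska.

```rust
// Hypothetical helper: reads the leading line number from a BASIC line
// such as `10 PRINT "HI"`. Returns None when the first token is not a
// non-negative integer.
fn parse_line_number(line: &str) -> Option<u32> {
    line.split_whitespace().next()?.parse().ok()
}

// Compiled only when running tests, so it adds nothing to release builds.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn reads_leading_line_number() {
        assert_eq!(parse_line_number("10 PRINT \"HI\""), Some(10));
    }

    #[test]
    fn rejects_lines_without_a_number() {
        assert_eq!(parse_line_number("PRINT \"HI\""), None);
    }
}
```

Running cargo test discovers and runs every function marked #[test]. Because the tests module lives in the same file, it can also reach private functions, which tests written in JavaScript against an exported addon cannot.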
The most common option for Node.js addon tests is exporting the addons to Node, and writing the tests in JavaScript. My Node.js test framework is nice to work with, and using it would offer some consistency. However, this would limit me to writing tests for only the API exposed to Node.
A final note: it's possible to mix and match. I could, for example, build a Rust library and then link it into a C++ library. The build would be more complex, but it may allow me to leverage the respective advantages of each language.
Top comments (1)
An alternative to nom I learned about recently is chumsky:
docs.rs/chumsky/latest/chumsky/#
It seems to have better support for tokens and ships with nice looking errors.