Aarav Joshi

Posted on Nov 15

Build Your Own JavaScript-Compatible Language: Mastering Compiler Design

#programming #devto #javascript #softwareengineering

Creating your own programming language that compiles to JavaScript is a fascinating journey. It's a project that'll push your skills to the limit and give you a deeper understanding of how languages work under the hood.

Let's start with the basics. A compiler that translates a custom language to JavaScript typically has three main stages: lexical analysis, parsing, and code generation.

Lexical analysis is the first step. Here, we break down our source code into tokens. These are the smallest units of meaning in our language. For example, in the statement "let x = 5;", we'd have tokens for "let", "x", "=", "5", and ";".

Here's a simple lexer in JavaScript:

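function lexer(input) {
    let tokens = [];
    let current = 0;

    while (current < input.length) {
        let char = input[current];

        // Single-character tokens: assignment and statement terminator
        if (char === '=' || char === ';') {
            tokens.push({ type: 'operator', value: char });
            current++;
            continue;
        }

        // Whitespace separates tokens but produces none
        if (/\s/.test(char)) {
            current++;
            continue;
        }

        // Identifiers and keywords: a run of letters.
        // The bounds check matters: past the end of the input, char is
        // undefined, and /[a-z]/i.test(undefined) matches the string "undefined".
        if (/[a-z]/i.test(char)) {
            let value = '';
            while (current < input.length && /[a-z]/i.test(char)) {
                value += char;
                char = input[++current];
            }
            tokens.push({ type: 'identifier', value });
            continue;
        }

        // Integer literals: a run of digits, kept as a string
        if (/\d/.test(char)) {
            let value = '';
            while (current < input.length && /\d/.test(char)) {
                value += char;
                char = input[++current];
            }
            tokens.push({ type: 'number', value });
            continue;
        }

        throw new Error('Unknown character: ' + char);
    }

    return tokens;
}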

This lexer can handle simple assignments like "let x = 5;". It's basic, but it gives you an idea of how lexical analysis works.
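The same token stream can also be produced from a table of regular expressions instead of hand-written character loops. The following is a minimal sketch of that alternative, not a replacement for the lexer above; the names `TOKEN_PATTERNS` and `tokenize` are made up for this example.

```javascript
// Hypothetical table-driven tokenizer: one sticky regex per token kind.
// The sticky flag (y) makes each regex match only at lastIndex.
const TOKEN_PATTERNS = [
    ['whitespace', /\s+/y],
    ['identifier', /[a-z]+/iy],
    ['number', /\d+/y],
    ['operator', /[=;]/y],
];

function tokenize(input) {
    const tokens = [];
    let position = 0;

    while (position < input.length) {
        let matched = false;

        for (const [type, pattern] of TOKEN_PATTERNS) {
            pattern.lastIndex = position;
            const match = pattern.exec(input);
            if (match === null) continue;

            // Whitespace is consumed but produces no token
            if (type !== 'whitespace') {
                tokens.push({ type, value: match[0] });
            }
            position += match[0].length;
            matched = true;
            break;
        }

        if (!matched) {
            throw new Error('Unknown character: ' + input[position]);
        }
    }

    return tokens;
}

console.log(tokenize('let x = 5;'));
// [
//   { type: 'identifier', value: 'let' },
//   { type: 'identifier', value: 'x' },
//   { type: 'operator', value: '=' },
//   { type: 'number', value: '5' },
//   { type: 'operator', value: ';' }
// ]
```

Adding a new token kind then means adding one row to the table, which tends to scale better than another if-block as the language grows.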

Next comes parsing. This is where we take our stream of tokens and build an Abstract Syntax Tree (AST). The AST represents the structure of our program.
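To make the idea concrete, here is roughly what the tree for "let x = 5;" looks like, written out by hand using the node types this article's parser produces. The `describe` helper is only an illustration added for this sketch; it prints one node per line, indented by depth.

```javascript
// Hand-written AST for the source text: let x = 5;
const exampleAst = {
    type: 'Program',
    body: [
        {
            type: 'VariableDeclaration',
            name: 'x',
            value: { type: 'NumberLiteral', value: '5' }
        }
    ]
};

// Hypothetical helper: render the tree as indented text
function describe(node, depth = 0) {
    const indent = '  '.repeat(depth);
    switch (node.type) {
        case 'Program':
            return [indent + 'Program']
                .concat(node.body.map(child => describe(child, depth + 1)))
                .join('\n');
        case 'VariableDeclaration':
            return indent + 'VariableDeclaration ' + node.name + '\n' +
                describe(node.value, depth + 1);
        case 'NumberLiteral':
            return indent + 'NumberLiteral ' + node.value;
        default:
            throw new TypeError(node.type);
    }
}

console.log(describe(exampleAst));
// Program
//   VariableDeclaration x
//     NumberLiteral 5
```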

Here's a simple parser for our language:

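function parser(tokens) {
    let current = 0;

    function walk() {
        let token = tokens[current];

        if (!token) {
            throw new SyntaxError('Unexpected end of input');
        }

        // 'let' arrives as an identifier token; treat it as a keyword here
        if (token.type === 'identifier' && token.value === 'let') {
            let node = {
                type: 'VariableDeclaration',
                name: tokens[++current].value,
                value: null
            };

            current += 2; // Move past the name and the '='
            node.value = walk();

            // Consume the terminating ';' so the main loop never sees it;
            // otherwise walk() would throw on the leftover operator token
            if (tokens[current] && tokens[current].value === ';') {
                current++;
            }

            return node;
        }

        if (token.type === 'number') {
            current++;
            return { type: 'NumberLiteral', value: token.value };
        }

        throw new TypeError(token.type);
    }

    let ast = {
        type: 'Program',
        body: []
    };

    // Parse statements until every token has been consumed
    while (current < tokens.length) {
        ast.body.push(walk());
    }

    return ast;
}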

This parser can handle simple variable declarations. It's not very robust, but it illustrates the concept.

The final step is code generation. This is where we take our AST and turn it into JavaScript code. Here's a simple code generator:

function codeGenerator(node) {
    switch (node.type) {
        case 'Program':
            return node.body.map(codeGenerator).join('\n');

        case 'VariableDeclaration':
            return 'let ' + node.name + ' = ' + codeGenerator(node.value) + ';';

        case 'NumberLiteral':
            return node.value;

        default:
            throw new TypeError(node.type);
    }
}

Now we can put it all together:

function compile(input) {
    let tokens = lexer(input);
    let ast = parser(tokens);
    let output = codeGenerator(ast);
    return output;
}

console.log(compile('let x = 5;'));
// Outputs: let x = 5;

This is just scratching the surface. A real language compiler would need to handle much more: functions, control structures, operators, and so on. But this gives you a taste of what's involved.

As we expand our language, we'll need to add more token types to our lexer, more node types to our parser, and more cases to our code generator. We might also want to add an intermediate representation (IR) stage between parsing and code generation, which can make it easier to perform optimizations.

Let's add support for simple arithmetic expressions:

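// Add to lexer
if (char === '+' || char === '-' || char === '*' || char === '/') {
    tokens.push({ type: 'operator', value: char });
    current++;
    continue;
}

// Add to parser, after the 'let' check (this replaces the number-only branch)
if (token.type === 'number' || token.type === 'identifier') {
    // Numbers keep the NumberLiteral node type the code generator already knows
    let node = token.type === 'number'
        ? { type: 'NumberLiteral', value: token.value }
        : { type: 'identifier', value: token.value };
    current++;

    // Only arithmetic operators continue the expression; '=' and ';' do not
    let next = tokens[current];
    if (next && next.type === 'operator' && '+-*/'.includes(next.value)) {
        current++; // Consume the operator before parsing the right-hand side
        node = {
            type: 'BinaryExpression',
            operator: next.value,
            left: node,
            right: walk()
        };
    }

    return node;
}

// Add to code generator
case 'BinaryExpression':
    return codeGenerator(node.left) + ' ' + node.operator + ' ' + codeGenerator(node.right);

case 'identifier':
    return node.value;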

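Now our compiler can handle expressions like "let x = 5 + 3;". Note that because the right-hand side is parsed by a recursive call to walk(), every expression nests to the right: "2 * 3 + 4" becomes 2 * (3 + 4) in the AST. The generated text is still correct because the code generator emits no parentheses, but any pass that evaluates the tree would get the wrong answer.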

As we continue to build out our language, we'll face interesting challenges. How do we handle operator precedence? How do we implement control structures like if statements and loops? How do we deal with functions and variable scope?
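For the first of these questions, one common answer is precedence climbing. The sketch below is one possible approach rather than the only one: it assumes the token shape used in this article, and the names `PRECEDENCE`, `parseExpression`, `parsePrimary`, `toTokens`, and `show` are invented for the example.

```javascript
// Assumed precedence table: a higher number binds tighter
const PRECEDENCE = { '+': 1, '-': 1, '*': 2, '/': 2 };

function parsePrimary(tokens, state) {
    const token = tokens[state.current];
    if (token && token.type === 'number') {
        state.current++;
        return { type: 'NumberLiteral', value: token.value };
    }
    if (token && token.type === 'identifier') {
        state.current++;
        return { type: 'identifier', value: token.value };
    }
    throw new TypeError(token ? token.type : 'end of input');
}

function parseExpression(tokens, state = { current: 0 }, minPrecedence = 1) {
    let left = parsePrimary(tokens, state);

    while (true) {
        const token = tokens[state.current];
        const precedence = token && token.type === 'operator'
            ? PRECEDENCE[token.value]
            : undefined;
        // Stop at the end, at non-arithmetic tokens, or at weaker operators
        if (precedence === undefined || precedence < minPrecedence) break;

        state.current++; // Consume the operator
        // precedence + 1 makes equal-precedence operators left-associative
        const right = parseExpression(tokens, state, precedence + 1);
        left = { type: 'BinaryExpression', operator: token.value, left, right };
    }

    return left;
}

// Hypothetical helper: build tokens from a space-separated expression
function toTokens(source) {
    return source.split(' ').map(text => {
        if (/^\d+$/.test(text)) return { type: 'number', value: text };
        if (/^[a-z]+$/i.test(text)) return { type: 'identifier', value: text };
        return { type: 'operator', value: text };
    });
}

// Hypothetical helper: print the tree with explicit grouping
function show(node) {
    if (node.type !== 'BinaryExpression') return node.value;
    return '(' + show(node.left) + ' ' + node.operator + ' ' + show(node.right) + ')';
}

console.log(show(parseExpression(toTokens('1 + 2 * 3')))); // (1 + (2 * 3))
console.log(show(parseExpression(toTokens('8 - 3 - 2')))); // ((8 - 3) - 2)
```

The loop keeps extending the left operand while the next operator is at least as strong as the current minimum, which is what gives multiplication priority over addition and keeps subtraction left-associative.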

These questions lead us into more advanced topics. We might implement a symbol table to keep track of variables and their scopes. We could add type checking to catch errors before runtime. We might even implement our own runtime environment.
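A symbol table for nested scopes can be as small as a chain of maps. The following is a minimal sketch under that assumption; the `Scope` class and the fields stored per symbol (`kind`, `line`) are hypothetical and not part of the compiler built so far.

```javascript
// Hypothetical symbol table: each scope holds its own names and a parent link
class Scope {
    constructor(parent = null) {
        this.parent = parent;
        this.symbols = new Map();
    }

    // Register a name in this scope; redeclaring in the same scope is an error
    declare(name, info = {}) {
        if (this.symbols.has(name)) {
            throw new SyntaxError("Identifier '" + name + "' has already been declared");
        }
        this.symbols.set(name, info);
    }

    // Look in this scope first, then walk outward through enclosing scopes
    lookup(name) {
        for (let scope = this; scope !== null; scope = scope.parent) {
            if (scope.symbols.has(name)) return scope.symbols.get(name);
        }
        return undefined;
    }
}

const globalScope = new Scope();
globalScope.declare('x', { kind: 'let', line: 1 });

const blockScope = new Scope(globalScope);
blockScope.declare('y', { kind: 'let', line: 2 });
blockScope.declare('x', { kind: 'let', line: 3 }); // Shadows the outer x

console.log(blockScope.lookup('x').line);  // 3
console.log(globalScope.lookup('x').line); // 1
console.log(globalScope.lookup('y'));      // undefined
```

A parser or a later analysis pass could create a new Scope on entering a block or function and consult lookup whenever it meets an identifier, reporting an error when the result is undefined.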

One particularly interesting area is optimization. Once we have our AST, we can analyze and transform it to make the resulting code more efficient. For example, we could implement constant folding, where we evaluate constant expressions at compile time:

function optimize(node) {
    // Fold only when both operands are already numeric literals
    if (node.type === 'BinaryExpression' &&
        node.left.type === 'NumberLiteral' &&
        node.right.type === 'NumberLiteral') {
        let left = Number(node.left.value);
        let right = Number(node.right.value);
        let result;
        switch (node.operator) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            case '/': result = left / right; break;
            default: return node; // Unknown operator: leave the node untouched
        }
        return { type: 'NumberLiteral', value: result.toString() };
    }
    return node;
}

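As written, optimize only folds a node whose operands are already literals, so it has to be applied bottom-up: fold a node's children first, then the node itself. This can happen as a separate pass over the AST or on each node during the code generation phase.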
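A bottom-up version of the same idea could look like the sketch below. It assumes the node types used in this article; `foldConstants` and `OPERATIONS` are names made up for the example, and the choice to skip non-finite results is an extra precaution of this sketch rather than a requirement of constant folding.

```javascript
// Assumed operator table for folding
const OPERATIONS = {
    '+': (a, b) => a + b,
    '-': (a, b) => a - b,
    '*': (a, b) => a * b,
    '/': (a, b) => a / b,
};

// Returns a new tree in which constant sub-expressions are pre-computed
function foldConstants(node) {
    switch (node.type) {
        case 'Program':
            return { ...node, body: node.body.map(foldConstants) };

        case 'VariableDeclaration':
            return { ...node, value: foldConstants(node.value) };

        case 'BinaryExpression': {
            // Fold the children first so nested constants collapse upward
            const left = foldConstants(node.left);
            const right = foldConstants(node.right);
            const operation = OPERATIONS[node.operator];

            if (operation &&
                left.type === 'NumberLiteral' &&
                right.type === 'NumberLiteral') {
                const result = operation(Number(left.value), Number(right.value));
                // Leave cases like division by zero for the runtime to handle
                if (Number.isFinite(result)) {
                    return { type: 'NumberLiteral', value: String(result) };
                }
            }
            return { ...node, left, right };
        }

        default:
            return node;
    }
}

// Hand-built AST for: let x = 2 * 3 + 4;
const declaration = {
    type: 'VariableDeclaration',
    name: 'x',
    value: {
        type: 'BinaryExpression',
        operator: '+',
        left: {
            type: 'BinaryExpression',
            operator: '*',
            left: { type: 'NumberLiteral', value: '2' },
            right: { type: 'NumberLiteral', value: '3' }
        },
        right: { type: 'NumberLiteral', value: '4' }
    }
};

console.log(foldConstants(declaration).value);
// { type: 'NumberLiteral', value: '10' }
```

Because the function returns new nodes instead of mutating the input, the original AST stays available for error messages or source maps.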

Another advanced topic is source map generation. Source maps allow debuggers to map between the generated JavaScript and our original source code, making debugging much easier.

As we delve deeper into language design, we start to appreciate the nuances and trade-offs involved. Should our language be statically typed or dynamically typed? How do we balance expressiveness with safety? What syntax will make our language intuitive and easy to use?

Building a language that compiles to JavaScript also gives us a unique perspective on JavaScript itself. We start to see why certain design decisions were made, and we gain a deeper appreciation for the language's quirks and features.

Moreover, this project can significantly enhance our understanding of other languages and tools. Many of the concepts we encounter - lexical scoping, type systems, garbage collection - are fundamental to programming language design and implementation.

It's worth noting that while we're compiling to JavaScript, many of these principles apply to other target languages as well. Once you understand the basics, you could adapt your compiler to output Python, Java, or even machine code.

As we wrap up, it's clear that building a language transpiler is no small task. It's a project that can grow with you, always offering new challenges and learning opportunities. Whether you're looking to create a domain-specific language for a particular problem, or you're just curious about how languages work, this project is an excellent way to deepen your programming knowledge.

Remember, the goal isn't necessarily to create the next big programming language. The real value is in the journey - the understanding you gain, the problems you solve, and the new ways of thinking you develop. So don't be afraid to experiment, to make mistakes, and to push the boundaries of what you think is possible. Happy coding!
