Bravim Purohit

Advanced Compilation: A Formal Analysis of a JavaScript-Based Compiler

Introduction

Compilers serve as fundamental components in programming, transforming high-level source code into executable instructions. This article presents an in-depth analysis of a small JavaScript-based compiler that translates a custom scripting language into JavaScript. We systematically deconstruct each phase, from lexical analysis to code generation, focusing on the constructs the compiler supports directly: variable declarations, print statements, and pass-through arithmetic expressions. The lexer also reserves keywords for conditional branching (agar/naito) and iteration (jabtak), which are natural extension points. Along the way, we examine error handling and the broader theoretical framework underpinning compiler design.

Theoretical Foundations of the Compiler Pipeline

A compiler traditionally follows a structured multi-phase process:

  1. Lexical Analysis (Lexer) – Tokenization of source code.
  2. Parsing (Parser) – Syntactic structuring into an Abstract Syntax Tree (AST).
  3. Semantic Analysis – Contextual verification of meaning and logic.
  4. Intermediate Representation (IR) Generation – Conversion of AST into an optimized intermediary form.
  5. Code Generation – Transformation of AST nodes into executable JavaScript.
  6. Execution – Interpretation or execution of the generated JavaScript code.

Production compilers depend on all of these stages to ensure correctness, efficiency, and maintainability. The toy compiler in this article implements a pragmatic subset (lexical analysis, parsing, code generation, and execution), folding semantic checks and optimization into the other phases.
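
To make the pipeline concrete before diving into the implementation, the trace below follows a single statement of the custom language through the phases this article actually implements (a sketch; the real lexer, parser, and code generator appear in the sections that follow):

// Source (custom language):   sun x = 10
//
// 1. Lexer output (tokens):
//    [ { type: "keyword", value: "sun" },
//      { type: "identifier", value: "x" },
//      { type: "operator", value: "=" },
//      { type: "number", value: 10 } ]
//
// 2. Parser output (AST):
//    { type: "program",
//      body: [{ type: "Declaration", name: "x", value: "10" }] }
//
// 3. Code generator output (JavaScript):
//    let x = 10;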

1. Lexical Analysis: Tokenization and Symbol Recognition

The lexer (lexical analyzer) processes raw input and decomposes it into meaningful lexical units known as tokens. These tokens represent syntactic elements such as keywords, identifiers, operators, and literals.

Implementation of the Lexer

function lexer(input) {
  const tokens = [];
  let cursor = 0;

  while (cursor < input.length) {
    let char = input[cursor];

    // Whitespace handling
    if (/\s/.test(char)) {
      cursor++;
      continue;
    }

    // String literals
    if (char === '"') {
      let str = "";
      cursor++;
      while (cursor < input.length && input[cursor] !== '"') {
        str += input[cursor++];
      }
      if (cursor >= input.length) {
        throw new Error("Unterminated string literal");
      }
      cursor++;
      tokens.push({ type: "string", value: str });
      continue;
    }

    // Identifiers and Keywords
    if (/[a-zA-Z]/.test(char)) {
      let word = "";
      while (cursor < input.length && /[a-zA-Z]/.test(char)) {
        word += char;
        char = input[++cursor];
      }
      tokens.push({ type: ["sun", "bata", "agar", "naito", "jabtak"].includes(word) ? "keyword" : "identifier", value: word });
      continue;
    }

    // Numeric constants
    if (/[0-9]/.test(char)) {
      let number = "";
      while (cursor < input.length && /[0-9]/.test(char)) {
        number += char;
        char = input[++cursor];
      }
      tokens.push({ type: "number", value: parseInt(number) });
      continue;
    }

    // Operators and Symbols (the `-` is placed first so the character
    // class does not parse it as an out-of-order range, which is a syntax error)
    if (/[-+*/=<>(){}]/.test(char)) {
      tokens.push({ type: "operator", value: char });
      cursor++;
      continue;
    }

    throw new Error(`Unrecognized character: ${char}`);
  }
  return tokens;
}

Example Usage of the Lexer

Input:

sun x = 10
bata "Hello, world!"

Lexical Output:

[
  { "type": "keyword", "value": "sun" },
  { "type": "identifier", "value": "x" },
  { "type": "operator", "value": "=" },
  { "type": "number", "value": 10 },
  { "type": "keyword", "value": "bata" },
  { "type": "string", "value": "Hello, world!" }
]

2. Parsing: Construction of the Abstract Syntax Tree (AST)

Parsing involves the hierarchical organization of tokens into an Abstract Syntax Tree (AST), which structurally represents the program.

function parser(tokens) {
  const ast = { type: "program", body: [] };

  function parseExpression() {
    const parts = [];
    // Consume tokens until the next keyword begins a new statement.
    while (tokens.length > 0 && tokens[0].type !== "keyword") {
      const token = tokens.shift();
      // Re-quote string literals so the generated JavaScript stays valid.
      parts.push(token.type === "string" ? JSON.stringify(token.value) : token.value);
    }
    return parts.join(" ");
  }

  while (tokens.length > 0) {
    let token = tokens.shift();

    if (token.type === "keyword") {
      if (token.value === "sun") {
        let declaration = { type: "Declaration", name: tokens.shift().value, value: null };
        if (tokens[0] && tokens[0].type === "operator" && tokens[0].value === "=") {
          tokens.shift();
          declaration.value = parseExpression();
        }
        ast.body.push(declaration);
      }

      if (token.value === "bata") {
        ast.body.push({ type: "Print", expression: parseExpression() });
      }
    }
  }
  return ast;
}
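
For the sample program from the lexer section, the parser produces an AST along these lines (expression values are kept as raw strings in this design, with string literals re-quoted so they survive code generation):

{
  "type": "program",
  "body": [
    { "type": "Declaration", "name": "x", "value": "10" },
    { "type": "Print", "expression": "\"Hello, world!\"" }
  ]
}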

3. Code Generation: Translation of AST into JavaScript

function codeGenerator(node) {
  switch (node.type) {
    case "program":
      return node.body.map(codeGenerator).join("\n");
    case "Declaration":
      return `let ${node.name} = ${node.value};`;
    case "Print":
      return `console.log(${node.expression});`;
    default:
      throw new Error(`Unknown AST node type: ${node.type}`);
  }
}
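
Running the generator on the AST above yields plain JavaScript:

let x = 10;
console.log("Hello, world!");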

4. Execution: Running the Compiled JavaScript Code

function compiler(input) {
  const tokens = lexer(input);
  const ast = parser(tokens);
  const executableCode = codeGenerator(ast);
  return executableCode;
}

function runner(code) {
  // eval is fine for a toy compiler, but never feed it untrusted input.
  eval(code);
}
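
Putting the pieces together, the full pipeline runs end to end like this:

const source = `
sun x = 10
bata "Hello, world!"
`;

runner(compiler(source)); // prints: Hello, world!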

Conclusion

This study examined a minimal yet effective compiler capable of transforming a custom scripting language into JavaScript. We dissected its key stages—lexical analysis, parsing, code generation, and execution—highlighting core programming constructs. Future work could enhance error diagnostics, introduce optimizations, and extend control flow mechanisms for more complex language processing.
