Introduction
Compilers serve as fundamental components in programming, transforming high-level source code into executable instructions. This article presents an in-depth analysis of a JavaScript-based compiler that translates a custom scripting language into JavaScript. We systematically deconstruct each phase, from lexical analysis to code generation, and examine key constructs, including variable declarations, conditional branching (if-else), iterative structures (while loops), mathematical operations, and boolean expressions. Additionally, we explore advanced aspects such as error handling, function invocation, and the broader theoretical framework underpinning compiler design.
Theoretical Foundations of the Compiler Pipeline
A compiler traditionally follows a structured multi-phase process:
- Lexical Analysis (Lexer) – Tokenization of source code.
- Parsing (Parser) – Syntactic structuring into an Abstract Syntax Tree (AST).
- Semantic Analysis – Contextual verification of meaning and logic.
- Intermediate Representation (IR) Generation – Conversion of AST into an optimized intermediary form.
- Code Generation – Transformation of AST nodes into executable JavaScript.
- Execution – Interpretation or execution of the generated JavaScript code.
Each of these stages is indispensable for the effective conversion of human-readable source code into machine-executable instructions. The combination of these processes ensures correctness, efficiency, and maintainability in compiler design.
1. Lexical Analysis: Tokenization and Symbol Recognition
The lexer (lexical analyzer) processes raw input and decomposes it into meaningful lexical units known as tokens. These tokens represent syntactic elements such as keywords, identifiers, operators, and literals.
Implementation of the Lexer
function lexer(input) {
  const tokens = [];
  let cursor = 0;
  while (cursor < input.length) {
    let char = input[cursor];
    // Whitespace handling
    if (/\s/.test(char)) {
      cursor++;
      continue;
    }
    // String literals
    if (char === '"') {
      let str = "";
      cursor++;
      while (cursor < input.length && input[cursor] !== '"') {
        str += input[cursor++];
      }
      if (cursor >= input.length) {
        throw new Error("Unterminated string literal");
      }
      cursor++; // skip the closing quote
      tokens.push({ type: "string", value: str });
      continue;
    }
    // Identifiers and keywords
    if (/[a-zA-Z]/.test(char)) {
      let word = "";
      while (cursor < input.length && /[a-zA-Z]/.test(char)) {
        word += char;
        char = input[++cursor];
      }
      tokens.push({ type: ["sun", "bata", "agar", "naito", "jabtak"].includes(word) ? "keyword" : "identifier", value: word });
      continue;
    }
    // Numeric constants
    if (/[0-9]/.test(char)) {
      let number = "";
      while (cursor < input.length && /[0-9]/.test(char)) {
        number += char;
        char = input[++cursor];
      }
      tokens.push({ type: "number", value: parseInt(number, 10) });
      continue;
    }
    // Operators and symbols (the hyphen is escaped so it is not read as a character range)
    if (/[+\-*/=<>(){}]/.test(char)) {
      tokens.push({ type: "operator", value: char });
      cursor++;
      continue;
    }
    throw new Error(`Unrecognized character: ${char}`);
  }
  return tokens;
}
Example Usage of the Lexer
Input:
sun x = 10
bata "Hello, world!"
Lexical Output:
[
{ "type": "keyword", "value": "sun" },
{ "type": "identifier", "value": "x" },
{ "type": "operator", "value": "=" },
{ "type": "number", "value": 10 },
{ "type": "keyword", "value": "bata" },
{ "type": "string", "value": "Hello, world!" }
]
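This output can be reproduced by calling the lexer directly on the source snippet (the variable names below are illustrative):
const tokens = lexer(`sun x = 10
bata "Hello, world!"`);
console.log(JSON.stringify(tokens, null, 2));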
2. Parsing: Construction of the Abstract Syntax Tree (AST)
Parsing involves the hierarchical organization of tokens into an Abstract Syntax Tree (AST), which structurally represents the program.
function parser(tokens) {
  const ast = { type: "program", body: [] };
  // Consume tokens up to the next keyword and join them into a single expression string.
  function parseExpression() {
    let expression = "";
    while (tokens.length > 0 && tokens[0].type !== "keyword") {
      const token = tokens.shift();
      // Re-wrap string literals in quotes so the generated JavaScript remains valid
      expression += token.type === "string" ? `"${token.value}"` : token.value;
    }
    return expression.trim();
  }
  while (tokens.length > 0) {
    let token = tokens.shift();
    if (token.type === "keyword") {
      if (token.value === "sun") {
        let declaration = { type: "Declaration", name: tokens.shift().value, value: null };
        if (tokens[0] && tokens[0].type === "operator" && tokens[0].value === "=") {
          tokens.shift();
          declaration.value = parseExpression();
        }
        ast.body.push(declaration);
      }
      if (token.value === "bata") {
        ast.body.push({ type: "Print", expression: parseExpression() });
      }
    }
  }
  return ast;
}
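For the sample program from the lexer section, this parser produces an AST along the following lines (a sketch of the expected structure; note that the string literal keeps its quotes inside the expression):
{
  type: "program",
  body: [
    { type: "Declaration", name: "x", value: "10" },
    { type: "Print", expression: '"Hello, world!"' }
  ]
}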
3. Code Generation: Translation of AST into JavaScript
function codeGenerator(node) {
  switch (node.type) {
    case "program":
      return node.body.map(codeGenerator).join("\n");
    case "Declaration":
      return `let ${node.name} = ${node.value};`;
    case "Print":
      return `console.log(${node.expression});`;
    default:
      throw new Error(`Unknown AST node type: ${node.type}`);
  }
}
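Applied to the AST from the previous section, the generator emits plain JavaScript, roughly:
let x = 10;
console.log("Hello, world!");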
4. Execution: Running the Compiled JavaScript Code
function compiler(input) {
  const tokens = lexer(input);
  const ast = parser(tokens);
  const executableCode = codeGenerator(ast);
  return executableCode;
}
function runner(input) {
  eval(input);
}
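Putting it all together, the compiler can be driven as in the following minimal usage sketch (again using the sample program from earlier; the variable names are illustrative):
const code = `sun x = 10
bata "Hello, world!"`;
const js = compiler(code); // "let x = 10;\nconsole.log(\"Hello, world!\");"
runner(js);                // Prints: Hello, world!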
Conclusion
This study examined a minimal yet effective compiler capable of transforming a custom scripting language into JavaScript. We dissected its key stages—lexical analysis, parsing, code generation, and execution—highlighting core programming constructs. Future work could enhance error diagnostics, introduce optimizations, and extend control flow mechanisms for more complex language processing.