DEV Community

Pluri45

Posted on

Building a Compiler & Interpreter in Rust. part i

Computers do not understand human languages; they execute binary machine code. Compilers and interpreters bridge this gap by turning human-readable source code into instructions a machine can run. Programming languages exist to make writing those instructions easier: low-level languages such as assembly stay close to the hardware, while high-level languages are closer to how people think about a problem.

Difference between compilers and interpreters:

A compiler translates the entire source code of a high-level program into machine code before the program runs. The resulting machine code is stored, typically as an executable file, and can then be run without the compiler. Because the translation happens ahead of time, execution is usually fast. Examples of compiled languages include C, C++, and Rust.

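Interpreters, on the other hand, read and execute a program statement by statement while it runs, instead of producing a standalone machine-code executable first. Many interpreters first translate the source into an intermediate bytecode and then execute that. This allows for more dynamic execution, but it is usually slower than running compiled code. Examples of languages that are typically interpreted include Python, JavaScript, and PHP.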
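To make the difference concrete, here is a small, self-contained Rust sketch (not part of the MCL project) that handles the same arithmetic expression both ways. The interpret function walks the expression and computes the answer immediately, while compile first translates the whole expression into a flat list of instructions that run can execute later. To keep it short, this toy "compiler" targets a tiny stack machine rather than real CPU machine code; Expr, Op, interpret, compile, and run are illustrative names.

```rust
// A toy expression language: integer literals, addition, and multiplication.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Interpreter style: walk the expression and compute the result immediately.
fn interpret(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => interpret(a) + interpret(b),
        Expr::Mul(a, b) => interpret(a) * interpret(b),
    }
}

// Instructions for a tiny stack machine (the "target" of the toy compiler).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(i64),
    Add,
    Mul,
}

// Compiler style: translate the whole expression up front, emitting the
// operands first and then the operator (postfix order).
fn compile(e: &Expr, out: &mut Vec<Op>) {
    match e {
        Expr::Num(n) => out.push(Op::Push(*n)),
        Expr::Add(a, b) => {
            compile(a, out);
            compile(b, out);
            out.push(Op::Add);
        }
        Expr::Mul(a, b) => {
            compile(a, out);
            compile(b, out);
            out.push(Op::Mul);
        }
    }
}

// Run previously compiled instructions; the source expression is no longer needed.
fn run(code: &[Op]) -> i64 {
    let mut stack = Vec::new();
    for &op in code {
        match op {
            Op::Push(n) => stack.push(n),
            Op::Add | Op::Mul => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(if op == Op::Add { a + b } else { a * b });
            }
        }
    }
    stack.pop().expect("empty program")
}

fn main() {
    // (2 + 3) * 4
    let expr = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Num(4)),
    );
    println!("interpreted: {}", interpret(&expr)); // 20

    let mut code = Vec::new();
    compile(&expr, &mut code);
    println!("compiled: {:?}", code); // [Push(2), Push(3), Add, Push(4), Mul]
    println!("run: {}", run(&code)); // 20
}
```

The compiled instruction list can be stored and run many times without looking at the original expression again, which is the trade-off the two paragraphs above describe.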

Compiler & Interpreter

Building a compiler for the MCL language in Rust: general file structure and functions

MCL is an invented language; if you feel adventurous, you can design your own. The goal is to take the high-level code in a .mcl file and process it with the code in the compiler module. The compiler matches the keywords in the file against a set of predefined instructions and converts the program into bytecode, an intermediate language. The interpreter then runs a virtual machine (VM) that executes the bytecode and prints the program's output to the terminal in human-readable form.
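The pipeline described above can be sketched in a few lines of Rust. This is a minimal, hypothetical version: the keywords push, add, and print are an assumed syntax (the real MCL syntax is not shown in this part), and Instr, compile, and execute are simplified stand-ins for the project's modules, not their actual code. The compile step matches each line's keyword against the known instructions, and the VM runs those instructions on a stack and collects what print would show in the terminal.

```rust
// Hypothetical instruction set; the real MCL instr/op modules come later in the series.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    Push(i64),
    Add,
    Print,
}

// "Compiler": match each line's keyword against the predefined instructions.
fn compile(source: &str) -> Result<Vec<Instr>, String> {
    let mut program = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let line_no = i + 1;
        let mut words = line.split_whitespace();
        // Skip blank lines.
        let Some(keyword) = words.next() else { continue };
        let instr = match keyword {
            "push" => {
                let arg = words
                    .next()
                    .ok_or_else(|| format!("line {line_no}: push needs a number"))?;
                let n = arg
                    .parse::<i64>()
                    .map_err(|e| format!("line {line_no}: {e}"))?;
                Instr::Push(n)
            }
            "add" => Instr::Add,
            "print" => Instr::Print,
            other => return Err(format!("line {line_no}: unknown keyword `{other}`")),
        };
        program.push(instr);
    }
    Ok(program)
}

// "Virtual machine": execute the instructions on a stack and collect the
// values that print would write to the terminal.
fn execute(program: &[Instr]) -> Result<Vec<String>, String> {
    let mut stack: Vec<i64> = Vec::new();
    let mut output = Vec::new();
    for &instr in program {
        match instr {
            Instr::Push(n) => stack.push(n),
            Instr::Add => {
                let b = stack.pop().ok_or("stack underflow")?;
                let a = stack.pop().ok_or("stack underflow")?;
                stack.push(a + b);
            }
            Instr::Print => {
                let top = stack.last().ok_or("print on an empty stack")?;
                output.push(top.to_string());
            }
        }
    }
    Ok(output)
}

fn main() {
    // A tiny program in the assumed syntax.
    let source = "push 2\npush 3\nadd\nprint";
    match compile(source).and_then(|program| execute(&program)) {
        Ok(lines) => {
            for line in lines {
                println!("{line}"); // prints 5
            }
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Returning the output as a Vec<String> instead of printing inside the VM keeps the sketch easy to test; a terminal-facing VM would print each value directly.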

Compiler workflow

Main file & general overview

Compiling and executing:

The compiler processes the .mcl file and converts it into an intermediate representation (the bytecode). The program takes its instructions from the terminal as command-line flags, and it provides a framework for compiling, executing, and saving bytecode. Breaking the code down shows how each function contributes to processing the .mcl file, starting with the imports.

Imports and Modules

The first part of main.rs imports the external crates and the project's own modules:

// clap::Parser: Used for command-line argument parsing.
use clap::Parser;

// anyhow::Result: Simplifies error handling.
use anyhow::Result;

// log library: Provides logging functionality for different levels (e.g., info, debug, error).
use log::{debug, info, error};

// std utilities for file manipulation, path management, exiting the program, and writing to files.
use std::fs;
use std::path;
use std::process::exit;
use std::io::Write;

// Modules (compilers, interpreter, op, instr) define core functionality 
// for the virtual machine, compilation, and bytecode handling.
// These are in separate files and are imported as modules into main.rs.
mod compilers;
mod interpreter;
mod op;
mod instr;

// Specific imports from the modules.
use crate::instr::Instr;
use crate::interpreter::{VM, decode_instructions, encode_instructions};



Argument Parsing with clap

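The Args struct defines the command-line arguments. Deriving clap::Parser generates the parsing code, and #[arg(long)] turns each field into a long flag named after the field, such as --compile: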

#[derive(Parser)]
struct Args {
    /// --compile: compile the .mcl source file given by --file.
    #[arg(long, default_value_t = false)]
    compile: bool,

    /// --exec: execute bytecode (compiled in this run, or read from --file).
    #[arg(long, default_value_t = false)]
    exec: bool,

    /// --optimize: enable optimization (placeholder, unused here).
    #[arg(long, default_value_t = false)]
    optimize: bool,

    /// --decompile: decompile bytecode (placeholder, unused here).
    #[arg(long, default_value_t = false)]
    decompile: bool,

    /// --file: the file to process.
    #[arg(long)]
    file: Option<String>,
}

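The flags only make sense in certain combinations. The following std-only sketch (the Action enum and plan function are hypothetical, not part of the article's code) encodes the same decisions that run makes with its nested if blocks, so each combination can be checked without clap or real files. One deliberate difference: it rejects a call with neither --compile nor --exec, whereas the article's code falls through to its else branch and writes an empty .mclb file.

```rust
use std::ffi::OsStr;
use std::path::Path;

// Hypothetical: the three useful things run() can end up doing.
#[derive(Debug, PartialEq)]
enum Action {
    CompileAndExecute(String), // --compile --exec --file prog.mcl
    CompileAndSave(String),    // --compile --file prog.mcl (writes prog.mclb)
    ExecuteFile(String),       // --exec --file prog.mclb
}

// Mirrors the checks in run(): --file is required, --compile needs a .mcl
// file, and --exec on its own refuses an uncompiled .mcl file.
fn plan(compile: bool, exec: bool, file: Option<&str>) -> Result<Action, String> {
    let file = file.ok_or("--file must be specified")?;
    let is_mcl = Path::new(file).extension() == Some(OsStr::new("mcl"));
    let file = file.to_string();
    match (compile, exec) {
        (true, _) if !is_mcl => Err("unsupported file extension for compilation".to_string()),
        (true, true) => Ok(Action::CompileAndExecute(file)),
        (true, false) => Ok(Action::CompileAndSave(file)),
        (false, true) if is_mcl => Err("cannot execute an uncompiled file".to_string()),
        (false, true) => Ok(Action::ExecuteFile(file)),
        // The article's code would save an empty .mclb file here instead.
        (false, false) => Err("nothing to do: pass --compile and/or --exec".to_string()),
    }
}

fn main() {
    for (compile, exec, file) in [
        (true, false, Some("hello.mcl")),
        (true, true, Some("hello.mcl")),
        (false, true, Some("hello.mclb")),
        (false, true, Some("hello.mcl")),
    ] {
        println!("{:?}", plan(compile, exec, file));
    }
}
```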

The Main Function

The main function delegates the actual work to a run function that returns anyhow::Result<()>. If run fails, main logs the error and exits with status code 1:

fn main() {
    if let Err(err) = run() {
        error!("Error: {}", err);
        exit(1);
    }
}



Logging Setup

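Inside run, set up env_logger before anything is logged. The custom format prints each message with a coloured level label, using ANSI escape codes, followed by the message text. LevelFilter::max() lets every level through, from ERROR down to TRACE.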


env_logger::builder()
    .filter_level(log::LevelFilter::max())
    .format(|buf, record| {
        let level = match record.level() {
            log::Level::Error => "\x1b[31mERROR\x1b[0m", // Red
            log::Level::Warn => "\x1b[33mWARN\x1b[0m",   // Yellow
            log::Level::Info => "\x1b[32mINFO\x1b[0m",   // Green
            log::Level::Debug => "\x1b[34mDEBUG\x1b[0m", // Blue
            log::Level::Trace => "\x1b[35mTRACE\x1b[0m", // Magenta
        };

        writeln!(buf, "{:14} | {}", level, record.args())
    })
    .init();


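One detail of the format string is easy to miss: {:14} pads to 14 characters, and the invisible ANSI escape codes count toward that width. \x1b[31m is 5 characters and \x1b[0m is 4, so a 5-letter label such as ERROR fills exactly 14, while the 4-letter WARN and INFO get one space of padding, and the visible labels line up. The sketch below reproduces the mapping without external crates; Level is a hypothetical stand-in for log::Level and format_line is an illustrative name.

```rust
// Hypothetical stand-in for log::Level, so this sketch needs no crates.
#[derive(Clone, Copy)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

// Same mapping as the env_logger format closure: switch the colour on,
// write the label, then reset the colour with \x1b[0m.
fn coloured_label(level: Level) -> &'static str {
    match level {
        Level::Error => "\x1b[31mERROR\x1b[0m", // red
        Level::Warn => "\x1b[33mWARN\x1b[0m",   // yellow
        Level::Info => "\x1b[32mINFO\x1b[0m",   // green
        Level::Debug => "\x1b[34mDEBUG\x1b[0m", // blue
        Level::Trace => "\x1b[35mTRACE\x1b[0m", // magenta
    }
}

// Builds one log line the way the closure's writeln! does. {:14} also counts
// the 9 escape characters, leaving 5 visible columns for the label.
fn format_line(level: Level, message: &str) -> String {
    format!("{:14} | {}", coloured_label(level), message)
}

fn main() {
    for level in [Level::Error, Level::Warn, Level::Info, Level::Debug, Level::Trace] {
        println!("{}", format_line(level, "hello"));
    }
}
```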

Core Logic

a. Compile

Still inside run, the program parses the arguments and, if the --compile flag is set, compiles the input file into bytecode:

let args = Args::parse();

let mut bytecode = vec![];

// Compile
if args.compile {
    if let Some(filename) = args.file.as_deref() {
        if path::Path::new(&filename).extension() == Some(std::ffi::OsStr::new("mcl")) {
            bytecode = compilers::compile(&fs::read_to_string(filename)?)?
                .into_iter()
                .map(|instr| instr.to_u64())
                .collect();
        } else {
            error!("Error: Unsupported file extension for compilation.");
            exit(1);
        }
    } else {
        error!("Error: --file must be specified when using --compile");
        exit(1);
    }
}


This block uses the compilers module, which is covered later in the series. It first checks that the input file has the .mcl extension, then reads the source text with fs::read_to_string and passes it to compilers::compile, which turns the raw text into a list of instructions. Each instruction is then converted to a u64 with to_u64, and the resulting Vec<u64> is the bytecode that the virtual machine runs.
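The compile step relies on Instr::to_u64, and the save step on Instr::from_u64, to turn each instruction into a single 64-bit word and back. The real instr and op modules are covered later in the series, so the following is only a hypothetical sketch of one way such packing could work: the opcode goes in the top 8 bits and a 32-bit operand in the low bits. The Op variants, their numbers, and the bit layout are all assumptions.

```rust
// Hypothetical opcodes; the article's real op module is covered later.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push = 1,
    Add = 2,
    Print = 3,
}

// Hypothetical instruction: an opcode plus a 32-bit operand.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Instr {
    op: Op,
    operand: u32,
}

impl Instr {
    // Pack into one word: opcode in bits 56..64, operand in bits 0..32.
    fn to_u64(self) -> u64 {
        ((self.op as u64) << 56) | self.operand as u64
    }

    // Unpack a word produced by to_u64. Like the Instr::from_u64 call in
    // main.rs, it returns an Instr directly, so an unknown opcode panics here.
    fn from_u64(word: u64) -> Instr {
        let op = match (word >> 56) as u8 {
            1 => Op::Push,
            2 => Op::Add,
            3 => Op::Print,
            other => panic!("unknown opcode {other}"),
        };
        Instr { op, operand: (word & 0xFFFF_FFFF) as u32 }
    }
}

fn main() {
    let program = vec![
        Instr { op: Op::Push, operand: 2 },
        Instr { op: Op::Push, operand: 3 },
        Instr { op: Op::Add, operand: 0 },
        Instr { op: Op::Print, operand: 0 },
    ];
    // Same shape as the compile step: Vec<Instr> -> Vec<u64>.
    let bytecode: Vec<u64> = program.iter().map(|instr| instr.to_u64()).collect();
    for word in &bytecode {
        println!("{word:#018x}");
    }
    // And back again, as the save step does before encoding.
    let decoded: Vec<Instr> = bytecode.iter().map(|&b| Instr::from_u64(b)).collect();
    assert_eq!(decoded, program);
}
```

Packing each instruction into one fixed-size word keeps the bytecode a plain Vec<u64>, which is easy to pass to the VM and to write to disk.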

b. Execute

If --exec is specified, the program executes the bytecode:


// Execute
if args.exec {
    if !args.compile {
        if let Some(file) = args.file.as_deref() {
            if path::Path::new(file).extension() == Some(std::ffi::OsStr::new("mcl")) {
                error!("Error: Cannot execute an uncompiled file.");
                exit(1);
            }

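            // fs::read returns raw bytes (Vec<u8>), and u8 has no to_u64 method,
            // so decode the bytes back into instructions first. This assumes
            // decode_instructions mirrors encode_instructions and returns Result<Vec<Instr>>.
            bytecode = decode_instructions(&fs::read(file)?)?
                .into_iter()
                .map(|instr| instr.to_u64())
                .collect();
        } else {
            error!("Error: --file must be specified.");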
            exit(1);
        }
    }

    let mut vm = VM::new();
    vm.execute(bytecode)?;
}




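When --exec is used without --compile, the program refuses a raw .mcl file, reads the previously compiled bytecode from the file given by --file, and converts it into a Vec<u64>. When both flags are set, it reuses the bytecode compiled in the same run. Either way, it creates a new virtual machine (VM) with VM::new and runs the bytecode with vm.execute.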
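At its core, vm.execute(bytecode) is a fetch-decode-execute loop: read the next word, split it into an opcode and an operand, and update the machine's state. The sketch below is a hypothetical, minimal stack-based VM, not the project's interpreter module. It assumes an illustrative layout in which each u64 holds an opcode in its top 8 bits and an operand in its low 32 bits; the PUSH, ADD, and PRINT opcodes and the encode helper are invented for the example. It also records printed values in output so the behaviour can be checked.

```rust
// Hypothetical opcodes, invented for this sketch.
const PUSH: u8 = 1;
const ADD: u8 = 2;
const PRINT: u8 = 3;

// Hypothetical helper: pack an opcode and operand into one u64 (assumed layout).
fn encode(opcode: u8, operand: u32) -> u64 {
    ((opcode as u64) << 56) | operand as u64
}

struct VM {
    stack: Vec<i64>,
    pc: usize,        // index of the next instruction to fetch
    output: Vec<i64>, // values printed by PRINT, kept for inspection
}

impl VM {
    fn new() -> Self {
        VM { stack: Vec::new(), pc: 0, output: Vec::new() }
    }

    // Fetch-decode-execute loop over the whole program.
    fn execute(&mut self, bytecode: Vec<u64>) -> Result<(), String> {
        self.pc = 0;
        while self.pc < bytecode.len() {
            // Fetch the next word and advance the program counter.
            let word = bytecode[self.pc];
            self.pc += 1;
            // Decode it into an opcode and an operand.
            let opcode = (word >> 56) as u8;
            let operand = (word & 0xFFFF_FFFF) as u32;
            // Execute it against the stack.
            match opcode {
                PUSH => self.stack.push(operand as i64),
                ADD => {
                    let b = self.stack.pop().ok_or("stack underflow")?;
                    let a = self.stack.pop().ok_or("stack underflow")?;
                    self.stack.push(a + b);
                }
                PRINT => {
                    let top = *self.stack.last().ok_or("print on an empty stack")?;
                    println!("{top}");
                    self.output.push(top);
                }
                other => return Err(format!("unknown opcode {other} at index {}", self.pc - 1)),
            }
        }
        Ok(())
    }
}

fn main() {
    let bytecode = vec![encode(PUSH, 2), encode(PUSH, 3), encode(ADD, 0), encode(PRINT, 0)];
    let mut vm = VM::new();
    if let Err(e) = vm.execute(bytecode) {
        eprintln!("VM error: {e}");
    }
}
```

Keeping an explicit program counter instead of a plain for loop leaves room for jump instructions, which only need to overwrite pc.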

c. Save Bytecode

If --exec is not set, the else branch of the if args.exec block saves the bytecode to a file instead:

else {
    // Save bytecode to a file
    if let Some(file) = args.file.as_deref() {
        let instrs: Vec<Instr> = bytecode.iter().map(|&b| Instr::from_u64(b)).collect();
        let mut new_file = fs::File::create(path::Path::new(file).with_extension("mclb"))?;
        new_file.write_all(encode_instructions(&instrs)?.as_slice())?;
    } else {
        error!("Error: --file must be specified for output.");
        exit(1);
    }
}

Ok(())



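When --exec is not set, the program converts the u64 bytecode back into Instr values with Instr::from_u64, serializes them with encode_instructions, and writes the bytes to a file with the same name as the input but a .mclb extension (for example, hello.mcl becomes hello.mclb). This .mclb file is the compiler's output, and it can later be run with --exec.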
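The .mclb file holds bytes, while the program works with u64 words, so saving and loading need a matching byte format on both sides. The article's encode_instructions and decode_instructions live in the interpreter module and are not shown yet; the sketch below uses hypothetical encode_words and decode_words helpers that store each word as 8 little-endian bytes, and it uses the same with_extension("mclb") call as the save branch. It also shows why bytes read back with fs::read must be decoded before they can be treated as bytecode.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

// Hypothetical byte format: every u64 word becomes 8 little-endian bytes.
fn encode_words(words: &[u64]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(words.len() * 8);
    for word in words {
        bytes.extend_from_slice(&word.to_le_bytes());
    }
    bytes
}

// Inverse of encode_words. A length that isn't a multiple of 8 means the
// file is truncated or is not bytecode at all.
fn decode_words(bytes: &[u8]) -> Result<Vec<u64>, String> {
    if bytes.len() % 8 != 0 {
        return Err(format!("{} bytes is not a whole number of 8-byte words", bytes.len()));
    }
    Ok(bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect())
}

fn main() -> Result<(), Box<dyn Error>> {
    let bytecode: Vec<u64> = vec![0x0100_0000_0000_0002, 0x0200_0000_0000_0000];

    // Same naming rule as the save branch: hello.mcl -> hello.mclb.
    let out = std::env::temp_dir().join(Path::new("hello.mcl").with_extension("mclb"));
    fs::write(&out, encode_words(&bytecode))?;

    // fs::read gives raw bytes; they must be decoded before use as bytecode.
    let loaded = decode_words(&fs::read(&out)?)?;
    assert_eq!(loaded, bytecode);
    println!("round-tripped {} words via {}", loaded.len(), out.display());
    Ok(())
}
```

Fixing the byte order explicitly (little-endian here) keeps .mclb files portable between machines, rather than depending on the native byte order of whichever computer wrote them.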

Conclusion

In this article, you learned about the components involved in building a compiler and interpreter in Rust and the difference between compiled and interpreted languages. You also saw the general architecture of the project, whose main.rs ties together four modules: compilers, interpreter, op, and instr. Future parts of the series cover these modules in detail.

If you are interested in the video version of this article, check out Lion
