Building a Compiler & Interpreter in Rust! Part 3
Welcome to Part 3 of our series on Building a Compiler & Interpreter in Rust! In this installment, you'd dive into the heart of the interpreter, focusing on how it processes bytecode instructions and manages data during execution.
You’ll explore the core components of the interpreter, including the Value
and Stack
structures for data storage, the InterpreterError
enum for error handling, and the Virtual Machine (VM)
that ties everything together. Additionally, you’ll look at how instructions are encoded and decoded for efficient execution. If you’re new to the series, be sure to catch up with Part 1 and Part 2 to get up to speed. Let’s get started!
Interpreters
The interpreter processes bytecode instructions and executes them sequentially. Remember, the bytecode is the low-level representation of program instructions gotten from the compiler.
Value and Stack Management
Construct the Value enum and Stack struct to handle data storage and manipulation. A Stack is a data structure used to store intermediate values during execution.
// Import necessary modules and crates
use crate::instr::Instr; // Instruction module for decoding/encoding instructions
use crate::op::Op; // Operation module for supported opcodes
use thiserror::Error; // Provides an easy way to define custom error types
use std::io::Write; // For handling I/O operations
use log::{debug, info, error}; // Logging macros for debugging and error reporting
/// Represents a value in the virtual machine (VM).
/// Currently supports only integer values.
#[derive(Debug, Clone)]
pub enum Value {
Integer(i64), // Variant for storing integer values
}
impl Value {
/// Attempts to retrieve the integer value from `Value`.
///
/// # Returns
/// - `Ok(i64)` if the `Value` is an integer.
/// - `Err(InterpreterError::InvalidType)` if the `Value` is not an integer.
pub fn as_integer(&self) -> Result<i64, InterpreterError> {
match self {
Value::Integer(val) => Ok(*val), // Successfully returns the integer
_ => Err(InterpreterError::InvalidType(
Value::Integer(0), // Expected type
self.clone(), // Actual value causing the error
)),
}
}
}
The Value struct represents data as integers (Value::Integer). It includes methods like as_integer() to safely extract integers. The Stack struct serves as the execution stack:
use crate::instr::Instr;
use crate::op::Op;
use thiserror::Error;
use std::io::Write;
use log::{debug, info, error};
#[derive(Debug)]
pub struct Stack {
data: Vec<Value>,
}
impl Stack {
/// Creates a new, empty `Stack`.
pub fn new() -> Self {
Self { data: Vec::new() }
}
/// Pushes a `Value` onto the stack.
pub fn push(&mut self, value: Value) {
self.data.push(value);
}
/// Pops the top `Value` from the stack. Returns `None` if the stack is empty.
pub fn pop(&mut self) -> Option<Value> {
self.data.pop()
}
/// Peeks at the top `Value` without removing it.
pub fn peek(&self) -> Option<&Value> {
self.data.last()
}
/// Provides mutable access to the top `Value`.
pub fn back_mut(&mut self) -> Option<&mut Value> {
self.data.last_mut()
}
/// Provides immutable access to the top `Value` (alias for `peek`).
pub fn back(&self) -> Option<&Value> {
self.peek()
}
/// Pushes a `Value` onto the stack (alias for `push`).
pub fn push_back(&mut self, value: Value) {
self.push(value);
}
}
InterpreterError enum represents different kinds of errors that can occur in a program during program execution, especially those involved with a virtual machine(VM). Each variant of the enum represents a specific kind of error, with associated data or messages to provide more details.
use crate::instr::Instr;
use crate::op::Op;
use thiserror::Error;
use std::io::Write;
use log::{debug, info, error};
#[derive(Debug, Error)]
pub enum InterpreterError {
/// Error when the provided byte length is not divisible by 8.
#[error("Invalid number of bytes: {0} (must be divisible by 8)")]
InvalidNumberOfBytes(usize),
/// Error when attempting to pop from an empty stack.
#[error("Stack empty")]
StackEmpty,
/// Error when a value type does not match the expected type.
#[error("Invalid type: Expected {0:?}, got {1:?}")]
InvalidType(Value, Value),
/// Error when a division by zero is attempted.
#[error("Division by zero")]
DivByZero,
/// Error when a modulo operation is attempted with a zero divisor.
#[error("Modulo by zero")]
ModByZero,
}
Virtual Machine (VM)
The VM struct represents the interpreter:
// Define a virtual machine (VM) struct that contains a stack and bytecode.
pub struct VM {
stack: Stack, // Stack to store values during execution
bytecode: Vec<u64>, // Bytecode instructions to execute
}
impl VM {
// Constructor for the VM, initializes an empty stack and bytecode.
pub fn new() -> Self {
Self {
stack: Stack::new(),
bytecode: Vec::new(),
}
}
// Executes the provided bytecode instructions.
pub fn execute(&mut self, bytecode: Vec<u64>) -> anyhow::Result<()> {
self.bytecode = bytecode; // Load the bytecode into the VM
let mut pc = 0; // Program counter to track the current instruction
// Macro to handle operations, printing the operation name and executing the associated function.
macro_rules! handle_op {
($name:expr, $func:expr) => {
println!("> {}", $name); // Print the operation name
$func()?; // Execute the operation and propagate errors
};
}
// Main execution loop: processes each instruction in the bytecode.
while pc < self.bytecode.len() {
let instr = Instr::from_u64(self.bytecode[pc]); // Decode the current instruction
let mut next_pc = pc + 1; // Default next instruction is the next one
debug!(">> stack: {:?}", self.stack); // Debug print the stack state
// Match the operation and execute the corresponding logic
match instr.op {
Op::Push => handle_op!("Push", || Ok(self.stack.push(Value::Integer(instr.value as i64)))),
Op::Pop => handle_op!("Pop", || self.stack.pop().ok_or(InterpreterError::StackEmpty.into())),
Op::Inc => handle_op!("Inc", || self.modify_top(|v| *v += 1)),
Op::Add => handle_op!("Add", || self.binary_op(|a, b| a + b)),
Op::Mul => handle_op!("Mul", || self.binary_op(|a, b| a * b)),
Op::Div => handle_op!("Div", || self.checked_op(|a, b| b != 0, |a, b| a / b, "DivByZero")),
Op::Mod => handle_op!("Mod", || self.checked_op(|a, b| b != 0, |a, b| a % b, "ModByZero")),
Op::Dup => handle_op!("Dup", || self.stack.back().map(|v| self.stack.push(v.clone())).ok_or(InterpreterError::StackEmpty.into())),
Op::Print => handle_op!("Print", || self.stack.peek().and_then(|v| v.as_integer().ok()).map(|v| println!("{}", v)).ok_or(InterpreterError::StackEmpty.into())),
Op::Jmp => handle_op!("Jmp", || { next_pc = instr.value as usize; Ok(()) }),
Op::Halt => { println!("> Halt"); return Ok(()); }, // Halt execution
_ => todo!("Unhandled Op::{:?}", instr.op), // Handle unimplemented operations
}
pc = next_pc; // Move to the next instruction
}
Ok(()) // Execution completed successfully
}
// Modifies the top value on the stack using the provided function.
fn modify_top(&mut self, f: impl FnOnce(&mut i64)) -> anyhow::Result<()> {
self.stack
.back_mut()
.and_then(|v| v.as_integer_mut())
.map(f)
.ok_or(InterpreterError::StackEmpty.into())
}
// Performs a binary operation on the top two values of the stack.
fn binary_op(&mut self, f: impl FnOnce(i64, i64) -> i64) -> anyhow::Result<()> {
let (b, a) = (self.pop_val()?, self.pop_val()?); // Pop two values
self.stack.push(Value::Integer(f(a, b))); // Push the result
Ok(())
}
// Performs a checked binary operation, ensuring a condition is met before execution.
fn checked_op(
&mut self,
cond: impl Fn(i64, i64) -> bool,
f: impl Fn(i64, i64) -> i64,
err_msg: &str,
) -> anyhow::Result<()> {
let (b, a) = (self.pop_val()?, self.pop_val()?); // Pop two values
if cond(a, b) {
self.stack.push(Value::Integer(f(a, b))); // Push the result if the condition is met
Ok(())
} else {
Err(InterpreterError::Custom(err_msg.into()).into()) // Return an error otherwise
}
}
// Pops a value from the stack and returns it as an integer.
fn pop_val(&mut self) -> anyhow::Result<i64> {
self.stack
.pop()
.and_then(|v| v.as_integer().ok())
.ok_or(InterpreterError::StackEmpty.into())
}
}
// --- Bytecode Encoding/Decoding ---
const INSTR_SIZE: usize = 8; // Size of each instruction in bytes
// Decodes a vector of bytes into a vector of instructions.
pub fn decode_instructions(bytes: Vec<u8>) -> anyhow::Result<Vec<Instr>> {
(bytes.len() % INSTR_SIZE == 0)
.then(|| {
(0..bytes.len() / INSTR_SIZE)
.map(|i| {
Instr::from_u64(u64::from_be_bytes(
bytes[i * INSTR_SIZE..][..8].try_into().unwrap(),
))
})
.collect()
})
.ok_or(InterpreterError::InvalidNumberOfBytes(bytes.len()).into())
}
// Encodes a vector of instructions into a vector of bytes.
pub fn encode_instructions(bytecode: &[Instr]) -> anyhow::Result<Vec<u8>> {
Ok(bytecode
.iter()
.flat_map(|i| i.to_u64().to_be_bytes())
.collect())
}
Instr file
The Instr struct represents a single instruction in the VM. It combines Operation (Op):(e.g., push a value, add two numbers)
and value (u64)
like a number to push onto the stack.
use crate::op::Op;
#[derive(Debug)]
pub struct Instr {
pub op: Op,
pub value: u64,
}
impl Instr {
/// Converts an `Instr` into a `u64` by placing the opcode in the highest 8 bits and the value in the remaining 56 bits.
pub fn to_u64(&self) -> u64 {
((self.op.to_u64() & 0xFF) << 56) | self.value
}
/// Constructs an `Instr` from a `u64` by extracting the opcode and value.
///
/// # Panics
/// Panics if the opcode is invalid.
pub fn from_u64(value: u64) -> Self {
let op = Op::from_u64((value >> 56) & 0xFF).expect("Invalid Op code");
let val = value & 0x00FF_FFFF_FFFF_FFFF;
Instr { op, value: val }
}
}
By packing op and value into a single 64-bit integer, you:
Reduce memory usage.
Simplify bytecode storage and transmission.
Human-Readable Debugging
To add new operations, simply update the Op enum and the VM automatically supports encoding/decoding them.
Conclusion
In this part of the series, you’ve built the foundation of our interpreter by implementing key components like the Stack, Value, and Virtual Machine. You’ve also explored how instructions are encoded and decoded, ensuring efficient execution and memory usage.
By packing operations and values into a single 64-bit integer, you’ve optimized storage and simplified debugging. This sets the stage for adding more advanced features and operations in future installments.
Top comments (0)