Original source: https://medium.com/@huidou/lets-understand-chrome-v8-chapter-6-bytecode-generation-cd189b671a9
Welcome to other chapters of Let’s Understand Chrome V8
Bytecode is an architecture-independent abstract machine code that V8 generates from the AST produced by the parser. In this article, we start debugging from the AST, explain bytecode generation, and analyze the core code and important data structures, as shown in Figure 1.
1. Introduction
V8 has hundreds of bytecodes, ranging from simple operations like Add and Sub to complex operations like LdaNamedProperty. Each bytecode can use registers and the accumulator as operands. The accumulator is a register like any other; the difference is that it is read and written implicitly.
For example, Add r1 adds the value of register r1 to the accumulator. The accumulator is not named in the instruction because it is the default operand.
Bytecodes are defined in v8/src/interpreter/bytecodes.h. Here are some examples:
#define BYTECODE_LIST_WITH_UNIQUE_HANDLERS(V) \
/* Extended width operands */ \
V(Wide, ImplicitRegisterUse::kNone) \
V(ExtraWide, ImplicitRegisterUse::kNone) \
\
/* Debug Breakpoints - one for each possible size of unscaled bytecodes */ \
/* and one for each operand widening prefix bytecode */ \
V(DebugBreakWide, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreakExtraWide, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreak0, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreak1, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg) \
V(DebugBreak2, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg, OperandType::kReg) \
V(DebugBreak3, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg, OperandType::kReg, OperandType::kReg) \
V(DebugBreak4, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg, OperandType::kReg, OperandType::kReg, \
OperandType::kReg) \
V(DebugBreak5, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kRuntimeId, OperandType::kReg, OperandType::kReg) \
V(DebugBreak6, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kRuntimeId, OperandType::kReg, OperandType::kReg, \
OperandType::kReg) \
\
/* Side-effect-free bytecodes -- carefully ordered for efficient checks */ \
/* - [Loading the accumulator] */ \
V(Ldar, ImplicitRegisterUse::kWriteAccumulator, OperandType::kReg) \
V(LdaZero, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaSmi, ImplicitRegisterUse::kWriteAccumulator, OperandType::kImm) \
V(LdaUndefined, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaNull, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaTheHole, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaTrue, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaFalse, ImplicitRegisterUse::kWriteAccumulator) \
V(LdaConstant, ImplicitRegisterUse::kWriteAccumulator, OperandType::kIdx) \
V(LdaContextSlot, ImplicitRegisterUse::kWriteAccumulator, OperandType::kReg, \
OperandType::kIdx, OperandType::kUImm) \
V(LdaImmutableContextSlot, ImplicitRegisterUse::kWriteAccumulator, \
OperandType::kReg, OperandType::kIdx, OperandType::kUImm) \
V(LdaCurrentContextSlot, ImplicitRegisterUse::kWriteAccumulator, \
OperandType::kIdx) \
V(LdaImmutableCurrentContextSlot, ImplicitRegisterUse::kWriteAccumulator, \
OperandType::kIdx) \
/* - [Register Loads ] */ \
V(Star, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(Mov, ImplicitRegisterUse::kNone, OperandType::kReg, OperandType::kRegOut) \
V(PushContext, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(PopContext, ImplicitRegisterUse::kNone, OperandType::kReg) \
/* - [Test Operations ] */ \
V(TestReferenceEqual, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg) \
V(TestUndetectable, ImplicitRegisterUse::kReadWriteAccumulator) \
V(TestNull, ImplicitRegisterUse::kReadWriteAccumulator) \
V(TestUndefined, ImplicitRegisterUse::kReadWriteAccumulator) \
V(TestTypeOf, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kFlag8) \
//.........omit.....
The above code is the macro definition of the bytecodes. Take V(Ldar, ImplicitRegisterUse::kWriteAccumulator, OperandType::kReg) as an example: Ldar loads the value of a register into the accumulator. ImplicitRegisterUse::kWriteAccumulator says that the accumulator is written implicitly, so it is the destination, and OperandType::kReg says that the single explicit operand is a register, which is the source. See the examples below:
- LdaSmi [1]: [1] is a small integer (Smi) that is loaded into the accumulator, as shown in Figure 2.
- Star r1: r1 is a general register into which the value of the accumulator is written, as shown in Figure 3.
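To make the implicit accumulator concrete, here is a minimal sketch of a toy accumulator machine. It is not V8's interpreter: the ToyMachine type, the Instr encoding, and the four-register file are assumptions made for illustration, and only LdaSmi, Star, and Add are modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-instruction subset. The names mirror V8's bytecodes,
// but the encoding and semantics are simplified for illustration.
enum class Op { kLdaSmi, kStar, kAdd };

struct Instr {
  Op op;
  int operand;  // immediate for kLdaSmi, register index for kStar and kAdd
};

struct ToyMachine {
  int accumulator = 0;         // implicit operand, never named in an Instr
  std::array<int, 4> regs{};   // r0..r3, all zero initially

  // Checks a register index and converts it for array access.
  static std::size_t Index(int reg) {
    assert(reg >= 0 && reg < 4);
    return static_cast<std::size_t>(reg);
  }

  void Run(const std::vector<Instr>& code) {
    for (const Instr& instr : code) {
      switch (instr.op) {
        case Op::kLdaSmi:  // accumulator <- immediate
          accumulator = instr.operand;
          break;
        case Op::kStar:    // register <- accumulator
          regs[Index(instr.operand)] = accumulator;
          break;
        case Op::kAdd:     // accumulator <- register + accumulator
          accumulator += regs[Index(instr.operand)];
          break;
      }
    }
  }
};

// Runs: LdaSmi [1]; Star r1; LdaSmi [2]; Add r1
inline int RunExample() {
  ToyMachine machine;
  machine.Run({{Op::kLdaSmi, 1}, {Op::kStar, 1}, {Op::kLdaSmi, 2}, {Op::kAdd, 1}});
  return machine.accumulator;
}
```

RunExample() leaves 3 in the accumulator, and none of the four instructions names the accumulator explicitly.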
For other bytecode instructions, please refer to the instruction definition file of V8.
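The V(...) list shown earlier is an instance of the X-macro technique: the list is written once and expanded several times with different definitions of V. The sketch below illustrates the idea with a shortened list. TOY_BYTECODE_LIST, AccUse, and the helper functions are hypothetical names, and operand types are left out, so treat it as an illustration of the pattern rather than a copy of bytecodes.h.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for ImplicitRegisterUse.
enum class AccUse { kNone, kRead, kWrite, kReadWrite };

// Shortened list in the style of BYTECODE_LIST (operand types omitted).
#define TOY_BYTECODE_LIST(V)      \
  V(Ldar, AccUse::kWrite)         \
  V(LdaSmi, AccUse::kWrite)       \
  V(Star, AccUse::kRead)          \
  V(Mov, AccUse::kNone)           \
  V(TestNull, AccUse::kReadWrite)

// Expansion 1: one enumerator per list entry.
enum class Bytecode {
#define DECLARE_ENUM(Name, Use) k##Name,
  TOY_BYTECODE_LIST(DECLARE_ENUM)
#undef DECLARE_ENUM
};

// Expansion 2: the printable name of each bytecode.
inline const char* ToString(Bytecode bytecode) {
  switch (bytecode) {
#define NAME_CASE(Name, Use) \
  case Bytecode::k##Name:    \
    return #Name;
    TOY_BYTECODE_LIST(NAME_CASE)
#undef NAME_CASE
  }
  assert(false && "unknown bytecode");
  return "Unknown";
}

// Expansion 3: how each bytecode implicitly uses the accumulator.
inline AccUse AccumulatorUse(Bytecode bytecode) {
  switch (bytecode) {
#define USE_CASE(Name, Use) \
  case Bytecode::k##Name:   \
    return Use;
    TOY_BYTECODE_LIST(USE_CASE)
#undef USE_CASE
  }
  assert(false && "unknown bytecode");
  return AccUse::kNone;
}

inline bool WritesAccumulator(Bytecode bytecode) {
  const AccUse use = AccumulatorUse(bytecode);
  return use == AccUse::kWrite || use == AccUse::kReadWrite;
}
```

Adding a bytecode then means adding one line to the list; the enum, the name table, and the accumulator query all pick it up automatically.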
To improve performance, V8 marks bytecode that executes frequently as hot code and uses TurboFan to compile it into native machine code, as shown in Figure 4.
TurboFan compilation (from bytecode to native machine code) requires more time and resources, so it is only worthwhile for hot code. Bytecode generation for the interpreter (from JS to bytecode) requires very little time and resources and is suitable for the general case.
Typically, code is promoted to hot code once it has been executed many times.
But why would hot code be downgraded to bytecode again? There are many reasons. A common one is debugging: you press F12 to open DevTools and debug the JS.
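The promote and downgrade cycle can be pictured with a small state machine. This is only a sketch of the idea: the call counter, the kHotThreshold value, and all names below are made up, and V8's real tiering decisions are based on more than a plain invocation count.

```cpp
#include <cassert>

// Hypothetical tiers; V8 has more execution tiers than these two.
enum class Tier { kInterpreted, kOptimized };

struct ToyFunction {
  Tier tier = Tier::kInterpreted;
  int invocation_count = 0;
};

// Made-up threshold, chosen only to keep the example short.
constexpr int kHotThreshold = 3;

// Called on every invocation: counts the call and promotes the function
// to the optimized tier once it has become "hot".
inline void OnInvoke(ToyFunction& function) {
  ++function.invocation_count;
  if (function.tier == Tier::kInterpreted &&
      function.invocation_count >= kHotThreshold) {
    function.tier = Tier::kOptimized;
  }
}

// Called when optimized code can no longer be used (for example, when a
// debugger is attached): execution falls back to the bytecode.
inline void Deoptimize(ToyFunction& function) {
  assert(function.tier == Tier::kOptimized);  // only optimized code downgrades
  function.tier = Tier::kInterpreted;
  function.invocation_count = 0;
}
```

With this threshold, a function stays interpreted for its first two calls, is promoted on the third, and returns to the interpreter when Deoptimize is called.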
2. Bytecode generation
Before diving into bytecode, we need to know the AST, because bytecode generation is the process of walking the AST. In V8, the AST walk works like a finite state automaton that, together with some predefined macro templates, generates the bytecode. Figure 5 shows the data structure of the AST.
All nodes of an AST inherit from the parent class AstNode, which has many member methods. Among them, node_type() is undoubtedly the most important: when translating an AstNode into bytecode, V8 uses the node type to cast the parent class AstNode to a specific subclass, such as an Expression or a Statement, and then reads the corresponding data and generates bytecode. The following code is the visitor that runs after an AstNode has been cast to Assignment.
void BytecodeGenerator::VisitAssignment(Assignment* expr) {
  AssignmentLhsData lhs_data = PrepareAssignmentLhs(expr->target());
  VisitForAccumulatorValue(expr->value());
  builder()->SetExpressionPosition(expr);
  BuildAssignment(lhs_data, expr->op(), expr->lookup_hoisting_mode());
}
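In the above code, expr->target() and expr->value() are themselves expressions, so visiting them may recurse into further Visit calls, because an expression can contain multiple subexpressions. expr->op() is the assignment operator that BuildAssignment uses to decide what to emit.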
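The recursion can be sketched with a miniature code generator. Everything here is hypothetical: the Expr type, the ToyGenerator class, and the textual output format are invented for illustration, variables are assumed to live in fixed registers, and temporaries are never released. Only the shape (visit the value, leave the result in the accumulator, then store it to the target) follows VisitAssignment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;

// Hypothetical miniature AST node; V8's Expression classes are far richer.
struct Expr {
  enum Kind { kSmi, kVar, kAdd, kAssign };
  Kind kind = kSmi;
  int value = 0;  // kSmi: the literal; kVar: register index of the variable
  ExprPtr lhs;    // kAdd: left operand; kAssign: target (must be kVar)
  ExprPtr rhs;    // kAdd: right operand; kAssign: assigned value
};

inline ExprPtr MakeExpr(Expr::Kind kind, int value, ExprPtr lhs, ExprPtr rhs) {
  auto expr = std::make_unique<Expr>();
  expr->kind = kind;
  expr->value = value;
  expr->lhs = std::move(lhs);
  expr->rhs = std::move(rhs);
  return expr;
}
inline ExprPtr Smi(int v) { return MakeExpr(Expr::kSmi, v, nullptr, nullptr); }
inline ExprPtr Var(int reg) { return MakeExpr(Expr::kVar, reg, nullptr, nullptr); }
inline ExprPtr Add(ExprPtr l, ExprPtr r) {
  return MakeExpr(Expr::kAdd, 0, std::move(l), std::move(r));
}
inline ExprPtr Assign(int target_reg, ExprPtr v) {
  return MakeExpr(Expr::kAssign, 0, Var(target_reg), std::move(v));
}

class ToyGenerator {
 public:
  // first_temp is the first register index that is free for temporaries.
  explicit ToyGenerator(int first_temp) : next_temp_(first_temp) {}

  // Emits code that leaves the value of `expr` in the accumulator.
  void VisitForAccumulatorValue(const Expr& expr) {
    switch (expr.kind) {
      case Expr::kSmi:
        Emit("LdaSmi [" + std::to_string(expr.value) + "]");
        break;
      case Expr::kVar:
        Emit("Ldar r" + std::to_string(expr.value));
        break;
      case Expr::kAdd: {
        // Left operand goes to a temporary, right operand stays in the
        // accumulator, then Add combines the two.
        VisitForAccumulatorValue(*expr.lhs);
        const std::string temp = "r" + std::to_string(next_temp_++);
        Emit("Star " + temp);
        VisitForAccumulatorValue(*expr.rhs);
        Emit("Add " + temp);
        break;
      }
      case Expr::kAssign:
        assert(expr.lhs && expr.lhs->kind == Expr::kVar);
        VisitForAccumulatorValue(*expr.rhs);  // may recurse arbitrarily deep
        Emit("Star r" + std::to_string(expr.lhs->value));
        break;
    }
  }

  const std::vector<std::string>& code() const { return code_; }

 private:
  void Emit(std::string bytecode) { code_.push_back(std::move(bytecode)); }

  std::vector<std::string> code_;
  int next_temp_;
};
```

For x = y + 1 with x in r0, y in r1, and temporaries starting at r2, Assign(0, Add(Var(1), Smi(1))) produces Ldar r1, Star r2, LdaSmi [1], Add r2, Star r0.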
void BytecodeGenerator::GenerateBytecodeBody() {
  // Build the arguments object if it is used.
  VisitArgumentsObject(closure_scope()->arguments());
  // Build rest arguments array if it is used.
  Variable* rest_parameter = closure_scope()->rest_parameter();
  VisitRestArgumentsArray(rest_parameter);
  // Build assignment to the function name or {.this_function}
  // variables if used.
  VisitThisFunctionVariable(closure_scope()->function_var());
  VisitThisFunctionVariable(closure_scope()->this_function_var());
  // Build assignment to {new.target} variable if it is used.
  VisitNewTargetVariable(closure_scope()->new_target_var());
  // Create a generator object if necessary and initialize the
  // {.generator_object} variable.
  FunctionLiteral* literal = info()->literal();
  if (IsResumableFunction(literal->kind())) {
    BuildGeneratorObjectVariableInitialization();
  }
  // Emit tracing call if requested to do so.
  if (FLAG_trace) builder()->CallRuntime(Runtime::kTraceEnter);
  // Emit type profile call.
  if (info()->flags().collect_type_profile()) {
    feedback_spec()->AddTypeProfileSlot();
    int num_parameters = closure_scope()->num_parameters();
    for (int i = 0; i < num_parameters; i++) {
      Register parameter(builder()->Parameter(i));
      builder()->LoadAccumulatorWithRegister(parameter).CollectTypeProfile(
          closure_scope()->parameter(i)->initializer_position());
    }
  }
  // Increment the function-scope block coverage counter.
  BuildIncrementBlockCoverageCounterIfEnabled(literal, SourceRangeKind::kBody);
  // Visit declarations within the function scope.
  if (closure_scope()->is_script_scope()) {
    VisitGlobalDeclarations(closure_scope()->declarations());
  } else if (closure_scope()->is_module_scope()) {
    VisitModuleDeclarations(closure_scope()->declarations());
  } else {
    VisitDeclarations(closure_scope()->declarations());
  }
  // Emit initializing assignments for module namespace imports (if any).
  VisitModuleNamespaceImports();
  // The derived constructor case is handled in VisitCallSuper.
  if (IsBaseConstructor(function_kind())) {
    if (literal->class_scope_has_private_brand()) {
      BuildPrivateBrandInitialization(builder()->Receiver());
    }
    if (literal->requires_instance_members_initializer()) {
      BuildInstanceMemberInitialization(Register::function_closure(),
                                        builder()->Receiver());
    }
  }
  // Visit statements in the function body.
  VisitStatements(literal->body());
  // Emit an implicit return instruction in case control flow can fall off the
  // end of the function without an explicit return being present on all paths.
  if (!builder()->RemainderOfBlockIsDead()) {
    builder()->LoadUndefined();
    BuildReturn(literal->return_position());
  }
}
The above code, GenerateBytecodeBody, is the entry point for generating the bytecode of a function. After the setup steps, it calls VisitStatements(literal->body()), which generates the bytecode for the statements in the function body.
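The last step of GenerateBytecodeBody, the implicit return, can be isolated in a small sketch. ToyBuilder and ToyStatement are hypothetical stand-ins rather than V8 classes; the only behavior modeled is that a return makes the rest of the block dead, and that a body which can fall off its end gets LdaUndefined followed by Return appended.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the bytecode builder: it records emitted
// bytecodes and tracks whether the current position is still reachable.
class ToyBuilder {
 public:
  void Emit(const std::string& bytecode) {
    if (dead_) return;  // in this sketch, unreachable code is simply dropped
    code_.push_back(bytecode);
  }
  void Return() {
    Emit("Return");
    dead_ = true;  // nothing after an unconditional return is reachable
  }
  bool RemainderOfBlockIsDead() const { return dead_; }
  const std::vector<std::string>& code() const { return code_; }

 private:
  std::vector<std::string> code_;
  bool dead_ = false;
};

// Toy statement: one bytecode, optionally followed by an explicit return.
struct ToyStatement {
  bool is_return;
  std::string bytecode;
};

inline std::vector<std::string> GenerateBody(
    const std::vector<ToyStatement>& body) {
  ToyBuilder builder;
  for (const ToyStatement& statement : body) {
    builder.Emit(statement.bytecode);
    if (statement.is_return) builder.Return();
    if (builder.RemainderOfBlockIsDead()) break;  // skip unreachable statements
  }
  // Control flow can fall off the end: return undefined implicitly.
  if (!builder.RemainderOfBlockIsDead()) {
    builder.Emit("LdaUndefined");
    builder.Return();
  }
  return builder.code();
}
```

A body without a return ends in LdaUndefined, Return; a body whose first statement returns produces no code for the statements after it and no implicit return.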
Before generating bytecode for a node, V8 needs to find out which subclass the node is. AstNode::node_type() returns that type, and the macros below use it to dispatch to the matching Visit method.
#define DECLARATION_NODE_LIST(V) \
V(VariableDeclaration) \
V(FunctionDeclaration)
#define ITERATION_NODE_LIST(V) \
V(DoWhileStatement) \
V(WhileStatement) \
V(ForStatement) \
V(ForInStatement) \
V(ForOfStatement)
#define BREAKABLE_NODE_LIST(V) \
V(Block) \
V(SwitchStatement)
#define STATEMENT_NODE_LIST(V) \
ITERATION_NODE_LIST(V) \
BREAKABLE_NODE_LIST(V) \
V(ExpressionStatement) \
V(EmptyStatement) \
V(SloppyBlockFunctionStatement) \
V(IfStatement) \
V(ContinueStatement) \
V(BreakStatement) \
V(ReturnStatement) \
V(WithStatement) \
V(TryCatchStatement) \
V(TryFinallyStatement) \
V(DebuggerStatement) \
V(InitializeClassMembersStatement) \
V(InitializeClassStaticElementsStatement)
#define LITERAL_NODE_LIST(V) \
V(RegExpLiteral) \
V(ObjectLiteral) \
V(ArrayLiteral)
//=========separation===============================
#define GENERATE_VISIT_CASE(NodeType) \
  case AstNode::k##NodeType: \
    return this->impl()->Visit##NodeType(static_cast<NodeType*>(node));
#define GENERATE_FAILURE_CASE(NodeType) \
  case AstNode::k##NodeType: \
    UNREACHABLE();
//=========separation===============================
#define GENERATE_AST_VISITOR_SWITCH() \
  switch (node->node_type()) { \
    AST_NODE_LIST(GENERATE_VISIT_CASE) \
    FAILURE_NODE_LIST(GENERATE_FAILURE_CASE) \
  }
#define DEFINE_AST_VISITOR_SUBCLASS_MEMBERS() \
 public: \
  void VisitNoStackOverflowCheck(AstNode* node) { \
    GENERATE_AST_VISITOR_SWITCH() \
  } \
  \
  void Visit(AstNode* node) { \
    if (CheckStackOverflow()) return; \
    VisitNoStackOverflowCheck(node); \
  } \
  //.........omit.....
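The AST visitor machinery consists of the above three parts of code: the node type lists, the macros that generate one case per node type, and the Visit method that switches on node_type(). The first part corresponds to Figure 5.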
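How the three parts fit together can be shown with a reduced, self-contained version. This is a sketch under assumptions: TOY_NODE_LIST, ToyNode, ToyVisitor, and ToyPrinter are invented names, there are only two node types, and the stack overflow check is left out. The switch is generated from the node list, and each case casts the base pointer to the concrete type before calling the subclass's Visit method through impl().

```cpp
#include <cassert>
#include <string>

// Part 1: the node list (hypothetical, in the style of AST_NODE_LIST).
#define TOY_NODE_LIST(V) \
  V(Literal)             \
  V(ReturnStatement)

// Base class: stores the concrete type as an enum generated from the list.
struct ToyNode {
  enum NodeType {
#define DECLARE_TYPE(Name) k##Name,
    TOY_NODE_LIST(DECLARE_TYPE)
#undef DECLARE_TYPE
  };
  explicit ToyNode(NodeType type) : type_(type) {}
  NodeType node_type() const { return type_; }

 private:
  NodeType type_;
};

struct Literal : ToyNode {
  Literal() : ToyNode(kLiteral) {}
};
struct ReturnStatement : ToyNode {
  ReturnStatement() : ToyNode(kReturnStatement) {}
};

// Parts 2 and 3: one generated case per node type inside a switch on
// node_type(). Subclass supplies the Visit<Name> methods.
template <class Subclass>
class ToyVisitor {
 public:
  void Visit(ToyNode* node) {
    assert(node != nullptr);
    switch (node->node_type()) {
#define VISIT_CASE(Name) \
  case ToyNode::k##Name: \
    return impl()->Visit##Name(static_cast<Name*>(node));
      TOY_NODE_LIST(VISIT_CASE)
#undef VISIT_CASE
    }
  }

 private:
  Subclass* impl() { return static_cast<Subclass*>(this); }
};

// A concrete visitor that records which Visit method was called.
class ToyPrinter : public ToyVisitor<ToyPrinter> {
 public:
  void VisitLiteral(Literal*) { out += "Literal;"; }
  void VisitReturnStatement(ReturnStatement*) { out += "ReturnStatement;"; }
  std::string out;
};
```

Visiting a ReturnStatement and then a Literal through ToyPrinter::Visit appends "ReturnStatement;Literal;" to out, without any virtual function call.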
void BytecodeGenerator::VisitStatements(
    const ZonePtrList<Statement>* statements) {
  for (int i = 0; i < statements->length(); i++) {
    // Allocate an outer register allocations scope for the statement.
    RegisterAllocationScope allocation_scope(this);
    Statement* stmt = statements->at(i);
    Visit(stmt);
    if (builder()->RemainderOfBlockIsDead()) break;
  }
}
The above code, VisitStatements, is where bytecode generation for the statements of the function body begins. Figure 6 shows the call stack of VisitStatements.
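The RegisterAllocationScope created for each statement in VisitStatements follows the RAII pattern: temporaries allocated while visiting one statement are released when the scope object is destroyed, so the next statement can reuse them. The sketch below shows the pattern only; ToyRegisterAllocator and ToyRegisterScope are hypothetical classes, not V8's implementation.

```cpp
#include <cassert>

// Hypothetical allocator: registers are handed out as increasing indices.
class ToyRegisterAllocator {
 public:
  int NewRegister() {
    const int reg = next_++;
    if (next_ > max_) max_ = next_;  // remember the high-water mark
    return reg;
  }
  void ReleaseRegisters(int index) {
    assert(index >= 0 && index <= next_);
    next_ = index;
  }
  int next_register_index() const { return next_; }
  int maximum_register_count() const { return max_; }

 private:
  int next_ = 0;
  int max_ = 0;
};

// Remembers the allocator's position on entry and restores it on exit.
class ToyRegisterScope {
 public:
  explicit ToyRegisterScope(ToyRegisterAllocator* allocator)
      : allocator_(allocator),
        outer_next_(allocator->next_register_index()) {}
  ~ToyRegisterScope() { allocator_->ReleaseRegisters(outer_next_); }

  ToyRegisterScope(const ToyRegisterScope&) = delete;
  ToyRegisterScope& operator=(const ToyRegisterScope&) = delete;

 private:
  ToyRegisterAllocator* allocator_;
  int outer_next_;
};

// Simulates visiting `statement_count` statements that each need
// `temps_per_statement` temporaries, and returns the registers required.
inline int FrameSizeFor(int statement_count, int temps_per_statement) {
  ToyRegisterAllocator allocator;
  for (int i = 0; i < statement_count; i++) {
    ToyRegisterScope scope(&allocator);  // one scope per statement
    for (int t = 0; t < temps_per_statement; t++) allocator.NewRegister();
  }  // each scope's destructor has released that statement's temporaries
  assert(allocator.next_register_index() == 0);
  return allocator.maximum_register_count();
}
```

Because every statement releases its temporaries, FrameSizeFor(10, 3) needs 3 registers rather than 30.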
Okay, that wraps it up for this share. I’ll see you guys next time, take care!
My blog is cncyclops.com. Please reach out to me if you have any issues.
WeChat: qq9123013 Email: v8blink@outlook.com