DEV Community

Cover image for Compilation Process - Compilation
rndthts.dev
rndthts.dev

Posted on • Originally published at rndthts.dev

Compilation Process - Compilation

Cover Photo by Bernard Hermant on Unsplash

In the last blog post, we covered the Preprocessing stage. Now, it's time to dive into the Compilation stage.

Compilation

What is Compilation?

Compilation is the process of converting preprocessed source code (in our case, C or C++ code) into assembly code. This phase is handled by the compiler, which performs various checks and optimizations before generating low-level code.

Unlike the preprocessing phase, which merely modifies the code textually, compilation involves deeper analysis and transformation.

Note: To stop the compilation process after the compilation phase, use the -S flag:

gcc/clang -S main.c -o main.s
Enter fullscreen mode Exit fullscreen mode

Let's explore this with an example:

foo.h

int foo(void);
Enter fullscreen mode Exit fullscreen mode

main.c

#include "foo.h"

int square(int i) { return i * i; }

int main() {
  int a = 5;
  int f = foo();
  return f + square(a);
}

Enter fullscreen mode Exit fullscreen mode

Key Stages of Compilation

1. Lexical Analysis (Tokenization)

The preprocessed code is broken down into tokens. For example, the line:

int a = 5;
Enter fullscreen mode Exit fullscreen mode

will be tokenized into: int, a, =, 5 and ;

This stage eliminates whitespace and detects unknown symbols. For instance:

int    a    = 5  ;
Enter fullscreen mode Exit fullscreen mode

will still end up with the same tokens int, a, =, 5 and ;

However, this will cause an error:
int a = 5 @ 3;

error: expected ';' at end of declaration
    6 |   int a = 5 @3;
      |            ^
      |
Enter fullscreen mode Exit fullscreen mode

2. Syntax Analysis (Parsing)

  • Tokens are grouped based on grammar rules to form an Abstract Syntax Tree (AST).

  • The AST represents the hierarchical syntactic structure of the code, abstracting unnecessary details like semicolons or parentheses.

  • This phase ensures that the code structure follows language syntax.
    Example:
    int a = 5 // Missing the ;

output:

error: expected ';' at end of declaration
    6 |   int a = 5 // Missing the ;
      |            ^
      |
Enter fullscreen mode Exit fullscreen mode

3. Semantic Analysis

This phase checks for logical correctness:

  • Ensures variables are declared before use.
int main() {
  // int a = 5;
  int f = foo();
  return f + square(a);
}
Enter fullscreen mode Exit fullscreen mode
 error: use of undeclared identifier 'a'
    8 |   return f + square(a);
      |                     ^
Enter fullscreen mode Exit fullscreen mode
  • Ensures type compatibility (e.g., no assigning a string to an integer).
int a = "hello";
Enter fullscreen mode Exit fullscreen mode

Error:

error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'char[6]' [-Wint-conversion]
    6 |   int a = "hello";
      |       ^   ~~~~~~~
Enter fullscreen mode Exit fullscreen mode

4. IR - Intermediate Representation

  • The compiler converts the AST into an Intermediate Representation (IR) that is platform-independent.

  • IR serves as a bridge between high-level source code and low-level machine code, enabling optimizations.

In future posts, we'll cover this topic in depth, along with various optimizations. For now, let's see how to instruct the compiler to emit this form.

To generate LLVM IR using clang:

clang main.c -S -emit-llvm -o main.ll`
Enter fullscreen mode Exit fullscreen mode

Unoptimized Version:

; ModuleID = 'main.c'
source_filename = "main.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "arm64-apple-macosx14.0.0"

; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define i32 @square(i32 noundef %0) #0 {
  %2 = alloca i32, align 4
  store i32 %0, ptr %2, align 4
  %3 = load i32, ptr %2, align 4
  %4 = load i32, ptr %2, align 4
  %5 = mul nsw i32 %3, %4
  ret i32 %5
}

; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 0, ptr %1, align 4
  store i32 5, ptr %2, align 4
  %4 = call i32 @foo()
  store i32 %4, ptr %3, align 4
  %5 = load i32, ptr %3, align 4
  %6 = load i32, ptr %2, align 4
  %7 = call i32 @square(i32 noundef %6)
  %8 = add nsw i32 %5, %7
  ret i32 %8
}

declare i32 @foo() #1

attributes #0 = {...}
attributes #1 = {...}
...
...
Enter fullscreen mode Exit fullscreen mode

Compiler Optimizations

The compiler optimizes the IR before producing assembly code. Common optimizations include:

  • Function Inlining
  • Dead Code Elimination
  • Loop Unrolling
  • Loop Invariant Code Motion
  • Merge Functions
  • etc ...

We'll explore these in future posts.

Final result, arm64 assembly

linux

    .text
    .file   "main.c"
    .globl  square                          // -- Begin function square
    .p2align    2
    .type   square,@function
square:                                 // @square
    .cfi_startproc
// %bb.0:
    sub sp, sp, #16
    .cfi_def_cfa_offset 16
    str w0, [sp, #12]
    ldr w8, [sp, #12]
    ldr w9, [sp, #12]
    mul w0, w8, w9
    add sp, sp, #16
    .cfi_def_cfa_offset 0
    ret
.Lfunc_end0:
    .size   square, .Lfunc_end0-square
    .cfi_endproc
                                        // -- End function
    .globl  main                            // -- Begin function main
    .p2align    2
    .type   main,@function
main:                                   // @main
    .cfi_startproc
// %bb.0:
    sub sp, sp, #32
    .cfi_def_cfa_offset 32
    stp x29, x30, [sp, #16]             // 16-byte Folded Spill
    add x29, sp, #16
    .cfi_def_cfa w29, 16
    .cfi_offset w30, -8
    .cfi_offset w29, -16
    stur    wzr, [x29, #-4]
    mov w8, #5                          // =0x5
    str w8, [sp, #8]
    bl  foo
    str w0, [sp, #4]
    ldr w8, [sp, #4]
    str w8, [sp]                        // 4-byte Folded Spill
    ldr w0, [sp, #8]
    bl  square
    ldr w8, [sp]                        // 4-byte Folded Reload
    add w0, w8, w0
    .cfi_def_cfa wsp, 32
    ldp x29, x30, [sp, #16]             // 16-byte Folded Reload
    add sp, sp, #32
    .cfi_def_cfa_offset 0
    .cfi_restore w30
    .cfi_restore w29
    ret
.Lfunc_end1:
    .size   main, .Lfunc_end1-main
    .cfi_endproc
                                        // -- End function
    .ident  "Ubuntu clang version 18.1.3 (1ubuntu1)"
    .section    ".note.GNU-stack","",@progbits
    .addrsig
    .addrsig_sym square
    .addrsig_sym foo
Enter fullscreen mode Exit fullscreen mode

macos:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 14, 0 sdk_version 14, 4
    .globl  _square                         ; -- Begin function square
    .p2align    2
_square:                                ; @square
    .cfi_startproc
; %bb.0:
    sub sp, sp, #16
    .cfi_def_cfa_offset 16
    str w0, [sp, #12]
    ldr w8, [sp, #12]
    ldr w9, [sp, #12]
    mul w0, w8, w9
    add sp, sp, #16
    ret
    .cfi_endproc
                                        ; -- End function
    .globl  _main                           ; -- Begin function main
    .p2align    2
_main:                                  ; @main
    .cfi_startproc
; %bb.0:
    sub sp, sp, #32
    stp x29, x30, [sp, #16]             ; 16-byte Folded Spill
    add x29, sp, #16
    .cfi_def_cfa w29, 16
    .cfi_offset w30, -8
    .cfi_offset w29, -16
    stur    wzr, [x29, #-4]
    mov w8, #5                          ; =0x5
    str w8, [sp, #8]
    bl  _foo
    str w0, [sp, #4]
    ldr w8, [sp, #4]
    str w8, [sp]                        ; 4-byte Folded Spill
    ldr w0, [sp, #8]
    bl  _square
    ldr w8, [sp]                        ; 4-byte Folded Reload
    add w0, w8, w0
    ldp x29, x30, [sp, #16]             ; 16-byte Folded Reload
    add sp, sp, #32
    ret
    .cfi_endproc
                                        ; -- End function
.subsections_via_symbols

Enter fullscreen mode Exit fullscreen mode

That's it for this post! In the next one, we'll cover the Assembler phase.

Top comments (0)