Ahmed A. Elkhalifa

Posted on Mar 16, 2023

x64 Assembly: Multithreading from Scratch Part 2: Threads

#tutorial #multithreading #assembly

In Part 1 of this series we looked at how to use the fork system call to spawn a child process and branch in the child to run a different code path. In this part we will build a more practical way to do multithreading in assembly.

Goal

We want to be able to run any function/subroutine in our assembly code as a thread: we call a function and it runs on a different thread. We also need a way for the parent process to wait for the threads we create.

Implementation

We will create a file lib/thread.asm that will have two functions:

thread_run: It takes a pointer to the code we want to execute (stored in rax) and passes the registers rdi, rsi, rdx and r8 along unchanged to that function, which runs on the child thread it creates. rcx is not in that list because the syscall instruction overwrites rcx and r11. The function calls fork; on the child it jumps to the code pointed to by rax, and on the parent it returns the PID of the child process in rax.
thread_wait: This function takes a single argument in rax, the PID of a child process. It waits for that child process to finish executing, then returns the wait status reported for it in rax (for a normal exit, the exit code is in bits 8-15 of that value).

`thread_run`

# lib/thread.asm

.intel_syntax
.section .text

# thread_run
# A function to start running the given code on a child process
#
# Arguments:
#   rax: Pointer to function(label)
#   rdi, rsi, rdx, r8 : The arguments to the function
#   (rcx and r11 cannot carry arguments: the syscall instruction overwrites them)
#
# Returns:
#   rax: Child process id
#
.global thread_run
thread_run:
    push %rax # Push rax value to the stack

    # Fork this process
    mov %rax, 0x39
    syscall

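    cmp %rax, 0x00 # Compare the returned PID with zero (sets the flags)
    mov %r10, %rax # Store the PID in r10 temporarily (mov leaves the flags intact)
    pop %rax # Get back rax value (Function pointer); pop leaves the flags intact
    je _invoke # Jump if this is the child process (fork returned 0)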

    # If this is the parent process then return 
    mov %rax, %r10 # Restore the PID
    ret

_invoke:
    # Call the function that rax points to
    call %rax

    # Safe exit
    mov %rax, 0x3c
    mov %rdi, 0x00
    syscall

In the above code, the function starts by pushing the function pointer in rax onto the stack, because we need the rax register to make the fork system call. After the syscall we compare fork's return value to zero, store the PID (the current value of rax) in the r10 register temporarily, and pop the function pointer back into rax. Neither mov nor pop changes the flags, so the conditional jump that follows still acts on the earlier comparison: if the value was zero we are in the child and jump to the _invoke subroutine, otherwise we restore the PID into rax and return. The _invoke subroutine calls the function at the address stored in rax and then exits safely with the exit syscall.
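thread_run returns whatever fork left in rax to the parent, and a raw Linux syscall reports failure as a negative value (-errno) in rax. A minimal sketch of a wrapper that checks for this is shown below; the name spawn_checked and the choice to exit with status 1 on failure are assumptions for illustration, not part of the original lib/thread.asm.

```asm
.intel_syntax
.section .text

.extern thread_run

# spawn_checked (hypothetical helper)
# Same arguments as thread_run: rax = function pointer, rdi/rsi/rdx/r8 = its arguments.
# Returns the child PID in rax, or exits with status 1 if fork failed.
.global spawn_checked
spawn_checked:
    call thread_run
    test %rax, %rax   # Set the sign flag from the returned value
    js _spawn_failed  # Negative value: the kernel returned -errno
    ret               # Positive value: the child PID (the child never returns here)

_spawn_failed:
    mov %rax, 0x3c    # exit syscall
    mov %rdi, 0x01    # Non-zero status to signal the failure
    syscall
```

Only the parent reaches the test instruction, because in the child thread_run jumps to _invoke and exits after the function returns.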

`thread_wait`

# lib/thread.asm

...

# thread_wait
# Wait for a process to exit
#
# Arguments: 
#   rax: Process PID
#
# Returns:
#   rax: Wait status written by wait4 (the exit code is in bits 8-15)
#
.global thread_wait
thread_wait:
    # Allocate 4 bytes on the stack for the status that wait4 writes
    sub %rsp, 4

    mov %rdi, %rax # Move the PID to %rdi for wait4 syscall
    mov %rax, 0x3d # wait4 syscall
    mov %rsi, %rsp # Pointer to the 4-byte status slot
    mov %rdx, 0x00 # Options (none)
    mov %r10, 0x00 # NULL pointer (no rusage requested)
    syscall

    # Load the status into eax (writing eax zero-extends into rax)
    mov %eax, [%rsp]
    add %rsp, 4

    ret

In the above code we start by allocating 4 bytes on the stack, then we make the wait4 system call, passing the PID of the child process in rdi and, in rsi, a pointer to the 4 bytes we just allocated. wait4 stores the child's wait status there. After the syscall returns, we load that value from the stack into eax and deallocate the memory. Note that the value is the raw wait status rather than the bare exit code: for a child that exited normally, the exit code sits in bits 8-15, so an exit code of 0 gives a status of 0.
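The value that wait4 writes is a wait status, in which Linux keeps the exit code of a normally exited child in bits 8-15. A minimal sketch of extracting it from the value thread_wait returns could look like this; thread_exit_code is a hypothetical helper name, and the sketch assumes the child exited normally rather than being killed by a signal.

```asm
.intel_syntax
.section .text

# thread_exit_code (hypothetical helper)
#
# Arguments:
#   rax: Wait status, as returned by thread_wait
#
# Returns:
#   rax: Exit code (0-255), meaningful only if the child exited normally
#
.global thread_exit_code
thread_exit_code:
    shr %eax, 8     # Move bits 8-15 (the exit code) down to bits 0-7
    and %eax, 0xff  # Keep only those 8 bits; writing eax also clears the top of rax
    ret
```

Calling thread_exit_code right after thread_wait would leave the child's exit code in rax.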

NOTE: The last step cannot be written as pop DWORD %rax. In 64-bit mode the pop instruction only supports 16-bit and 64-bit operands, so there is no 4-byte pop to encode. That is why gas refuses to assemble it, and other assemblers such as MASM will reject it as well.
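If a pop is preferred for the last step, one option is to reserve a full 8-byte slot instead of 4 bytes, so that a regular 64-bit pop can retrieve it. The sketch below is a hypothetical variant (thread_wait_pop is not in the original code) that zeroes the slot first so the upper 4 bytes that wait4 does not touch stay zero.

```asm
.intel_syntax
.section .text

# thread_wait_pop (hypothetical variant of thread_wait)
#
# Arguments:
#   rax: Process PID
#
# Returns:
#   rax: Wait status written by wait4
#
.global thread_wait_pop
thread_wait_pop:
    push 0            # Reserve a zeroed 8-byte slot on the stack

    mov %rdi, %rax    # PID for the wait4 syscall
    mov %rax, 0x3d    # wait4 syscall
    mov %rsi, %rsp    # Pointer to the slot; wait4 writes the low 4 bytes
    mov %rdx, 0x00    # Options (none)
    mov %r10, 0x00    # NULL pointer (no rusage requested)
    syscall

    pop %rax          # 64-bit pop: status in the low half, zeros in the high half
    ret
```

This also keeps rsp a multiple of 8 inside the function, which the 4-byte version does not.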

Assembly

Now we will assemble this code the same way we did with util.asm, using the following command. The output goes to lib/thread.o, which is the path the link step later in this part expects:

$ as lib/thread.asm -o lib/thread.o

If everything works, we will now have a thread.o file in lib that we will link our code with.

Driver Code

To demonstrate how these two functions work, we will create a new directory hello_threads in the root of our project and put a main.asm file inside of it.

In this file we will write a program that starts a thread to run the print function we wrote in util.asm last time, waits for the thread to finish executing, and then prints from the main (parent) process.
Then the program starts another thread with the print function and continues running the main (parent) process without waiting.

# hello_threads/main.asm

.global _start
.intel_syntax
.section .text

# The external functions we need
.extern thread_run
.extern thread_wait
.extern print

_start:
    # Create a thread
    call _log
    lea %rax, [%rip + print]
    lea %rdi, [%rip + thread_msg]
    mov %rsi, OFFSET thread_msg_len
    call thread_run

    # Wait for the process (PID is in rax) then print
    call thread_wait
    call _pmsg

    # Create a thread
    call _log
    lea %rax, [%rip + print]
    lea %rdi, [%rip + thread2_msg]
    mov %rsi, OFFSET thread2_msg_len
    call thread_run

    # Print without waiting
    call _pmsg

    # Exit
    jmp _exit


_pmsg:
    # Print 'parent_msg'
    lea %rdi, [%rip + parent_msg]
    mov %rsi, OFFSET parent_msg_len
    call print
    ret

_log:
    # Print 'log_msg'
    lea %rdi, [%rip + log_msg]
    mov %rsi, OFFSET log_msg_len
    call print
    ret

_exit:
    # Exit safely with status code 0
    mov %rax, 0x3c
    mov %rdi, 0x00
    syscall

.section .data
    log_msg:
        .ascii "\nCreating a thread..\n" # 21 bytes
        log_msg_len = . - log_msg
    thread_msg:
        .ascii "Hello from thread 1!\n" # 21 bytes
        thread_msg_len = . - thread_msg
    thread2_msg:
        .ascii "Hello from thread 2!\n" # 21 bytes
        thread2_msg_len = . - thread2_msg
    parent_msg:
        .ascii "Hello from parent!!\n" # 20 bytes
        parent_msg_len = . - parent_msg

Now we assemble our code and link the resulting object file with both lib/util.o and lib/thread.o with the following commands:

$ as hello_threads/main.asm -o hello_threads/hello_threads.o
$ ld hello_threads/hello_threads.o lib/util.o lib/thread.o -o hello_threads/hello_threads.elf -nostdlib

If everything goes well, we should now have a hello_threads.elf file in the hello_threads directory. If we execute it, we should see output similar to this:

Creating a thread..
Hello from thread 1!
Hello from parent!!

Creating a thread..
Hello from parent!!
Hello from thread 2!

Notice how the first thread prints before the main (parent) process because we used the thread_wait function that we defined earlier. The second thread's case is different: the parent usually prints first, since it does not wait for the thread to finish executing, though the exact order depends on the scheduler.
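Because the parent does not wait for the second thread, it may also exit before that thread has printed anything. A middle ground is to let the parent print first and then wait before exiting. The sketch below is a hypothetical stand-alone driver along those lines; it assumes print takes the string pointer in rdi and the length in rsi, as the calls in main.asm suggest, and it saves the PID on the stack because it is not known which registers print preserves.

```asm
.global _start
.intel_syntax
.section .text

.extern thread_run
.extern thread_wait
.extern print

_start:
    # Start the thread
    lea %rax, [%rip + print]
    lea %rdi, [%rip + child_msg]
    mov %rsi, OFFSET child_msg_len
    call thread_run
    push %rax            # Save the child PID across the call to print

    # Print from the parent without waiting first
    lea %rdi, [%rip + parent_msg]
    mov %rsi, OFFSET parent_msg_len
    call print

    # Now wait for the child before exiting
    pop %rax             # Child PID
    call thread_wait

    # Exit with status code 0
    mov %rax, 0x3c
    mov %rdi, 0x00
    syscall

.section .data
    child_msg:
        .ascii "Hello from thread 2!\n"
        child_msg_len = . - child_msg
    parent_msg:
        .ascii "Hello from parent!!\n"
        parent_msg_len = . - parent_msg
```

The two messages can still appear in either order, but the parent no longer exits while the child is running.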

Now that we have a functional way to spawn and wait for threads, we need a way for threads to communicate and share data with each other. In the next part we will look into how we can allocate and share memory between threads.

The code is available on the GitHub repository if you want to check it out.

Top comments (1)

Rain • Feb 14 '24

Isn't this more multiprocessing than it is multithreading?
