Following up on my previous post about Concurrency in Go: many developers hold a common misconception that Go provides true parallelism by default. While Go excels at concurrency with its lightweight goroutines, achieving effective parallel execution requires understanding Go's runtime scheduler and how GOMAXPROCS affects program behavior. This post explores how Go handles concurrency, when it achieves parallelism, and how it compares to other programming languages.
Concurrency vs. Parallelism: Fundamental Distinctions
These terms are often conflated but represent distinct concepts:
- Concurrency: The ability to structure a program to handle multiple tasks, potentially overlapping in time. It's about program design and decomposition of problems into independently executable units.
- Parallelism: The simultaneous execution of multiple computations, typically on separate CPU cores. It's about execution and hardware utilization.
As Rob Pike famously said: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."
Go's goroutines and channels provide elegant concurrency primitives, but actual parallel execution depends on runtime configuration and hardware capabilities.
Go's Concurrency Model: Goroutines and the Runtime Scheduler
Go implements a CSP-based (Communicating Sequential Processes) concurrency model using goroutines: lightweight, user-space threads managed by Go's runtime rather than by the operating system. Unlike OS threads, which typically reserve megabytes of stack space, goroutines start with only about 2KB of stack, allowing programs to spawn millions of them efficiently.
Here's a simple example demonstrating goroutine creation:
```go
package main

import (
	"fmt"
	"time"
)

func sayHello() {
	fmt.Println("Hello from a goroutine!")
}

func main() {
	go sayHello()                      // Launch goroutine
	time.Sleep(100 * time.Millisecond) // Give the goroutine time to execute
	fmt.Println("Main function continues execution")
}
```
While this code runs the sayHello() function concurrently with the main function, it doesn't necessarily execute in parallel. Understanding why requires examining Go's scheduler architecture.
Go's Scheduler: The M:P:G Model
Go's scheduler implements what's known as the M:P:G model:
- G (Goroutines): Application-level tasks
- M (Machine): OS threads that execute code
- P (Processor): Logical processors that manage execution contexts
In this model:
- Each P maintains a local queue of runnable goroutines
- Ms (OS threads) execute goroutines from the P they're assigned to
- When a P's queue is empty, it attempts to steal work from other Ps
This sophisticated work-stealing scheduler efficiently distributes goroutines across available system resources, but the number of Ps is the key limiting factor for parallel execution.
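You can inspect some of these numbers directly through the runtime package. A minimal sketch using only the standard library:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Number of Ps: calling GOMAXPROCS with 0 reads the current
	// value without changing it.
	fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))

	// Number of logical CPUs visible to the process.
	fmt.Println("Logical CPUs:   ", runtime.NumCPU())

	// Goroutines currently alive (Gs); main counts as one.
	fmt.Println("Goroutines (Gs):", runtime.NumGoroutine())
}
```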
Controlling Parallelism with GOMAXPROCS
By default, Go sets GOMAXPROCS equal to the number of available CPU cores (this has been the default since Go 1.5). This value determines the number of Ps (logical processors) in the runtime.
You can explicitly control this setting in your code:
```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func cpuIntensiveTask(id int) {
	fmt.Printf("Task %d starting\n", id)
	// Simulate CPU-intensive work
	for i := 0; i < 1e9; i++ {
	}
	fmt.Printf("Task %d completed\n", id)
}

func main() {
	numCPU := runtime.NumCPU()
	fmt.Printf("System has %d CPU cores\n", numCPU)

	runtime.GOMAXPROCS(numCPU) // Explicitly set to use all cores (already the default)
	fmt.Printf("GOMAXPROCS set to %d\n", runtime.GOMAXPROCS(0))

	start := time.Now()

	// Launch one CPU-intensive goroutine per core
	for i := 0; i < numCPU; i++ {
		go cpuIntensiveTask(i)
	}

	// Wait for goroutines to complete (in production, use sync.WaitGroup)
	time.Sleep(5 * time.Second)
	fmt.Printf("Execution time: %v\n", time.Since(start))
}
```
When GOMAXPROCS > 1 and your system has multiple cores, Go can truly execute goroutines in parallel. However, several factors can still limit actual parallel performance.
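One way to see the difference is to time the same CPU-bound workload with GOMAXPROCS set to 1 and then to all cores. A rough sketch (the workload is an arbitrary stand-in, and timings will vary by machine):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// burn is an arbitrary CPU-bound workload used only for timing.
func burn() {
	sum := 0
	for i := 0; i < 200_000_000; i++ {
		sum += i
	}
	_ = sum
}

// timeRun launches one goroutine per core under the given GOMAXPROCS
// value and returns the wall-clock time for all of them to finish.
func timeRun(procs int) time.Duration {
	runtime.GOMAXPROCS(procs)
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			burn()
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("GOMAXPROCS=1:", timeRun(1))
	fmt.Println("GOMAXPROCS=N:", timeRun(runtime.NumCPU()))
}
```

With one P the goroutines interleave on a single core; with N Ps the runtime can place them on separate cores, so the second run should finish in roughly 1/N of the time for a purely CPU-bound workload like this.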
Go's Parallelism: Capabilities and Limitations
Go can achieve true parallelism, but with important caveats:
Strengths:
- Automatic Scaling: Go automatically distributes work across cores
- Low Overhead: Goroutines and channel communication have minimal overhead
- Work Stealing: Efficient distribution of tasks to prevent cores from idling
Limitations:
- Mostly Cooperative Scheduling: Goroutines traditionally yield control only at specific points (function calls, channel operations, etc.); since Go 1.14 the runtime can also preempt long-running goroutines asynchronously, but tight loops can still add scheduling latency
- GC Pauses: Go's garbage collector runs concurrently with the program, but its brief stop-the-world phases still pause all goroutines
- Scheduler Overhead: The work-stealing algorithm adds some overhead
- Network-Bound Performance: For I/O-heavy workloads, adding cores may not improve throughput
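If you need to hand control back to the scheduler explicitly inside a long-running loop (mostly relevant before Go 1.14, or in very tight loops), runtime.Gosched offers a manual yield point. A minimal sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	go func() {
		for i := 0; ; i++ {
			if i%1_000_000 == 0 {
				// Yield the processor so other goroutines can run.
				// On modern Go this is rarely needed, since the
				// runtime preempts long-running goroutines itself.
				runtime.Gosched()
			}
		}
	}()

	fmt.Println("main still gets scheduled")
}
```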
Benchmarking Parallelism in Go
To evaluate parallelism gains, you can use the testing package with the -cpu flag:
```go
// parallelism_test.go
package main

import (
	"testing"
)

var sink int // package-level sink keeps the compiler from discarding the work

func BenchmarkComputation(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// CPU-intensive computation
		result := 0
		for j := 0; j < 10000000; j++ {
			result += j
		}
		sink = result
	}
}
```
Run with: go test -bench=. -cpu=1,2,4,8
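The testing package also offers b.RunParallel, which splits the b.N iterations across GOMAXPROCS goroutines for you. A minimal sketch that could live in the same parallelism_test.go:

```go
func BenchmarkComputationParallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		// Each goroutine draws iterations from a shared counter
		// until b.N iterations have been consumed in total.
		local := 0
		for pb.Next() {
			for j := 0; j < 10000000; j++ {
				local += j
			}
		}
		_ = local // keep the result live so the loop isn't optimized away
	})
}
```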
Comparative Analysis: Parallelism Across Languages
| Language | Parallelism Model | Strengths | Limitations |
|---|---|---|---|
| Go | M:P:G scheduler with goroutines | Easy concurrency, low overhead, work stealing | Cooperative scheduling, GC pauses |
| Rust | OS threads + async/await | Zero-cost abstractions, fine-grained control | Steeper learning curve, manual synchronization |
| C++ (std::thread) | Direct OS thread mapping | Maximum performance, precise control | High thread creation overhead, manual resource management |
| Java | Thread pools, ForkJoinPool | Rich ecosystem, mature tooling | Higher memory overhead, complex thread management |
| Python | GIL in CPython, multiprocessing | Simple API, good for I/O | GIL prevents true threading parallelism |
| Node.js | Event loop + worker threads | Excellent for I/O, non-blocking | Single-threaded main loop, callback complexity |
Optimizing Go for Parallel Workloads
For CPU-bound tasks requiring maximum parallelism:
- Profile First: Use go tool pprof to identify bottlenecks
- Tune GOMAXPROCS: Sometimes setting it lower than the number of available cores improves performance
- Optimize Work Distribution: Divide work into equally sized chunks
- Minimize Contention: Reduce lock contention and shared memory access
- Consider sync.Pool: Reduce GC pressure for frequently allocated objects (see the sketch after this list)
- Use Performance-Oriented Packages: Consider github.com/valyala/fasthttp over net/http for web servers
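As an illustration of the sync.Pool point above, here's a minimal sketch that reuses byte buffers instead of allocating a fresh one per operation (the buffer usage pattern here is arbitrary):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers, reducing allocation churn
// and therefore GC pressure in hot paths.
var bufPool = sync.Pool{
	New: func() any {
		return new(bytes.Buffer)
	},
}

func format(id int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // reset so the next user gets a clean buffer
		bufPool.Put(buf)
	}()

	fmt.Fprintf(buf, "task-%d", id)
	return buf.String()
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(format(i))
	}
}
```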
Example of balanced work distribution:
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func processRange(start, end int, wg *sync.WaitGroup) {
	defer wg.Done()
	// Process the assigned range
	sum := 0
	for i := start; i < end; i++ {
		sum += i
	}
	fmt.Printf("Range %d-%d sum: %d\n", start, end, sum)
}

func main() {
	const totalWork = 1000000
	numCPU := runtime.NumCPU()
	runtime.GOMAXPROCS(numCPU)

	var wg sync.WaitGroup
	chunkSize := totalWork / numCPU

	for i := 0; i < numCPU; i++ {
		start := i * chunkSize
		end := start + chunkSize
		if i == numCPU-1 {
			end = totalWork // Handle any remainder in the last chunk
		}
		wg.Add(1)
		go processRange(start, end, &wg)
	}

	wg.Wait()
	fmt.Println("All work completed")
}
```
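Static chunking works well when every item costs roughly the same. When per-item cost varies, a channel-fed worker pool balances load dynamically; here's a small sketch of that alternative (the job type and count are placeholders):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// One worker per core; each pulls the next job as soon as it's
	// free, so slow jobs don't hold up a whole pre-assigned chunk.
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				_ = j * j // placeholder for real per-job work
			}
		}()
	}

	for j := 0; j < 1000; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	fmt.Println("All jobs processed")
}
```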
When to Emphasize Parallelism in Go
Go's design philosophy prioritizes simplicity and maintainability over raw CPU performance. Consider these factors when deciding how much to invest in parallel optimizations:
- I/O-Bound vs. CPU-Bound: For I/O-bound applications, Go's concurrency model already provides excellent throughput without explicit parallelism tuning
- Development Time vs. Runtime: Optimize only when performance requirements demand it
- Scalability Requirements: Consider future workload growth patterns
- Resource Constraints: Memory limitations may favor alternative approaches
Go provides sophisticated concurrency primitives that make parallel programming more accessible than in many other languages. While Go can achieve true parallelism, understanding the nuances of its scheduler, the GOMAXPROCS setting, and its inherent limitations helps developers make informed architectural decisions.
For most applications, Go's default configuration provides an excellent balance of throughput and simplicity. When optimization is necessary, profiling and benchmark-driven tuning yields the best results.
Whether you're building high-performance web services, data processing pipelines, or distributed systems, Go's approach to concurrency and parallelism offers a compelling foundation for modern software development.
What has been your experience with parallelism in Go? Share in the comments below!