Wilbur Suero
Concurrency vs. Parallelism in Go: What Every Developer Should Know

#go

Following up on my previous post about Concurrency in Go, I want to address a common misconception: that Go provides true parallelism by default. While Go excels at concurrency with its lightweight goroutines, achieving effective parallel execution requires understanding Go's runtime scheduler and how GOMAXPROCS affects program behavior. This post explores how Go handles concurrency, when it achieves parallelism, and how it compares to other programming languages.

Concurrency vs. Parallelism: Fundamental Distinctions

These terms are often conflated but represent distinct concepts:

  • Concurrency: The ability to structure a program to handle multiple tasks, potentially overlapping in time. It's about program design and decomposition of problems into independently executable units.
  • Parallelism: The simultaneous execution of multiple computations, typically on separate CPU cores. It's about execution and hardware utilization.

As Rob Pike famously said: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."

Go's goroutines and channels provide elegant concurrency primitives, but actual parallel execution depends on runtime configuration and hardware capabilities.

Go's Concurrency Model: Goroutines and the Runtime Scheduler

Go implements a CSP-based (Communicating Sequential Processes) concurrency model using goroutines: lightweight, user-space threads managed by Go's runtime rather than the operating system. Where OS threads typically require megabytes of stack space, a goroutine starts with only about 2KB, allowing programs to spawn millions of them efficiently.

Here's a simple example demonstrating goroutine creation:

package main

import (
    "fmt"
    "time"
)

func sayHello() {
    fmt.Println("Hello from a goroutine!")
}

func main() {
    go sayHello() // Launch goroutine
    time.Sleep(100 * time.Millisecond) // Give goroutine time to execute
    fmt.Println("Main function continues execution")
}

While this code runs the sayHello() function concurrently with the main function, it doesn't necessarily execute in parallel. Understanding why requires examining Go's scheduler architecture.
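Note that the time.Sleep in the example above is only a crude way to keep main alive; there is no guarantee the goroutine has finished. In keeping with Go's CSP roots, a channel gives deterministic synchronization. A minimal variant of the same example (the struct{} signal type is a common idiom, not a requirement):

```go
package main

import "fmt"

func sayHello(done chan<- struct{}) {
	fmt.Println("Hello from a goroutine!")
	done <- struct{}{} // signal completion over a channel
}

func main() {
	done := make(chan struct{})
	go sayHello(done)
	<-done // block until the goroutine finishes; no arbitrary sleep
	fmt.Println("Main function continues execution")
}
```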

Go's Scheduler: The M:P:G Model

Go's scheduler implements what's known as the M:P:G model:

  • G (Goroutines): Application-level tasks
  • M (Machine): OS threads that execute code
  • P (Processor): Logical processors that manage execution contexts

In this model:

  1. Each P maintains a local queue of runnable goroutines
  2. Ms (OS threads) execute goroutines from the P they're assigned to
  3. When a P's queue is empty, it attempts to steal work from other Ps

This sophisticated work-stealing scheduler efficiently distributes goroutines across available system resources, but the number of Ps is the key limiting factor for parallel execution.

Controlling Parallelism with GOMAXPROCS

Since Go 1.5, GOMAXPROCS defaults to the number of available CPU cores. This value determines the number of Ps (logical processors) in the runtime.

You can explicitly control this setting in your code:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func cpuIntensiveTask(id int) {
    fmt.Printf("Task %d starting\n", id)
    // Simulate CPU-intensive work; accumulating into sum
    // keeps the loop from being optimized away
    sum := 0
    for i := 0; i < 1e9; i++ {
        sum += i
    }
    fmt.Printf("Task %d completed (checksum %d)\n", id, sum)
}

func main() {
    numCPU := runtime.NumCPU()
    fmt.Printf("System has %d CPU cores\n", numCPU)

    runtime.GOMAXPROCS(numCPU) // Explicitly set to use all cores
    fmt.Printf("GOMAXPROCS set to %d\n", runtime.GOMAXPROCS(0))

    start := time.Now()

    // Launch CPU-intensive goroutines
    for i := 0; i < numCPU; i++ {
        go cpuIntensiveTask(i)
    }

    // Wait for goroutines to complete (in production, use sync.WaitGroup)
    time.Sleep(5 * time.Second)

    fmt.Printf("Execution time: %v\n", time.Since(start))
}

When GOMAXPROCS > 1 and your system has multiple cores, Go can truly execute goroutines in parallel. However, several factors can still limit actual parallel performance.

Go's Parallelism: Capabilities and Limitations

Go can achieve true parallelism, but with important caveats:

Strengths:

  • Automatic Scaling: Go automatically distributes work across cores
  • Low Overhead: Goroutines and channel communication have minimal overhead
  • Work Stealing: Efficient distribution of tasks to prevent cores from idling

Limitations:

  • Cooperative Scheduling: Goroutines historically yielded control only at specific points (function calls, channel operations, etc.); since Go 1.14 the runtime can also preempt long-running loops asynchronously
  • Garbage Collection: Go's collector runs mostly concurrently, but still imposes brief stop-the-world pauses
  • Scheduler Overhead: The work-stealing algorithm adds some overhead
  • Network-Bound Performance: For I/O-heavy workloads, adding cores may not improve throughput

Benchmarking Parallelism in Go

To evaluate parallelism gains, you can use the testing package with the -cpu flag:

// parallelism_test.go
package main

import "testing"

func BenchmarkComputation(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // CPU-intensive computation
        result := 0
        for j := 0; j < 10000000; j++ {
            result += j
        }
        _ = result // keep the result live so the loop isn't optimized away
    }
}

Run with: go test -bench=. -cpu=1,2,4,8
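The testing package can also spread benchmark iterations across goroutines for you via b.RunParallel, which scales the worker count with the -cpu value (the computation here is illustrative):

```go
// parallelism_parallel_test.go
package main

import "testing"

func BenchmarkComputationParallel(b *testing.B) {
	// RunParallel distributes b.N iterations across GOMAXPROCS goroutines.
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			sum := 0
			for j := 0; j < 10000; j++ {
				sum += j
			}
			_ = sum // prevent the compiler from eliding the loop
		}
	})
}
```

Comparing the per-operation times across -cpu values shows how well the workload scales with added cores.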

Comparative Analysis: Parallelism Across Languages

| Language | Parallelism Model | Strengths | Limitations |
| --- | --- | --- | --- |
| Go | M:P:G scheduler with goroutines | Easy concurrency, low overhead, work stealing | Cooperative scheduling, GC pauses |
| Rust | OS threads + async/await | Zero-cost abstractions, fine-grained control | Steeper learning curve, manual synchronization |
| C++ (std::thread) | Direct OS thread mapping | Maximum performance, precise control | High thread creation overhead, manual resource management |
| Java | Thread pools, ForkJoinPool | Rich ecosystem, mature tooling | Higher memory overhead, complex thread management |
| Python | GIL in CPython, multiprocessing | Simple API, good for I/O | GIL prevents true threading parallelism |
| Node.js | Event loop + worker threads | Excellent for I/O, non-blocking | Single-threaded main loop, callback complexity |

Optimizing Go for Parallel Workloads

For CPU-bound tasks requiring maximum parallelism:

  1. Profile First: Use go tool pprof to identify bottlenecks
  2. Tune GOMAXPROCS: Sometimes setting lower than available cores improves performance
  3. Optimize Work Distribution: Divide work into equally sized chunks
  4. Minimize Contention: Reduce lock contention and shared memory access
  5. Consider sync.Pool: Reduce GC pressure for frequently allocated objects
  6. Use Performance-Oriented Packages: Consider github.com/valyala/fasthttp over net/http for web servers

Example of balanced work distribution:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func processRange(start, end int, wg *sync.WaitGroup) {
    defer wg.Done()
    // Process the assigned range
    sum := 0
    for i := start; i < end; i++ {
        sum += i
    }
    fmt.Printf("Range %d-%d sum: %d\n", start, end, sum)
}

func main() {
    const totalWork = 1000000
    numCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(numCPU)

    var wg sync.WaitGroup
    chunkSize := totalWork / numCPU

    for i := 0; i < numCPU; i++ {
        start := i * chunkSize
        end := start + chunkSize
        if i == numCPU-1 {
            end = totalWork // Handle any remainder in the last chunk
        }

        wg.Add(1)
        go processRange(start, end, &wg)
    }

    wg.Wait()
    fmt.Println("All work completed")
}

When to Emphasize Parallelism in Go

Go's design philosophy prioritizes simplicity and maintainability over raw CPU performance. Consider these factors when deciding how much to invest in parallel optimizations:

  • I/O-Bound vs. CPU-Bound: For I/O-bound applications, Go's concurrency model already provides excellent throughput without explicit parallelism tuning
  • Development Time vs. Runtime: Optimize only when performance requirements demand it
  • Scalability Requirements: Consider future workload growth patterns
  • Resource Constraints: Memory limitations may favor alternative approaches

Go provides sophisticated concurrency primitives that make parallel programming more accessible than many other languages. While Go can achieve true parallelism, understanding the nuances of its scheduler, the GOMAXPROCS setting, and inherent limitations helps developers make informed architectural decisions.

For most applications, Go's default configuration provides an excellent balance of throughput and simplicity. When optimization is necessary, profiling and benchmark-driven tuning yields the best results.

Whether you're building high-performance web services, data processing pipelines, or distributed systems, Go's approach to concurrency and parallelism offers a compelling foundation for modern software development.

What has been your experience with parallelism in Go? Share in the comments below!
