Following up on my previous post about Concurrency in Go: many developers hold a common misconception that Go provides true parallelism by default. While Go excels at concurrency with its lightweight goroutines, achieving effective parallel execution requires understanding Go's runtime scheduler and how GOMAXPROCS affects program behavior. This post explores how Go handles concurrency, when it achieves parallelism, and how it compares to other programming languages.
Concurrency vs. Parallelism: Fundamental Distinctions
These terms are often conflated but represent distinct concepts:
- Concurrency: The ability to structure a program to handle multiple tasks, potentially overlapping in time. It's about program design and decomposition of problems into independently executable units.
- Parallelism: The simultaneous execution of multiple computations, typically on separate CPU cores. It's about execution and hardware utilization.
As Rob Pike famously said: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."
Go's goroutines and channels provide elegant concurrency primitives, but actual parallel execution depends on runtime configuration and hardware capabilities.
Go's Concurrency Model: Goroutines and the Runtime Scheduler
Go implements a CSP-based (Communicating Sequential Processes) concurrency model using goroutines: lightweight, user-space threads managed by Go's runtime rather than by the operating system. Unlike OS threads, which typically reserve megabytes of stack space, goroutines start with only about 2KB of stack, allowing programs to spawn millions of them efficiently.
Here's a simple example demonstrating goroutine creation:
```go
package main

import (
	"fmt"
	"time"
)

func sayHello() {
	fmt.Println("Hello from a goroutine!")
}

func main() {
	go sayHello()                      // Launch goroutine
	time.Sleep(100 * time.Millisecond) // Give the goroutine time to execute
	fmt.Println("Main function continues execution")
}
```
While this code runs the sayHello() function concurrently with the main function, it doesn't necessarily execute in parallel. Understanding why requires examining Go's scheduler architecture.
Go's Scheduler: The M:P:G Model
Go's scheduler implements what's known as the M:P:G model:
- G (Goroutines): Application-level tasks
- M (Machine): OS threads that execute code
- P (Processor): Logical processors that manage execution contexts
In this model:
- Each P maintains a local queue of runnable goroutines
- Ms (OS threads) execute goroutines from the P they're assigned to
- When a P's queue is empty, it attempts to steal work from other Ps
This sophisticated work-stealing scheduler efficiently distributes goroutines across available system resources, but the number of Ps is the key limiting factor for parallel execution.
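You can inspect some of these numbers directly through the runtime package. A minimal sketch using only the standard library:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Number of Ps: calling GOMAXPROCS with 0 reads the current
	// value without changing it.
	fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))

	// Number of logical CPUs visible to the process.
	fmt.Println("Logical CPUs:   ", runtime.NumCPU())

	// Goroutines currently alive (Gs); main counts as one.
	fmt.Println("Goroutines (Gs):", runtime.NumGoroutine())
}
```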
Controlling Parallelism with GOMAXPROCS
By default, Go sets GOMAXPROCS equal to the number of available CPU cores (this has been the default since Go 1.5). This value determines the number of Ps (logical processors) in the runtime.
You can explicitly control this setting in your code:
```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func cpuIntensiveTask(id int) {
	fmt.Printf("Task %d starting\n", id)
	// Simulate CPU-intensive work
	for i := 0; i < 1e9; i++ {
	}
	fmt.Printf("Task %d completed\n", id)
}

func main() {
	numCPU := runtime.NumCPU()
	fmt.Printf("System has %d CPU cores\n", numCPU)

	runtime.GOMAXPROCS(numCPU) // Explicitly set to use all cores (already the default)
	fmt.Printf("GOMAXPROCS set to %d\n", runtime.GOMAXPROCS(0))

	start := time.Now()

	// Launch one CPU-intensive goroutine per core
	for i := 0; i < numCPU; i++ {
		go cpuIntensiveTask(i)
	}

	// Wait for goroutines to complete (in production, use sync.WaitGroup)
	time.Sleep(5 * time.Second)
	fmt.Printf("Execution time: %v\n", time.Since(start))
}
```
When GOMAXPROCS > 1 and your system has multiple cores, Go can truly execute goroutines in parallel. However, several factors can still limit actual parallel performance.
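One way to see the difference is to time the same CPU-bound workload with GOMAXPROCS set to 1 and then to all cores. A rough sketch (the workload is an arbitrary stand-in, and timings will vary by machine):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// burn is an arbitrary CPU-bound workload used only for timing.
func burn() {
	sum := 0
	for i := 0; i < 200_000_000; i++ {
		sum += i
	}
	_ = sum
}

// timeRun launches one goroutine per core under the given GOMAXPROCS
// value and returns the wall-clock time for all of them to finish.
func timeRun(procs int) time.Duration {
	runtime.GOMAXPROCS(procs)
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			burn()
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("GOMAXPROCS=1:", timeRun(1))
	fmt.Println("GOMAXPROCS=N:", timeRun(runtime.NumCPU()))
}
```

With one P the goroutines interleave on a single core; with N Ps the runtime can place them on separate cores, so the second run should finish in roughly 1/N of the time for a purely CPU-bound workload like this.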
Go's Parallelism: Capabilities and Limitations
Go can achieve true parallelism, but with important caveats:
Strengths:
- Automatic Scaling: Go automatically distributes work across cores
- Low Overhead: Goroutines and channel communication have minimal overhead
- Work Stealing: Efficient distribution of tasks to prevent cores from idling
Limitations:
- Mostly Cooperative Scheduling: Goroutines traditionally yield control only at specific points (function calls, channel operations, etc.); since Go 1.14 the runtime can also preempt long-running goroutines asynchronously, but tight loops can still add scheduling latency
- GC Pauses: Go's garbage collector runs concurrently with the program, but its brief stop-the-world phases still pause all goroutines
- Scheduler Overhead: The work-stealing algorithm adds some overhead
- Network-Bound Performance: For I/O-heavy workloads, adding cores may not improve throughput
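If you need to hand control back to the scheduler explicitly inside a long-running loop (mostly relevant before Go 1.14, or in very tight loops), runtime.Gosched offers a manual yield point. A minimal sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	go func() {
		for i := 0; ; i++ {
			if i%1_000_000 == 0 {
				// Yield the processor so other goroutines can run.
				// On modern Go this is rarely needed, since the
				// runtime preempts long-running goroutines itself.
				runtime.Gosched()
			}
		}
	}()

	fmt.Println("main still gets scheduled")
}
```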
Benchmarking Parallelism in Go
To evaluate parallelism gains, you can use the testing package with the -cpu flag:
```go
// parallelism_test.go
package main

import (
	"testing"
)

var sink int // package-level sink keeps the compiler from discarding the work

func BenchmarkComputation(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// CPU-intensive computation
		result := 0
		for j := 0; j < 10000000; j++ {
			result += j
		}
		sink = result
	}
}
```
Run with: go test -bench=. -cpu=1,2,4,8
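The testing package also offers b.RunParallel, which splits the b.N iterations across GOMAXPROCS goroutines for you. A minimal sketch that could live in the same parallelism_test.go:

```go
func BenchmarkComputationParallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		// Each goroutine draws iterations from a shared counter
		// until b.N iterations have been consumed in total.
		local := 0
		for pb.Next() {
			for j := 0; j < 10000000; j++ {
				local += j
			}
		}
		_ = local // keep the result live so the loop isn't optimized away
	})
}
```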
Comparative Analysis: Parallelism Across Languages
| Language | Parallelism Model | Strengths | Limitations |
|---|---|---|---|
| Go | M:P:G scheduler with goroutines | Easy concurrency, low overhead, work stealing | Cooperative scheduling, GC pauses |
| Rust | OS threads + async/await | Zero-cost abstractions, fine-grained control | Steeper learning curve, manual synchronization |
| C++ (std::thread) | Direct OS thread mapping | Maximum performance, precise control | High thread creation overhead, manual resource management |
| Java | Thread pools, ForkJoinPool | Rich ecosystem, mature tooling | Higher memory overhead, complex thread management |
| Python | GIL in CPython, multiprocessing | Simple API, good for I/O | GIL prevents true threading parallelism |
| Node.js | Event loop + worker threads | Excellent for I/O, non-blocking | Single-threaded main loop, callback complexity |
Optimizing Go for Parallel Workloads
For CPU-bound tasks requiring maximum parallelism:
- Profile First: Use go tool pprof to identify bottlenecks
- Tune GOMAXPROCS: Sometimes setting it lower than the number of available cores improves performance
- Optimize Work Distribution: Divide work into equally sized chunks
- Minimize Contention: Reduce lock contention and shared memory access
- Consider sync.Pool: Reduce GC pressure for frequently allocated objects (see the sketch after this list)
- Use Performance-Oriented Packages: Consider github.com/valyala/fasthttp over net/http for web servers
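As an illustration of the sync.Pool point above, here's a minimal sketch that reuses byte buffers instead of allocating a fresh one per operation (the buffer usage pattern here is arbitrary):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers, reducing allocation churn
// and therefore GC pressure in hot paths.
var bufPool = sync.Pool{
	New: func() any {
		return new(bytes.Buffer)
	},
}

func format(id int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // reset so the next user gets a clean buffer
		bufPool.Put(buf)
	}()

	fmt.Fprintf(buf, "task-%d", id)
	return buf.String()
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(format(i))
	}
}
```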
Example of balanced work distribution:
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func processRange(start, end int, wg *sync.WaitGroup) {
	defer wg.Done()
	// Process the assigned range
	sum := 0
	for i := start; i < end; i++ {
		sum += i
	}
	fmt.Printf("Range %d-%d sum: %d\n", start, end, sum)
}

func main() {
	const totalWork = 1000000
	numCPU := runtime.NumCPU()
	runtime.GOMAXPROCS(numCPU)

	var wg sync.WaitGroup
	chunkSize := totalWork / numCPU

	for i := 0; i < numCPU; i++ {
		start := i * chunkSize
		end := start + chunkSize
		if i == numCPU-1 {
			end = totalWork // Handle any remainder in the last chunk
		}
		wg.Add(1)
		go processRange(start, end, &wg)
	}

	wg.Wait()
	fmt.Println("All work completed")
}
```
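Static chunking works well when every item costs roughly the same. When per-item cost varies, a channel-fed worker pool balances load dynamically; here's a small sketch of that alternative (the job type and count are placeholders):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// One worker per core; each pulls the next job as soon as it's
	// free, so slow jobs don't hold up a whole pre-assigned chunk.
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				_ = j * j // placeholder for real per-job work
			}
		}()
	}

	for j := 0; j < 1000; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	fmt.Println("All jobs processed")
}
```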
When to Emphasize Parallelism in Go
Go's design philosophy prioritizes simplicity and maintainability over raw CPU performance. Consider these factors when deciding how much to invest in parallel optimizations:
- I/O-Bound vs. CPU-Bound: For I/O-bound applications, Go's concurrency model already provides excellent throughput without explicit parallelism tuning
- Development Time vs. Runtime: Optimize only when performance requirements demand it
- Scalability Requirements: Consider future workload growth patterns
- Resource Constraints: Memory limitations may favor alternative approaches
Go provides sophisticated concurrency primitives that make parallel programming more accessible than in many other languages. While Go can achieve true parallelism, understanding the nuances of its scheduler, the GOMAXPROCS setting, and its inherent limitations helps developers make informed architectural decisions.
For most applications, Go's default configuration provides an excellent balance of throughput and simplicity. When optimization is necessary, profiling and benchmark-driven tuning yields the best results.
Whether you're building high-performance web services, data processing pipelines, or distributed systems, Go's approach to concurrency and parallelism offers a compelling foundation for modern software development.
What has been your experience with parallelism in Go? Share in the comments below!