Memory optimization is crucial for writing performant software systems. When a software has a finite amount of memory to work with, many issues can arise when that memory isn't used efficiently. That's why memory optimization is critical for better overall performance.
Go inherits many of C's advantageous features, but what I notice is that a large part of people who use it do not know the full power of this language. One of the reasons may be a lack of knowledge about how it works at a low level, or a lack of experience with languages like C or C++. I mention C and C++ because the foundations of Go are pretty much built on the wonderful features of C/C++. It's no coincidence that I will quote an interview of Ken Thompson at Google I/O 2012:
For me, the reason I was enthusiastic about Go is because just about the same time that we were starting on Go, I read (or tried to read) the C++0x proposed standard, and that was a convincer for me.
Today, we're going to talk about how we can optimize our Go program, and more specifically, how it would be good to use structs in Go. Let's first say what a structure is:
A struct is a user-defined data type that groups related variables of different types under a single name.
To fully understand where the problem lies, we will mention that modern processors do not read 1 byte at a time from the memory. How the CPU fetches the data or instructions which are stored in the memory?
In computer architecture, a word is a unit of data that a processor can handle in a single operation - generally the smallest addressable unit of memory. It's a fixed-size group of bits (binary digits). The word size of a processor determines its ability to handle data efficiently. Common word sizes include 8, 16, 32, and 64 bits. Some computer processor architectures support a half word, which is half the number of bits in a word, and a double word, which is two contiguous words.
In nowday the most common architectures is 32 bit and 64 bit. If you have 32 bit processor then it means it can access 4 bytes at a time which means word size is 4 bytes. If you have 64 bit processor the it can access 8 bytes at time which means word size is 8 bytes.
When we store the data in memory, each 32-bit data word has a unique address, as shown below.
Figure. 1 ‑ Word-Addressable Memory
We can read the data in memory and load it to one register using the load word (lw) instruction.
After know the above theoria let's see what is the practise. For the descripte the cases with structure data structure, I will demonstrate with C languge. A struct in C is a composite data type that allows you to group multiple variables together and store them in the same block of memory. As we said early the CPU access the data depend of the givemn architecture. Every data type in C will have alignment requirements.
So let's we have the following as simple structure:
// structure 1
typedef struct example_1 {
char c;
short int s;
} struct1_t;
// structure 2
typedef struct example_2 {
double d;
int s;
char c;
} struct2_t;
And now try to calculate the size of the following structures:
Size of Structure 1 = Size of (char + short int) = 1 + 2 = 3.
Size of Structure 2 = Size of (double + int + char) = 8 + 4 + 1= 13.
The real sizes using a C program might surprise you.
#include <stdio.h>
// structure 1
typedef struct example_1 {
char c;
short int s;
} struct1_t;
// structure 2
typedef struct example_2 {
double d;
int s;
char c;
} struct2_t;
int main()
{
printf("sizeof(struct1_t) = %lu\n", sizeof(struct1_t));
printf("sizeof(struct2_t) = %lu\n", sizeof(struct2_t));
return 0;
}
Output
sizeof(struct1_t) = 4
sizeof(struct2_t) = 16
As we can see, the size of the structures is different from those we calculated.
What is the reason for that?
C and Go employ a technique known as "struct padding" to ensure data is appropriately aligned in memory, which can significantly affect performance due to hardware and architectural constraints. Data padding and alignment conform to the requirements of the system's architecture, mainly to optimize CPU access times by ensuring data boundaries align with word sizes.
Let's go through an example to illustrate how Go handles padding and alignment, consider the following struct:
type Employee struct {
IsAdmin bool
Id int64
Age int32
Salary float32
}
A bool is 1 byte, int64
is 8 bytes, int32
is 4 bytes and float32
is 4 bytes = 17 bytes(total).
Let's validate the struct size by examining the compiled Go program:
package main
import (
"fmt"
"unsafe"
)
type Employee struct {
IsAdmin bool
Id int64
Age int32
Salary float32
}
func main() {
var emp Employee
fmt.Printf("Size of Employee: %d\n", unsafe.Sizeof(emp))
}
Output
Size of Employee: 24
The reported size is 24 bytes, not 17. This discrepancy is due to memory alignment. To understand how alignment works, we need to inspect the structure and visualize the memory which it ocupate.
Figure 2 - Unoptimized Memory Layout
The struct Employee will consume 8*3 = 24 bytes. Yo see the problem now, there are a lot of empty holes in the layout of Employee
(those gaps created by the alignment rules are called “padding”).
Padding Optimization and Performance Impact
Understanding how memory alignment and padding can affect the performance of an application is crucial. Specifically, data alignment impacts the number of CPU cycles required to access fields within a struct. This influence arises mostly from CPU cache effects, rather than raw clock cycles themselves, as cache behavior depends heavily on data locality and alignment within memory blocks.
Modern CPUs fetch data from memory into a faster intermediary called cache, organized in fixed-size blocks (commonly 64 bytes). When data is well-aligned and localized within the same or fewer cache lines, the CPU can access it more quickly due to reduced cache loading operations.
Consider the following Go structures to illustrate poor versus optimal alignment:
// Poorly aligned struct
type Misaligned struct {
Age uint8 // Uses 1 byte, followed by 7 bytes of padding to align the next field
PassportId uint64 // 8-byte aligned uint64 for the passport ID
Children uint16 //2-byte aligned uint16
// Well-aligned struct
type Aligned struct {
Age uint8 // Starting with 1 byte
Children uint16 // Next, 2 bytes; all these combine into a 3-byte sequence
PassportId uint64 // Finally, an 8-byte aligned uint64 without needing additional padding
}
How Alignment Affects Performance
CPU reads data in word-size instead of byte-size. As I described in the begining a word in a 64-bit system is 8 bytes, while a word in a 32-bit system is 4 bytes. In short, CPU reads address in the multiple of its word size. To fetch the variable passportId, our CPU takes two cycles to access the data instead of one. The first cycle will fetch memory 0 to 7 and the subsequent cycle will fetch the rest. And this is inefficient- we need of data structure alignment.By simply aligning the data, computers ensure that the var passportId
can be retrieved in ONE CPU cycle.
Figure 3 - Comparing Memory Access Efficiency
Padding is the key to achieving data alignment. Padding occurs because modern CPUs are optimized to read data from memory at aligned addresses. This alignment allows the CPU to read the data in a single operation.
Figure 4 - Simply aligning the data
Without padding, data may be misaligned, leading to multiple memory accesses and slower performance. Therefore, while padding might waste some memory, it ensures that your program runs efficiently.
Padding Optimization Strategies
Aligned struct consumes lesser memory simply because it possesses a better struct fields order compared to Misaligned. Because of padding, two 13 bytes data structures turn out to be 16 bytes and 24 bytes respectively. Hence, you save up extra memory by simply reordering your struct fields.
Figure 5 - Optimizing Field Order
Improperly aligned data can slow performance as the CPU might need multiple cycles to access misaligned fields. Conversely, correctly aligned data minimizes cache line loads, which is crucial for performance, especially in systems where memory speed is a bottleneck.
Let’s do a simple benchmark to prove it:
var AlignedArr []Aligned
var MisalignedArr []Misaligned
func init() {
const sampleSize = 1000
AlignedArr = make([]Aligned, sampleSize)
MisalignedArr = make([]Misaligned, sampleSize)
for i := 0; i < sampleSize; i++ {
AlignedArr[i] = Aligned{Age: uint8(i % 256), Siblings: uint16(i), Children: uint64(i)}
MisalignedArr[i] = Misaligned{Age: uint8(i % 256), PassportId: uint64(i), Children: uint16(i)}
}
}
func traverseAligned() uint16 {
var arbitraryNum uint16
for _, item := range AlignedArr {
arbitraryNum += item.Siblings
}
return arbitraryNum
}
func traverseMisaligned() uint16 {
var arbitraryNum uint16
for _, item := range MisalignedArr {
arbitraryNum += item.Children
}
return arbitraryNum
}
func BenchmarkTraverseAligned(b *testing.B) {
for n := 0; n < b.N; n++ {
traverseAligned()
}
}
func BenchmarkTraverseMisaligned(b *testing.B) {
for n := 0; n < b.N; n++ {
traverseMisaligned()
}
}
Output
go test -bench=.
goos: linux
goarch: amd64
pkg: test-project
cpu: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
BenchmarkTraverseAligned-16 3022234 403.7 ns/op
BenchmarkTraverseMisaligned-16 4300167 299.1 ns/op
PASS
ok test-project 3.195s
As you can see the traversing the Aligned indeed takes lesser time than its counterpart.
Padding’s added to make sure each struct field lines up properly in memory based on its needs, like we saw earlier. But while it enables efficient access, padding can also waste space if the fields ain’t ordered well.
Understanding how to properly align struct fields to minimize memory waste due to padding is important for efficient memory usage, especially in performance-critical applications. Below, I will provide an example with a poorly aligned structure and then show an optimized version of the same structure.
In a poorly aligned struct, fields are ordered without considering their sizes and alignment requirements, which can lead to added padding and increased memory usage:
// Badly aligned structure
type Person struct {
Active bool // 1 byte + 7 bytes padding
Salary float64 // 8 bytes
Age int32 // 4 bytes + 4 bytes padding
Nickname string // 16 bytes (string is typically 16 bytes on a 64-bit system)
}
The total memory could hence be 1 (bool
) + 7 (padding) + 8 (float64
) + 4 (int32
) + 4 (padding
) + 16 (string
) = 40 bytes.
An optimized structure arranges fields from largest to smallest size, significantly reducing or eliminating the need for additional padding:
// Well-aligned structure
type Person struct {
Salary float64 // 8 bytes
Nickname string // 16 bytes
Age int32 // 4 bytes
Active bool // 1 byte + 3 bytes padding
}
The total memory would then neatly comprise 8 (float64
) + 16 (string
) + 4 (int32
) + 1 (bool
) + 3 (padding
) = 32 bytes.
Let's proof the above:
package main
import (
"fmt"
"unsafe"
)
type PoorlyAlignedPerson struct {
Active bool
Salary float64
Age int32
Nickname string
}
type WellAlignedPerson struct {
Salary float64
Nickname string
Age int32
Active bool
}
func main() {
poorlyAligned := PoorlyAlignedPerson{}
wellAligned := WellAlignedPerson{}
fmt.Printf("Size of PoorlyAlignedPerson: %d bytes\n", unsafe.Sizeof(poorlyAligned))
fmt.Printf("Size of WellAlignedPerson: %d bytes\n", unsafe.Sizeof(wellAligned))
}
Output
Size of PoorlyAlignedPerson: 40 bytes
Size of WellAlignedPerson: 32 bytes
Reducing the structure size from 40 bytes to 32 bytes means a 20% reduction in memory usage per instance of Person. This can lead to considerable savings in applications where many such instances are created or stored, improving cache efficiency and potentially reducing the number of cache misses.
Conclusion
Data alignment is a pivotal factor in optimizing memory utilization and enhancing system performance. By arranging struct data correctly, memory usage becomes not only more efficient but also faster in terms of CPU read times, contributing significantly to overall system efficiency.
Top comments (2)
It seems like very simple issue to optimize for a compiler, isn't it?
Are we paying this price for a faster compiler or there are some edge case where this optimization could fire-back?
Also, does C or C++ also requires this optimization?
Like Go, both C and C++ compilers automatically add padding to satisfy alignment requirements.
In GCC you can use
__attribute__((packed))
:attribute
Advanced developers often manually optimize data structures, especially in performance-critical applications (like systems programming, game development, or real-time simulations).
And automatic padding can lead to increased memory usage, especially if struct fields are not ordered optimally. Developers might need to reorder struct fields manually to minimize padding and reduce the struct size.