YJDoc2

Under The Hood : Unassigned Global Variables

Hello!
This is the third post in the series. If you have not read the previous posts, I recommend reading them first, as this post builds on them and skips the details they already explain.

I'm learning all of this as I write, so if you find any mistakes or have suggestions or improvements, please let me know in the comments.

In this post, we will see how uninitialized global variables declared in a C file are converted to assembly.

Simple Global Variables

For the first case, we will simply declare an int and compile it to assembly :

int a;

The complete generated assembly file is :
[Image: Global Variables Complete Assembly]

Many of these lines were discussed in previous posts, so if you have not yet read them, I suggest you do so before continuing. For the rest of this post, we will only look at the lines that differ.

Compared to the assembly file for a completely empty C file, this contains only one additional line :

.comm   a,4,4

The general syntax for the .comm directive is :

.comm name, length

The .comm directive is used to declare common symbols/variables. This means that when multiple files are linked, common symbols with the same name are merged into a single one. If the declarations have different sizes (say int and double), the largest required size is allocated for the variable. When a variable is defined only once, length bytes are allocated for it.
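
One way to see where common symbols come from on the C side is the rule for tentative definitions: a file-scope declaration without an initializer may be repeated, and all copies name one zero-initialized object. The sketch below shows this inside a single file; merging across files is what .comm adds at link time. The variable name and the helper functions are made up for illustration. Note that GCC 10 and later default to -fno-common, so reproducing the .comm output there likely needs the -fcommon flag.

```c
/* A sketch of C tentative definitions; all names here are hypothetical. */

int counter;   /* tentative definition: no initializer */
int counter;   /* allowed again; both lines name the same object */

/* With no initializer anywhere in the file, counter starts at zero. */
int read_counter(void) { return counter; }

void bump_counter(void) { counter += 1; }
```

Both declarations end up as a single 4-byte object, which is the same idea the linker applies to common symbols coming from different object files.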

When compiling for an ELF target (as in this case), the .comm directive can take an optional third argument :

.comm name, length, alignment

Thus, in our program above, the directive declares a symbol/variable named a, whose length is 4 bytes and which should be aligned on a 4-byte boundary.

If we change the C to :

int a;
float b;

The assembly changes to :

.comm   a,4,4
.comm   b,4,4

This is because, on the 64-bit system I am using, int and float have the same size, 4 bytes.

Now if we change our C to :

double a;

The assembly is changed to :

.comm   a,8,8

This is because a double is 8 bytes long.
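
As a rough check of the sizes above, the following sketch builds the expected .comm line from sizeof and _Alignof (C11). format_comm and FORMAT_COMM_FOR are hypothetical helpers, not part of any toolchain, and the assumption that the alignment argument equals _Alignof is only meant for the scalar cases shown so far.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes ".comm<TAB>name,size,align" into buf.
 * Returns the number of characters snprintf wanted to write. */
int format_comm(char *buf, size_t n, const char *name,
                size_t size, size_t align)
{
    return snprintf(buf, n, ".comm\t%s,%zu,%zu", name, size, align);
}

/* Derives size and alignment from a C type.
 * buf must be a real char array, because sizeof(buf) is used. */
#define FORMAT_COMM_FOR(buf, name, type) \
    format_comm((buf), sizeof(buf), (name), sizeof(type), _Alignof(type))
```

With a char buf[64], calling FORMAT_COMM_FOR(buf, "a", double) should reproduce the line above on a system where double is 8 bytes long and 8-byte aligned.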

Static Global Variables

The static keyword is used to define variables/functions that are only visible inside the file in which they are defined.

If we write our C as :

static int a;

The assembly generated is :

.local  a
.comm   a,4,4

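The .local directive takes a list of symbol names and marks them as local, so they will not be externally visible. The .lcomm directive, which declares local common symbols directly, does not accept an alignment argument on most ELF targets, so .local is used in combination with .comm to declare a local variable with the desired alignment.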

If we change the C to :

static int a;
static char b;

The corresponding assembly changes to :

.local  a
.comm   a,4,4
.local  b
.comm   b,1,1

This declares an int as described above, as well as a char, which is 1 byte long.
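
To make the C side of this concrete, here is a small sketch of a file that keeps both variables private and exposes them only through functions. The accessor names are hypothetical and not from the original program.

```c
/* Internal linkage: other files cannot refer to a or b by name. */
static int a;
static char b;

/* Hypothetical accessors: the only way code in another file
 * can read or change the two variables. */
int get_a(void) { return a; }
void set_a(int value) { a = value; }
char get_b(void) { return b; }
```

Because the names never leave the file, another file can define its own a or b without the linker merging them with these.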

Constant Global Variables

We use const to declare variables which should not be changed after their initial declaration.
If we declare a constant in C as :

const int a;

We observe that the generated assembly is no different from that of a plain global int :

.comm   a,4,4

But if we introduce a main function and try to assign a value to it, we get an error :

const int a;

void main() { a = 5; }

The compiler output is :

global_var.c: In function ‘main’:
global_var.c:3:17: error: assignment of read-only variable ‘a’
    3 | void main() { a = 5; }
      |                 ^

This indicates that, for uninitialized variables, const is enforced by the C compiler at the C-to-assembly stage, and the assembler does not know anything about it.
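
A minimal sketch of the same point from the C side: reading the constant compiles fine, while the write has to stay commented out because the compiler rejects it. read_a and write_a are hypothetical names.

```c
const int a;   /* no initializer, so it holds zero for the whole program */

/* Reading a const variable is always allowed. */
int read_a(void) { return a; }

/* Writing is rejected by the C compiler, before any assembly is produced:
 *
 *     void write_a(void) { a = 5; }
 *     error: assignment of read-only variable 'a'
 */
```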

Global Array Variables

Now let's see global array variables :

int a[5];

This generates :

.comm   a,20,16

This instructs the assembler to set aside 5*4=20 bytes of memory for the symbol/variable 'a'. For the sizes tried here, the alignment is the largest power of 2 that does not exceed the size :
int a[2]; generates .comm a,8,8 , int a[7]; generates .comm a,28,16 and int a[8]; generates .comm a,32,32 .
This alignment is capped at 32 bytes, so larger arrays are still aligned at 32 bytes only. The extra alignment comes from the compiler and the platform ABI, since C itself only requires an int array to be aligned like an int. The cap might exist because a larger alignment can waste more memory as padding.
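
The pattern in these numbers can be written down as a small function. This is only a model fitted to the values listed in this post (sizes 8, 20, 28, 32 and 40), not GCC's actual rule, and sizes not shown here may well behave differently. observed_align is a hypothetical name.

```c
#include <stddef.h>

/* Largest power of two that is <= size, capped at 32.
 * A model of the values observed in this post, NOT GCC's real rule. */
size_t observed_align(size_t size)
{
    size_t align = 1;

    /* Keep doubling while the next power of two still fits in size
     * and the 32-byte cap has not been reached. */
    while (align * 2 <= size && align < 32)
        align *= 2;

    return align;
}
```

For example, observed_align(20) gives 16 and observed_align(40) gives 32, matching the .comm lines for int a[5] and for an array of five 8-byte elements.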

Static and const arrays are compiled like other static and const variables :

  • static generates a .local directive for the array name.
  • const for an uninitialized array is enforced at compile time (C to assembly) and does not show up differently in the assembly.

Pointer Variables

Pointers store the memory address of some other variable, which can itself be a normal variable or a pointer. As they have to store an address, they must be as long as an address on the machine. On a 64-bit system that is 8 bytes, so a pointer to any variable is 8 bytes long.

int *a;
char *b;

is converted to

.comm   a,8,8
.comm   b,8,8

Static and const pointer variables are treated like normal variables, except that the size is 8 for all of them. Similarly, in arrays of pointers, each element is 8 bytes long.

int *a[5];

is converted to

.comm   a,40,32
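
The pointer sizes can also be checked from C itself. In this sketch the array is renamed to arr so that it can live in the same file as the two pointers; that name and the helper functions are hypothetical.

```c
#include <stddef.h>

int  *a;
char *b;
int  *arr[5];   /* hypothetical name; the post reuses `a` for this array */

/* Each helper reports the size the compiler assigned to one variable. */
size_t size_of_int_ptr(void)   { return sizeof a; }
size_t size_of_char_ptr(void)  { return sizeof b; }
size_t size_of_ptr_array(void) { return sizeof arr; }
```

On the 64-bit system used in this post, the first two should return 8 and the last one 40, which are the length arguments seen in the .comm lines.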

This is how uninitialized global variables are translated from C to assembly.

Again, I'm learning all of this as I write, so if you find any mistakes or have suggestions or improvements, please let me know in the comments.

Thank you !
