You might have heard the word "Garbage Collection" before while learning a programming language, So Let's discuss the meaning of this word before digging into the algorithm.
(NOTE: All code examples using JavaScript)
What is the problem we have?
Let's have a look at the following example.
const age = 23;
let person = {
fullName: 'Adham Niazy',
age: 23
};
You could think any programming language should handle the first two lines the same way because we are just creating two variables.
But actually, they are a little bit different from each other. Let's see why.
The part on the left is for visualizing how memory saves each variable with its value, so when we created a variable called age
, the program found that its value is a simple value of 23
.
NOTE: (Booleans, Strings, Numbers, Undefined) are simple data types
But when we created a variable called person
, the program found that its value isn't simple but complex!
NOTE: (Objects, Arrays) are complex data types
So, in this case, let's imagine that we have another type of memory on the right part called Heap
, this part will hold all complex data types and will give the main memory only an address (Reference) of the stored data. So when we try to access person
variable, the main memory will find an address, following this address to reach the actual spot in the Heap
.
The real problem is when this happen. โ
// Problem case
person = 'Anything';
If you understood the last part well, you might have guessed the problem that happened here.
Reassigning the person
variable with another value will remove the address (Reference) that points to the Heap
! but the critical point is that the person object still exist in the Heap
and haven't been removed. (This will cause a memory leak!!).
What is the solution?
The solution depends on the implementation of the language itself.
Example C++
doesn't handle this case and leave it for the programmer to take care of removing the object from the Heap
manually before reassign the variable.
Languages Like JavaScript, Java, Python
uses automatic memory management. They allocate memory when needed and deallocate it when no longer used. The deallocation process is known as Garbage Collection
Garbage Collection
has many algorithms to implement but the most popular algorithm is Mark-Sweep
.
Mark-Sweep Algorithm โ๏ธ๐งน
Mark โ๏ธโ๏ธโ๏ธ:
Starting from the root
reference, we traverse the entire reference graph in our program. Think of the root reference as a global object from which all other "Used" objects in our program are reachable via an existing reference.
While traversing the root
reference we will mark
all objects we use in our code.
// Mark
const objects = [root];
while(objects.length) {
const currentObject = objects.pop();
currentObject.mark = true;
pointers(currentObject).forEach(ref => {
objects.push(ref);
});
}
Sweep ๐งน๐งน๐งน:
We traverse the entire Heap
. If a particular memory block is marked, we reset the mark flag for the next iteration. And If the block isn't marked this means that we no longer uses this data so we can deallocate this memory block. This phase is known as "sweep"
// Sweep
let current = heap;
while(current) {
if ( current.mark ) {
current.mark = false;
} else {
free(current);
}
current = current.next;
}
โ ๏ธ The implementation above is incomplete: for example, it doesn't handle circular references. Its only purpose is to demonstrate the garbage collection algorithm in familiar syntax. and let you know how languages implemented this feature
And that's it! Hope you enjoyed this article ๐ค
Top comments (0)