Computers can only understand machine code, which is a low-level programming language* generally made up entirely of numbers. Most of the time, when people talk about machine code, they refer to it as binary code** (binary is a number system with base 2, so every digit is either a 1 or a 0). So we can say that machine code is made up of sequences of ones and zeros. Below is a snippet of some binary code:
01101000 01100101 01101100 01101100 01101111
00100000 01110111 01101111 01110010 01101100
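To make the base-2 idea concrete, here is a minimal Python sketch that reads the ten bytes from the snippet above as numbers. It also interprets each number as an ASCII character code, which is an assumption about what these particular bytes were meant to represent; real machine code would encode processor instructions instead. The variable names are only illustrative.

```python
# The ten bytes from the snippet above, written as text.
snippet = (
    "01101000 01100101 01101100 01101100 01101111 "
    "00100000 01110111 01101111 01110010 01101100"
)

# int(bits, 2) reads a string of ones and zeros as a base-2 number.
values = [int(bits, 2) for bits in snippet.split()]

# Assumption: treat each number as an ASCII character code.
decoded = "".join(chr(value) for value in values)

print(values)   # [104, 101, 108, 108, 111, 32, 119, 111, 114, 108]
print(decoded)  # hello worl
```

The same ones and zeros are just numbers to the computer; whether they mean text or instructions depends on how they are used.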
We need to communicate with computers, and to do so we need to speak their language. But writing machine code by hand is very impractical, so we came up with languages that resemble our own. We call these high-level programming languages. Some examples are Python, Ruby, C# and Java.
Now the problem is that computers do not understand these languages directly, so we need a translator that can explain our instructions to the computer. This is how compilers and interpreters*** were born. They are programs that translate the code we write in high-level languages into machine code.
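As a rough illustration of this translation step, the sketch below uses Python's built-in `compile()` function. One caveat is important: Python translates source code into bytecode for its own virtual machine, not into machine code for the processor, so this is only a stand-in for what a compiler does. The source string and variable names are made up for the example.

```python
source = "total = 2 + 3\n"

# compile() translates the source text into a code object
# (Python bytecode, not real machine code).
code = compile(source, "<example>", "exec")

# The translated instructions are stored as raw bytes, i.e. plain numbers.
# The exact numbers differ between Python versions.
instruction_bytes = list(code.co_code)
print(instruction_bytes)

# Running the translated code gives the result we asked for.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 5
```

The readable line `total = 2 + 3` ends up as a list of numbers, which is the same idea as a compiler turning source code into machine code.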
Compiling means that after we finish writing our code, a compiler (a program) takes our code and looks at it, making sure it has been written according to the rules of the programming language we used (for example, it checks for syntax errors). If no mistakes are found, the code can be translated into machine code and a file containing the compiled code is generated.
We often hear people say that the code doesn't compile. This means that while checking the code, the compiler found errors and the translation into machine code wasn't successful. Usually, the programmer will see some error messages that help them debug the code. Only after all errors are fixed will the code compile.
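Here is a small sketch of what "doesn't compile" looks like in practice. It again relies on Python's built-in `compile()`, which produces bytecode rather than machine code, so treat it as an analogy. The helper `try_compile` is a hypothetical name introduced only for this example.

```python
def try_compile(source):
    """Return None if the source compiles, otherwise a short error message."""
    try:
        compile(source, "<example>", "exec")
        return None
    except SyntaxError as err:
        # The exact wording of the message depends on the Python version.
        return f"line {err.lineno}: {err.msg}"

good = "print('hello')\n"
bad = "print('hello'\n"  # the closing parenthesis is missing

good_result = try_compile(good)
bad_result = try_compile(bad)

print(good_result)  # None, because there was nothing to report
print(bad_result)   # an error message pointing at the problem
```

The second snippet is rejected before any of it runs, and the message tells the programmer where to start looking.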
IMPORTANT
Just because the code compiles doesn't mean the program works. This is because a compiler can't catch logical errors. It might sound counterintuitive, but think of it like an English text with no grammatical errors or typos: each word is correct on its own, but read together they make no sense.
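A short example of a logical error, written in Python. Both functions below are valid code and are accepted without complaint, but only one of them computes what its name promises. The function names and the numbers are invented for illustration.

```python
def average(numbers):
    # Logical error: divides by a fixed 2 instead of the number of items.
    return sum(numbers) / 2

def average_fixed(numbers):
    # Correct version: divides by how many numbers there are.
    return sum(numbers) / len(numbers)

scores = [10, 20, 30]

wrong = average(scores)
right = average_fixed(scores)

print(wrong)  # 30.0
print(right)  # 20.0
```

No error message appears for the first function. The only way to notice the mistake is to check the result against what we expected.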
(*) Low-level programming languages are languages that sit close to the computer's instruction set. An instruction set is the set of instructions that the processor understands.
(**) Machine code can be written using other number formats, like hexadecimal. These formats are easier for humans to read, but they are only a different way of writing the same values: what the computer stores and executes is still binary.
(***) Interpreters also let the computer run source code, but they work differently from compilers: they typically process and execute the program as it runs instead of generating a compiled file first.
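To illustrate footnote (**), this Python sketch converts a few bytes from the binary snippet earlier in the post into hexadecimal and back. The variable names are only illustrative.

```python
byte_strings = ["01101000", "01100101", "01101100"]

# Binary text -> number -> two-digit hexadecimal text.
hex_forms = [format(int(bits, 2), "02x") for bits in byte_strings]

# Hexadecimal text -> number -> eight-digit binary text.
round_trip = [format(int(hex_digits, 16), "08b") for hex_digits in hex_forms]

print(hex_forms)   # ['68', '65', '6c']
print(round_trip)  # ['01101000', '01100101', '01101100']
```

Two hexadecimal digits replace eight binary digits, which is why hexadecimal is more comfortable to read, and converting back gives exactly the original bits.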