"Don't repeat yourself" is such an important and widely taught
concept in programming that it has its own acronym (DRY).
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system
--- Wikipedia
DRY is a very powerful idea that avoids a lot of issues, like having to fix the same bug in multiple places because
the same code has been duplicated. Many voices say that it is often overused, leading to the wrong abstraction, and I
tend to agree with that statement.
duplication is far cheaper than the wrong abstraction
--- Sandi Metz
People often overdo the DRY principle by building abstractions the first time a problem occurs. Instead, a problem
should not be abstracted before it has occurred multiple times, since a hasty abstraction can easily turn out to be the
wrong one, fail to live up to its responsibilities, and ultimately cause more problems than it solves. There are
already principles like WET (Write everything twice) and AHA (Avoid hasty abstractions) that contradict the DRY
principle or at least limit its applicability.
While I welcome the recognition of DRY overuse in many situations, I think this principle tends to be underused when it
comes to code comments, which is the topic of this blog post.
Comments often violate the DRY principle
In their fantastic book The Pragmatic Programmer, David Thomas and Andrew Hunt coined the DRY principle, and they
explicitly list comments as a possible violation of it. When people are learning to code, they are often taught that
good code needs lots of comments, which is absolutely not true in my opinion. Very often good, self-explanatory code
does not need any comments at all, and if it does, the comment should describe why the code has been implemented this
way instead of just repeating what the code already says.
My favourite Stack Overflow question of all time deals with code comments and lists some really good examples of how
not to do it (especially if you skip the funny ones, which unfortunately for this blog post are the majority).
There is one very obvious example of a bad comment:
return 1; # returns 1
This is a very obvious violation of the DRY principle: whenever the return value changes, the comment has to be
updated as well. But there are other, less obvious examples:
$i++; // increase by one
This is only acceptable as an explanatory comment in teaching material, but it should never make its way to a
production codebase.
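To contrast a comment that repeats the code with one that explains the why, here is a minimal JavaScript sketch. The function toCents and its rounding scenario are hypothetical and only serve as an illustration:

```javascript
// Hypothetical example: converts a price in euros to whole cents.
function toCents(amount) {
  // A "what" comment would only say: multiply by 100 and round.
  // This "why" comment says something the code cannot express:
  // binary floating point may leave 1.15 * 100 slightly below 115,
  // so truncating could lose a cent. Rounding avoids that.
  return Math.round(amount * 100);
}

console.log(toCents(1.15)); // 115
```

The comment stays valid even if the implementation is rewritten, because it documents a reason rather than a statement.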
The fall of doc blocks
Documentation comments are especially popular in dynamically typed languages. Since these languages often don't allow
types to be specified in code, people have invented ways to move that information into comments, which makes the code
easier to understand when reading it. The alternative would be to read the code and try to work out, based on how the
variables are used, what types need to be passed. Popular examples are PHPDoc and JSDoc.
/**
 * Adds two numbers
 *
 * @param int $a
 * @param int $b
 */
function add($a, $b) {
    // ...
}
The @param annotations in particular made a lot of sense, because the code itself does not expose that information in
a very accessible way. But recent PHP versions have improved the type system a lot, and in the JavaScript world,
technologies that add type information, like TypeScript, are getting a lot more popular (I compared it to Flow in
another article), which makes these doc blocks obsolete in many cases.
function add(int $a, int $b) {
    // ...
}
As a bonus, these type systems will also yell at you if the type is not correctly set, something a pure comment cannot
really help with. So adding another comment just with the type annotation would duplicate that information with no real
value unless the parameter is explained in more detail.
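The difference between a documented type and an enforced one can be sketched in plain JavaScript. The names addDocumented and addChecked are made up for this illustration, and the runtime check is only a stand-in for what PHP type declarations or the TypeScript compiler do for you:

```javascript
/**
 * @param {number} a
 * @param {number} b
 * @returns {number}
 */
function addDocumented(a, b) {
  // The doc block promises numbers, but nothing stops other types.
  return a + b;
}

function addChecked(a, b) {
  // Stand-in for a real type system: reject anything that is not a number.
  if (typeof a !== "number" || typeof b !== "number") {
    throw new TypeError("add expects two numbers");
  }
  return a + b;
}

console.log(addDocumented("1", "2")); // prints 12, a concatenated string

try {
  addChecked("1", "2");
} catch (error) {
  console.log(error.message); // add expects two numbers
}
```

The doc block silently lets the wrong types through, while the enforced version fails loudly at the call site.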
Comments tend to be ignored by developers too
The reason comments exist is to allow adding additional information to the source code in natural language. Whatever is
added as a comment will be ignored by the compiler or interpreter. Developers know that, so many of them learned to
ignore them to a certain degree. That's especially true if they have ever worked with a codebase that contained
outdated comments. I am always very skeptical when reading comments and double-check with the actual implementation if
the statement of the comment is true because I have experienced too often that the code didn't behave as the comment
suggested.
Again, there is an answer in the already mentioned Stack Overflow question:
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
That might look like a really stupid example because it is so terribly obvious. But I totally believe that something
like this can easily happen in a real codebase. Since developers tend to ignore comments as well, it is quite likely
that they will not update the comment when changing the code for some reason.
The worst thing is that the above example is not even that bad, because after a second you'll realize that the comment
is wrong. Subtler errors in a comment are much harder to spot. More complex code usually justifies comments, but they
are only helpful if they are actually up to date. If developers don't read comments in the first place, they are also
much more likely to not update them when they change something, which in turn gives everyone even less reason to trust
them. I would say this is a vicious circle.
Comments should add something
As already mentioned, more complex code often justifies comments, at least if they describe reasons or thoughts that
are not obvious from just looking at the code. Strictly speaking, though, even this is a violation of the DRY
principle, because the comment needs an update too when the code changes. But it might be worth the tradeoff if the
code is hard to understand.
A rule I am following is that a comment should not just repeat what the code is already saying. Another way to phrase
it is that a comment must always add value that would be missing if it weren't there. Just recently there was a
discussion in Austria about some JavaScript code for a covid-19 vaccination forecast, because the code just seemed to
make up some numbers. But the more interesting part of that code was its use of comments:
if(now.hour() < 6) {
    estimated = ausgeliefert; // hour is before 6am
} else if(now.hour() > 17) { // hour is after 6pm
    // ...
}
The first comment basically just repeats what the line before is doing. If we need to describe what now.hour() < 6 is
doing, then we would basically have to comment every single line in our code. The same is partially true for the next
comment. It was probably written to indicate that, although the code says now.hour() > 17, the condition does not
include times like 17:01. It might be a little bit better than the first comment, but I still don't think that it is
worth the tradeoff of duplicating the same information in two different places.
Another tradeoff is the doc block of the add function from above. As long as the int type hints are not part of the
code itself, it makes sense to add this information, because it is much easier to find out what types have to be passed
this way. If that information is not there, it might be quite hard, and even require some debugging, to be sure about
the types that the function accepts. I guess this improvement in developer experience justifies the potential risk of
the comment being outdated. But as already said above, the latest PHP versions support type hints in code, making the
comments obsolete and guaranteeing the type of the variable.
Good naming can often replace comments altogether
Finally, I want to show some code that gets rid of comments by being written in a self-explanatory way. This makes the
code more obvious to read, and since it is real code and not just comments, it is much less likely that developers
will skip it.
Let's start with the JavaScript example from the previous section. We've already said that the first comment is kind of
unnecessary, so we can safely omit it. The second comment kind of had a point, because it was explaining in a hidden
way that the hour has to be 18:00 or later, and that 17:01, even though it is after 17:00, would not be accepted by the
if statement. Another way to make this clearer is to use the >= operator instead. It removes that ambiguity and reads
nicer.
if(now.hour() < 6) {
    estimated = ausgeliefert;
} else if(now.hour() >= 18) {
    // ...
}
Now the code itself is more clear and the comments could be removed, just by using a different operator.
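That swapping the operator does not change behaviour can be checked with a few lines of JavaScript. This sketch assumes that now.hour() returns a whole number from 0 to 23 (the original snippet does not show which date library is used), and disagreeingHours is a hypothetical helper name:

```javascript
// Returns every whole hour (0-23) for which the two conditions disagree.
function disagreeingHours() {
  const hours = [];
  for (let hour = 0; hour < 24; hour++) {
    const original = hour > 17;   // condition from the commented version
    const rewritten = hour >= 18; // condition from the rewritten version
    if (original !== rewritten) {
      hours.push(hour);
    }
  }
  return hours;
}

console.log(disagreeingHours().length); // 0
```

For whole hours both conditions are equivalent, so the rewrite is purely about readability.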
The other two examples I am showing are real-world examples I've run into during my work as a software engineer. The
first one is an if statement that tries to find out whether a given node represents a new document or one that already
existed before. The logic to do so was a bit cryptic, so it made sense to use a comment to explain what was happening
here:
// Check if the document is a new document
if (
    !$node->hasProperty(
        $this->propertyEncoder->encode(
            'system_localized',
            StructureSubscriber::STRUCTURE_TYPE_FIELD,
            $event->getLocale()
        )
    )
) {
    // ...
}
A very easy way to avoid this comment is to store the result of the condition in a separate variable and give it a
meaningful name:
$isNewDocument = !$node->hasProperty(
    $this->propertyEncoder->encode(
        'system_localized',
        StructureSubscriber::STRUCTURE_TYPE_FIELD,
        $event->getLocale()
    )
);
if ($isNewDocument) {
    // ...
}
This avoids the need for the above comment, and developers cannot really skip the variable name, because it needs to be
referenced later. The comment would have been written in gray by the IDE, kind of telling the developer that these
lines don't really matter. And because that part of the code tends to get skipped when reading, it is also more likely
that the comment does not get updated when the code changes.
It would be even better if this check were part of a class, so that it could be called like $document->isNew(), but
that's beyond the scope of this article.
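For illustration only, this is roughly what such a method could look like, sketched in JavaScript. The classes DocumentNode and LocalizedDocument and the property name format are hypothetical and do not reflect the real codebase:

```javascript
// Hypothetical stand-in for the node from the PHP snippet.
class DocumentNode {
  constructor(propertyNames) {
    this.propertyNames = new Set(propertyNames);
  }

  hasProperty(name) {
    return this.propertyNames.has(name);
  }
}

class LocalizedDocument {
  constructor(node, locale) {
    this.node = node;
    this.locale = locale;
  }

  // Assumption: a document counts as new for a locale as long as its node
  // has no localized marker property for that locale yet.
  isNew() {
    return !this.node.hasProperty(`system_localized:${this.locale}`);
  }
}

const freshDocument = new LocalizedDocument(new DocumentNode([]), "en");
if (freshDocument.isNew()) {
  console.log("new document");
}
```

The call site now reads like the former comment, and the cryptic property check lives in exactly one place.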
Another example I've stumbled upon is the following code:
// remove the "sec:role-" prefix
$roleId = \substr($property->getName(), 9);
The above code removes the prefix sec:role- from a string to retrieve the ID based on the name of a property. The code
works, but the number 9 is a so-called magic number that needs some explanation, so it somehow feels natural to just
add a comment. Sometimes constants are used to give such magic numbers a name that better explains what they are for.
But in this very specific example, there is also a different solution.
$roleId = \str_replace('sec:role-', '', $property->getName());
This version does not rely on counting characters; instead, we replace the sec:role- prefix with an empty string. This
way it is clear that the sec:role- prefix is removed, without the need for a comment that violates the DRY principle.
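The named-constant variant can be sketched in JavaScript as well. ROLE_PREFIX and roleIdFromPropertyName are hypothetical names. One design note: PHP's str_replace replaces every occurrence of the search string, so a version that only strips a leading prefix is slightly stricter:

```javascript
// Hypothetical constant: the length is derived from it instead of hardcoding 9.
const ROLE_PREFIX = "sec:role-";

// Strips the prefix only when the name actually starts with it.
function roleIdFromPropertyName(propertyName) {
  return propertyName.startsWith(ROLE_PREFIX)
    ? propertyName.slice(ROLE_PREFIX.length)
    : propertyName;
}

console.log(roleIdFromPropertyName("sec:role-42")); // 42
```

If the prefix ever changes, only the constant has to be touched, and there is no comment that could go stale.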
I really like finding ways to write code in a way that better explains itself. Very often these changes are really
subtle, but they fundamentally change the way code is read and make comments unnecessary altogether. I hope that these
examples helped you to find some motivation to do so too!
Top comments (13)
I have been writing comments for 20 years now and the reason for commenting code is to let my future-self / others understand the reasoning behind it.
A comment shouldn't explain the code itself unless it's very complex (no way around complexity sometimes)
Even a function which returns something very simple sometimes needs a comment, as to why does it even exist?
For example I have a project with many bizarre functions and timeouts which must have a comment to explain they exist solely because of some browser bug.
Many times I also put a link to a stackoverflow/github discussion or to a browser's bug-tracker url.
I also write comments such as instructions to myself to improve some function and how to do it, if it takes too much time to write it perfectly and right now I need to write it dirty just to continue my work.
I LOVE comments and believe code without comments is like a cookbook with just the ingredients and no instructions. Yeah ok, this cake needs 3 eggs and a lemon. Now please tell me WHY it needs those or I will not comply. I must know the "why" of everything.
The last analogy is not really correct... The code is some kind of instruction. The other suggestions you have mentioned I would consider good comments: E.g. that something is necessary because of a browser bug is not clear by just looking at the code. Also adding a comment if you know how to improve but you don't have time is a good use case (of course it would be better to just fix it right away, but that is very idealistic).
I've been convinced for quite some time that comments that describe what the code is doing (instead of why it does it) are an anti-pattern and a code smell. But I had never thought about the DRY side of things, and especially the fact that it leads to comments becoming wrong and misleading when one updates the code but forgets about the comments (and thinking about it I've seen it numerous times in our codebase). That's a very good argument against purely descriptive comments.
Also :
I don't think it is. Introducing new methods when things get complicated is actually my favorite way of making the code self explanatory and avoid the need for comments
Totally agree, that would be better and avoid a comment, and it could have been included as well. But just extracting the variable makes the same point, and this way I had to dive less into the class structure 🙂
I appreciate the encouragement to think of ways to express meaning more clearly in the code, something I should practice more!
I wonder if a lot of redundant comments are a hangover from the 'pseudo code' approach to writing software, where you start by describing the required logic in a comment, then work through that in real code, which may be more obscure/complex, but leaving the comments to explain yourself...
Yeah, that's something I used to like as well. Write comments first, implement afterwards. It might be a good idea to think about the big picture first, but leaving the comments in when the code can easily speak for itself is really a problem.
Explanatory prose is a good comment. Explain the why and how something complex is put together. Future programmers can then more readily understand why something was done, and if that rationale is still applicable.
Comments that just re-iterate what the code is doing are useless comments. Assuming (next point)...
Code that uses good names such that it needn't be commented at all is better than code that uses unhelpful names and necessitates commenting what the poor code with non-descriptive names is doing.
Comments that do not jibe with what the code is doing are more-than-useless, because the future programmer will be puzzled if the comment is correct and the code is broken, or the code is correct and the comment is outdated and/or specious.
In PHP I find Psalm incredibly useful for stuff like this. It will flag any discrepancies in your DocBlocks and can often fix them automatically. It also supports using DocBlocks to specify some quite complex type information, such as arrays with a specific structure.
Obviously type hints are the best way to specify this sort of thing since they're enforced by the compiler, but they can't always be as specific as DocBlocks can - you can enforce a return type being an array with a type hint, but you can't enforce the structure of said array.
Agree, and if the comment includes information like that, it is not a pure duplication of what the code already says. Different story though if PHP supports generics at some point.
I highly agree. Clean code is a code that is self-explanatory.
IMO comments should only explain business logic - never the algorithmic or syntactic logic.
Great article about comments.
I personally try to give variables and functions better names. That way the comments should become tiny, because the names say it all.
Your link "compared it to Flow in another article" is broken
Thank you, I am just crossposting here, and the link should go to homepage... That's fixed now 🙂