Believe what you want, it doesn’t mean you’re right
Recently I’ve been reading the documentation for the Elixir programming language, in particular the Documentation != Comments section. Like so many other aspects of Elixir, the documentation has picked up good ideas from a number of places and woven them together in a coherent way.
Reading Elixir’s on-line documentation reminded me of the discomfort I often feel in more recent engagements where I have found myself in organizations whose attitude to documentation is that “it’s a waste of effort” (my paraphrase).
First, I have an admission to make. I use a software development team or organization’s attitude to documentation and comments as an indicator for one dimension of maturity in software development. Poor or missing documentation is like a code smell to me, in that it’s a gestalt of many things, and the context matters.
My take on documentation is that if a module is intended for use as a component, or comes to be used as one, then it’s worth documenting its interface and its context. As a developer I find that I’m likely to put myself in the reader’s shoes as I write documentation, and that helps me consider my vocabulary and structure in a fresh light.
I’m not an advocate of reams of boilerplate for every class and method in every file.
I find that documentation allows me to intentionally write code which uses the interface described in the documentation, rather than writing code which coincidentally works because of some accident of implementation in a library. While I’m happy to read the source of a library to understand its mechanics I’m well aware that I can incorrectly infer the way it was intended to be used by its author. There’s more discussion of this in How to be a Better Coder.
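A minimal Python sketch of the difference between code that works by coincidence and code written against the documented interface. The function names and the report-file naming scheme are hypothetical, chosen only for illustration. The documentation for os.listdir states that the list is in arbitrary order, so the first function works only when the platform happens to return names sorted.

```python
import os


def latest_report_coincidental(directory):
    """Return the last name in the listing.

    This relies on the listing happening to come back in name order,
    which the os.listdir documentation does not promise.
    """
    return os.listdir(directory)[-1]


def latest_report_intentional(directory):
    """Return the alphabetically last name in the directory.

    The ordering is made explicit with sorted(), so the result depends
    only on documented behaviour. Assumes (hypothetically) that report
    names begin with a sortable date such as "2024-03".
    """
    return sorted(os.listdir(directory))[-1]
```

Both functions may return the same answer on a developer's machine, which is exactly why the first one is risky: nothing fails until the code runs somewhere the accident no longer holds.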
I find that documentation can orient me with the intended use cases of the code, introduce me to its vocabulary, and let me quickly see the patterns in calling and return conventions and argument types. Once I have that mental map then I can go spelunking in the source with that context, and absorb the implementation details.
I find that once a team exceeds half a dozen people or so then it’s useful to be able to solidify some of the verbal history in documentation or comments in the code so that the original author doesn’t have to be tracked down to remember their state of mind when the code was written.
For a strictly internal project component, something simple at the top of the file which indicates a lack of guarantees is probably sufficient, for example:
# Copyright (c) YYYY, XYZ Corp.
# This is intended for internal use in project ABC at XYZ Corp.
Once you go beyond strictly internal code to something where someone else might depend on the API then there are usually community standards and tools which help to create appropriate documentation. Remember, that someone else might be you in six months.
Add as little to the source code as possible so that generated documentation, for example HTML for use on the web, lets a reader get a good foothold without having to dive into the code. Often a language and its tools will help by pulling out function signatures and return types and using the variable names from the code.
The reason for adding as little to the source code as possible is to reduce the amount of documentation maintenance needed if there are changes in the public API.
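As a rough sketch of how little needs to be written by hand, here is a hypothetical Python function with a short docstring, plus a small helper that mimics what a documentation generator can pull out of the code on its own. The names parse_duration and summarize are invented for this example; real tools do far more, but the signature, argument names, and types all come from the code rather than from extra prose.

```python
import inspect


def parse_duration(text: str, default_unit: str = "s") -> float:
    """Convert a duration such as "90s" or "2m" into seconds.

    A bare number is interpreted using default_unit.
    Raises ValueError for an unknown unit.
    """
    units = {"s": 1.0, "m": 60.0, "h": 3600.0}
    text = text.strip()
    if text and text[-1].isalpha():
        number, unit = text[:-1], text[-1]
    else:
        number, unit = text, default_unit
    if unit not in units:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * units[unit]


def summarize(func) -> str:
    """Build a one-line summary from the signature and the docstring."""
    signature = inspect.signature(func)
    doc_lines = (inspect.getdoc(func) or "").splitlines()
    first_line = doc_lines[0] if doc_lines else ""
    return f"{func.__name__}{signature}: {first_line}"
```

Calling summarize(parse_duration) yields the function name, its annotated signature, and the first docstring line. If an argument is renamed or a type changes, the generated summary follows the code, and only the short docstring might need a second look.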
It might become apparent that you need an architecture document or a glossary to avoid repeating yourself. Generating those kinds of documents is, in my view, a good idea.
This section is about comments in the code which aren’t intended to be turned into documentation.
I feel that comments should rarely be needed, preferring intention-revealing names for functions and variables.
One reason comments are needed is to describe why the code does what it does in the way it does, for example when code is convoluted for speed reasons, or when it is working around some external bug.
Another reason to comment is to cite the source of a particular algorithm, particularly if you have transposed it into another language.
When comments are needed, they shouldn’t just reiterate the code in English.
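A small Python sketch contrasting the two kinds of comment. The flaky vendor gateway is a made-up scenario and retry is a hypothetical helper; the point is only where the comment earns its place.

```python
def retry(operation, attempts=3):
    """Call operation until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):  # loop over the attempts  <- restates the code
        try:
            return operation()
        except ConnectionError as error:
            # Why: the (hypothetical) vendor gateway drops the first
            # request after an idle period. Retrying is a workaround;
            # remove it once the vendor ships a fix.
            last_error = error
    raise last_error
```

The comment on the loop line could be deleted with no loss. The comment in the except block records something the code cannot say for itself: why the retry exists and when it can go away.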
Many a time I have put the case for documenting to some minimal standard, and here are some of the arguments against it which I have heard:
“We’re all smart here.” — I often hear this cop-out from intelligent cowboy coders who seem to subscribe to the belief that if it was hard to write then it should be hard to read. My feeling on hearing this is that it’s an expression of disdain for other people and the demands on their time and attention. Those other people might be in a crunch, or maybe they are new developers looking at the code with inexperienced eyes.
“This is internal code, we document public projects.” — If the code is a company asset then you may be reducing its utility and value to potential purchasers of the company. The company might also grow big enough that there are internal “external” customers. If you decide to release the code to the public then adding documentation to allow it to be released will likely seem a chore.
“Nobody will read the documentation.” — Often heard from people who get frustrated by reading bad documentation, and then they get used to reading the code as a first resort. Alas, code can only show how, not why. A solution would be to offer better documentation to the people who frustrate you, and then write your own documentation from the perspective of someone who doesn’t want to be frustrated.
“The documentation is a maintenance burden.” — The documentation is an investment in other people’s time (including your future self).
“Comments become outdated.” and “What if the comment says one thing and the code another?” — If a comment has become outdated (maybe a buggy third party library which provoked the comment has been updated) then delete the comment when you notice it is outdated. If the comment says one thing and the code says another then delete the comment after quick consideration — it will be in your source repository for posterity if you need to see what it said later.
“We can just put a ticket number in the comment, and they can refer to that.” — I have worked on a project where they moved from Bugzilla to Jira, and somehow the Bugzilla data was never saved. This means the Bugzilla ticket numbers are now just noise in the comments.
“We should put the information in the commit message and not clutter the code with comments.” — I agree with this to an extent. If necessary there could be a short comment in the code, with a longer explanation in the commit message. As a developer I have usually moved my repository contents when I switch revision control systems so it is less likely that history will be lost than in the case of ticketing systems. If you do this then please make the commit message meaningful — were there things which caused a particular approach to be used (time pressure, licensing issues, etc.) so that a future developer can figure out “What the heck were they thinking when they committed this?” from the repository.
“Code should be self-documenting” — I agree that code should strive to reveal the author’s intentions by using well chosen names. Sometimes there are reasons you need to explain why something is done, and if your naming can’t do that then use a brief comment.
In my experience groups which take some time and care over their documentation and comments also take time and care over good commit messages. All of this effort is like an insurance premium; at some time in the future someone might need to do some archaeology. We can help them with the best communication we can muster.
In my experience writing good documentation, comments, or commit messages takes some practice, and is habitual. Documentation, like code, sometimes needs to be rewritten to be clearer.
In my experience projects need to be consistent. Once people discover that the documentation is spotty or not maintained well, they quickly give up on it.
For me documentation is part of the “fit and finish” of a project which reflects on the author’s attitudes and respect for others.