Flow control in programming languages and garden path sentences
I spend a lot of my time these days thinking about best practices: how we can automate the things we do manually, and how we can do the things we already do better. A huge part of that is determining what the best style is for code. One of the more interesting features I've been playing with recently is flow control. By flow control I mean language keywords that determine whether a given block of code will be executed, or how many times. Keywords like if, then, else, while, unless, etc.
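JavaScript and other C-like syntaxes only support one way of writing these statements: the keyword and its condition come first, followed by the block they control. Other languages allow you to place these keywords after the statement as well, to be more similar to natural languages. So in a language like JavaScript the following would be a syntax error, but in something like Ruby it would be perfectly valid. (Python is often mentioned alongside Ruby here, but it only allows a trailing if as part of a conditional expression with a mandatory else.)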
process_data.then(pass_to_handler) if data.is_raw == true
While I appreciate the more natural feel of being able to put the if after the statement it controls, I'd like to make the case that this is actually a mistake in language design and that programmers should not use it even when it is available. Allowing flow control to appear partway through a line means that the programmer must read and understand what the code does before they reach the flow control keyword and find that they've been tricked. If the if had been at the beginning of the line, or even on the line above the block to be executed, the programmer could have read the condition, known it to be false and moved on to the next chunk of code. One could argue that the programmer should just be aware of the impending if keyword, but I don't know many people who start reading sentences in the middle.
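To make the comparison concrete, here is a minimal Ruby sketch of the two styles side by side. The Record struct, the process method and the two handle_ methods are hypothetical names invented for this illustration; they are not part of the one-line example above. Both methods behave identically, and the only difference is where the reader meets the condition.

```ruby
# Hypothetical record type, invented for this illustration.
Record = Struct.new(:payload, :is_raw)

# Hypothetical processing step: trims whitespace and lowercases the payload.
def process(record)
  record.payload.strip.downcase
end

# Trailing form: the reader parses the whole action first and only then
# learns that it is conditional.
def handle_trailing(record, log)
  log << process(record) if record.is_raw
  log
end

# Leading form: the condition comes first, so a reader who knows it is
# false can skip the body entirely.
def handle_leading(record, log)
  if record.is_raw
    log << process(record)
  end
  log
end

raw    = Record.new("  Hello ", true)
cooked = Record.new("  Hello ", false)

p handle_trailing(raw, [])    # processed, because the record is raw
p handle_leading(raw, [])     # same result as the trailing form
p handle_trailing(cooked, []) # nothing logged
p handle_leading(cooked, [])  # nothing logged
```

Ruby treats the two forms as equivalent here, so choosing between them is purely a readability decision.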
This is similar to the language phenomenon known as "garden path sentences" where a given chunk of a statement can have multiple meanings. When the reader reads something ambiguous, they parse it as the most common meaning. As they continue to read, they reach words whose syntax makes it clear that the earlier assumption was false. This leads to the reader needing to stop, go back in their thought process and start over again, taking a much longer time to understand what was presented to them.
As an example, consider the sentence "The old man the boat." Yeah, that probably hurt your head a bit, didn't it? I had to read it a few times before I got the meaning. We start reading the sentence thinking it's about a man who is old, but it turns out "man" is the verb: the actual meaning is that the old people are the ones who crew the boat.
On average, it takes a person about 600 ms to realize there is a mistake in their understanding of the sentence. Given the complexity of the information at hand, I'd hypothesize that the time lost when reading code is much longer than that.
In conclusion, I would propose that programmers avoid these confusing language features and instead continue to strive to write code that is as easy as possible to read and maintain. I love talking about these little bits of nerdiness, so if you're interested in them too, hit me up on Twitter! On a related and equally pedantic note, always put a question mark on the end of questions.