Controlling complexity is the core problem of software development. The huge variety of decisions a developer faces on a day-to-day basis cries out for methods of controlling and containing complexity.
Cyclomatic complexity, as defined by Thomas McCabe in 1976, has long provided the primary tool for measuring complexity. However, Node.js/JavaScript contains programming and semantic structures that confound this traditional complexity definition.
Specifically, the McCabe Cyclomatic Complexity measures the following structural components of software:
- Increment for every if or other alternate construct.
- Increment for every iterator, for, while, do...while or other repetitive construct.
- Add the number of case statements.
- Add 1 for each AND or OR in a conditional statement.
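As a rough illustration of these counting rules, the sketch below approximates the metric by scanning source text for the keywords and operators listed above, starting from the usual base value of 1 for a routine. Real analysis tools parse the code instead; the name approximateComplexity is hypothetical, and the scan will miscount keywords that appear inside strings or comments.

```javascript
// Naive, text-based approximation of the counting rules listed above.
// Assumption: the source has no keywords inside strings or comments.
function approximateComplexity(source) {
    var patterns = [
        /\bif\b/g,      // alternate constructs
        /\bfor\b/g,     // iterators
        /\bwhile\b/g,   // covers both while and do...while
        /\bcase\b/g,    // each case in a switch
        /&&/g,          // logical AND in a condition
        /\|\|/g         // logical OR in a condition
    ];
    var complexity = 1; // every routine starts with one path

    patterns.forEach(function (pattern) {
        var matches = source.match(pattern);
        complexity += matches ? matches.length : 0;
    });
    return complexity;
}

var sample = 'function f(a, b) {' +
    ' if (a && b) { return 1; }' +
    ' for (var i = 0; i < a; ++i) { b += i; }' +
    ' return b; }';

console.log(approximateComplexity(sample)); // 4: base 1 + if + && + for
```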
Counter Example: Complexity vs Readability
The following routines each have a McCabe complexity of 5:
function sumOfNonPrimes(limit) {
    var sum = 0, i = 0, j = 0;
    OUTER: for (i = 0; i < limit; ++i) {
        if (i <= 2) {
            continue;
        }
        for (j = 2; j < i; ++j) {
            if (i % j === 0) {
                continue OUTER;
            }
        }
        sum += i;
    }
    return sum;
}

function getWeight(i) {
    // Thresholds are illustrative.
    if (i <= 0) { return 'no weight'; }
    if (i < 10) { return 'light'; }
    if (i < 20) { return 'medium'; }
    if (i < 30) { return 'heavy'; }
    return 'very heavy';
}
Almost all programmers would consider getWeight() to be much simpler than sumOfNonPrimes(). (Can you spot the bug in sumOfNonPrimes()?)
The quality of understandability can be easily spotted by humans, but not easily quantified. Exactly what makes sumOfNonPrimes() hard to understand? The label? The different loop control? The use of multiple vars? The usual unfamiliarity of calculating primes? The lack of unit tests to illustrate the various use cases and edge cases?
Missing Elements of the McCabe Metric
The McCabe metric grew out of Fortran programs long ago. Modern languages such as JavaScript have evolved and deserve special consideration for additional control and logic structures. (Some would say that Lisp has all these and more…but that’s another topic!) JavaScript extends the Java programming structures even further. Fortran as a basis for complexity sorely needs updating.
Specifically, the elements missing from the McCabe metric include:
- Data complexity. McCabe measures only structural complexity.
- Nested and non-nested loops contribute the same weight. Nested conditionals/loops deserve special considerations.
- A simple switch statement can brand a method with a high complexity value.
- Asynchronous callbacks get ignored.
- Try…catch may have async constructs.
- Closure can pose hazards.
- Recursive routines both sync and async get ignored. These can be tricky to understand without experience.
- Coupling gets ignored, both efferent and afferent.
- Dynamic binding of this gets ignored.
- Potential problems with ECMAScript scope get ignored.
- Do the magic strings common in node/JavaScript cause problems?
- Violations of the Law of Demeter.
- The quirks of the ECMAScript standard. A good JavaScript developer must be aware of all of these even if only the good parts get used. Developers must be able to understand other developers’ code.
- Long initialization routines can contain seemingly complex statements:
if (vo.getPropName('abc') !== undefined) {
    var1 = vo.get('abc');
}
if (vo.getPropName('xyz') !== undefined) {
    var2 = vo.get('xyz');
}
...
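To illustrate how such initialization code inflates the count without adding real difficulty, here is a minimal sketch. The makeValueObject stub and the copyDefined function are hypothetical stand-ins; only getPropName() and get() come from the fragment above. Each repeated if adds one point to the metric, while the list-driven version performs the same checks with a fixed count.

```javascript
// Hypothetical stand-in for the value object `vo` in the fragment above.
function makeValueObject(props) {
    return {
        getPropName: function (name) {
            return Object.prototype.hasOwnProperty.call(props, name) ? name : undefined;
        },
        get: function (name) {
            return props[name];
        }
    };
}

// Same checks as the repeated if statements, driven by a list of names.
// Under the counting rules listed earlier this stays at 2 (base + one if),
// however many names are passed in.
function copyDefined(vo, names) {
    var result = {};
    names.forEach(function (name) {
        if (vo.getPropName(name) !== undefined) {
            result[name] = vo.get(name);
        }
    });
    return result;
}

var vo = makeValueObject({ abc: 1, xyz: 2 });
console.log(copyDefined(vo, ['abc', 'xyz', 'missing'])); // { abc: 1, xyz: 2 }
```

Neither version is harder to read than the other, which suggests the metric is counting repetition rather than difficulty in this case.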
Non-McCabe Cases of Complexity
Some code may yield low complexity but can be difficult to understand. The sumOfNonPrimes() routine above has a low complexity of 5, but is obviously not easy to understand.
Understandability does not easily quantify into metrics. Christopher Alexander of Patterns fame describes this lack of metrics as the “quality without a name”, which has a major impact on both architecture and software design. Indeed the entire Clean Code movement uses this “quality without a name” as a core principle.
Data Complexity
Code that uses non-trivial access methods has a certain complexity of its own. An example with a McCabe complexity of 1:
result.name = employee[record[name.first]] + ' ' + employee[record[name.last]];
While not complicated, the above requires at least the same effort to understand as a simple if statement. A developer must be aware of the employee data structure to understand the nested data access. Should a more comprehensive metric account for this?
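To make that hidden knowledge concrete, the sketch below supplies one possible set of data shapes under which the statement runs. Every shape and value here, along with the buildDisplayName wrapper, is an assumption invented for illustration; the original does not show them.

```javascript
function buildDisplayName() {
    // Assumed shapes, invented for illustration only.
    var name = { first: 'fname', last: 'lname' };     // field identifiers
    var record = { fname: 'col1', lname: 'col2' };    // identifier -> storage key
    var employee = { col1: 'Ada', col2: 'Lovelace' }; // storage key -> value
    var result = {};

    // One statement, McCabe complexity 1, yet three structures to keep in mind.
    result.name = employee[record[name.first]] + ' ' + employee[record[name.last]];
    return result.name;
}

console.log(buildDisplayName()); // Ada Lovelace
```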
The Chaining Pattern
A chain pattern can save typing and create more concise code that almost reads like a sentence. It can also help in splitting code into smaller, more specific functions as opposed to larger, more monolithic ones.
A simple object for use with a chain pattern could be:
var obj = {
    value: 1,
    increment: function () {
        this.value += 1;
        return this;
    },
    add: function (val) {
        this.value += val;
        return this;
    },
    shout: function () {
        console.log(this.value);
        return this;
    }
};
An example of using this instance in a chain pattern:
obj.increment().add(3).shout(); // 5 as output
Debugging the chain pattern can cause problems when the underlying object has changed. An error may have occurred, but there is just too much happening to easily discover the problem. Assuming all unit tests have passed, where exactly is that problem?
obj defines all the functions called in the chain. Notice that console.log() does not exist within the definition of obj.
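One way to make a failing chain easier to inspect is to add a pass-through step that exposes the intermediate state. The sketch below uses a similar object with a hypothetical tap() method; neither tap() nor the trace array is part of the original object.

```javascript
var counter = {
    value: 1,
    increment: function () { this.value += 1; return this; },
    add: function (val) { this.value += val; return this; },
    // Hypothetical helper: hands the current value to a callback,
    // then returns the object so the chain continues unchanged.
    tap: function (inspect) { inspect(this.value); return this; }
};

var trace = [];
function record(value) { trace.push(value); }

counter.increment().tap(record).add(3).tap(record);

console.log(trace); // [ 2, 5 ]
```

This helps only because the object is our own; it does nothing for chains that pass through objects we do not control.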
The real problem arises when referring to object instances not in our local object. The Law of Demeter applies.
Violations of the Law of Demeter
The general class of data complexity includes a problem that has no simple resolution: violations of the “Law of Demeter”. This law seeks to impose a principle of least knowledge on code. A simple rule of “only talk to your friends” could suffice as a general motto.
Any developer would agree that code that cannot call external libraries or references could not provide any reasonably useful utility. The Law of Demeter must therefore be applied in a loosened manner.
An example of delegating to other objects in a reasonable manner:
var city = a.getB().getC().getD().doSomething().getCity();
Assume that getB, getC and getD access external objects. In order to get the city, the code delegates to objects B, C and D. The code must know about the data structure of each of these objects in order to retrieve the value returned by getCity().
Notice the coupling introduced into the above city expression. Our module now depends upon perhaps three or more external modules. Our code may or may not define these other modules.
How much control does our code have over these objects? If these objects come from a library, we likely have little if any control.
And how could we write a unit test for this city expression? How many mocks would we create for this single statement? How could this code stay robust under long-term maintenance with all these mocks?
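As a rough answer to the mock question, the sketch below stubs just enough of each object to exercise the expression. All five stubs and the city value are hypothetical; the point is only that one statement needs one stub per link in the chain.

```javascript
// Hypothetical stubs, one per link in the chain.
var lastLink = { getCity: function () { return 'Springfield'; } };
var d = { doSomething: function () { return lastLink; } };
var c = { getD: function () { return d; } };
var b = { getC: function () { return c; } };
var a = { getB: function () { return b; } };

// The statement under test, unchanged from the text above.
var city = a.getB().getC().getD().doSomething().getCity();

console.log(city); // Springfield
```

If any one of the real objects changes the shape of what it returns, the matching stub silently goes stale, and the test keeps passing.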
How easy is it to reason about this city expression? If we need to verify correctness, must we examine each dependent module in turn? Suppose city has an incorrect value? How much code must we step through with a debugger to isolate the error?
Alas, no simple solution exists for this example! No simple heuristics exist for when to apply the law and when not to. If you have such a solution, you will certainly receive invitations to present a paper and a podium from which to unveil your ideas.
The application to complexity must surely shine through this explanation! Violations of the Law of Demeter do not increase McCabe complexity!
For a real-life example, consider the following with a complexity === 1:
function foobar() {
    return new PaymentRequestBOInstantiator.ForNew()
        .data( requestDTO )
        .bo( preapprovalBO )
        .bo( appBO )
        .dao( paymentRequestDAO )
        .dao( paymentExecDAO )
        .dao( payRequestParameterDAO )
        .dao( receiverInvoiceDAO )
        .dao( receiverParameterDAO )
        .engine( paymentEngine )
        .engine( utilityEngine )
        .container( PaymentDomainObjectContainer.getInstance() )
        .container( AccountDomainObjectContainer.getInstance() )
        .container( CurrencyDomainObjectContainer.getInstance() )
        .config( configBean )
        .validator( validator )
        .listeners( ipnNotifierListener, emailPayExecParallel, emailPayExecChainedSender, emailPayExecChainedReceiver )
        .newExceptionableInstance();
}
YES! This code was found in customer facing code!! (The entire application containing this code has since been removed and recoded.)
“The Train Wreck Pattern” provides an alternate name for this pattern.
Asynchronous Callbacks
Without a doubt asynchronous code constitutes a major cause for confusion. Even reasoning about async operations can be tricky to explain to people. As an example, stop and explain the following to yourself out loud:
// A
setTimeout(function () {
    // B
}, 1000);
// C
Please stop and explain the above.
Did you say something like: “Perform A, then set a timer to wait 1000 milliseconds to run B, then run C”? Or did you phrase it a bit better: “Perform A, then set a timer, perform C, then when the timer expires, B gets placed on the event loop and executes 1000 milliseconds later or more”?
Notice your hesitation in formulating the “right” way to explain an async operation as it relates to the overall scope of code. Also notice that whatever explanation this code engenders, it does not cleanly match our brain’s understanding of this operation.
What incongruence between linear code and async code! The elephant in the middle of the room is this major disconnect. Our brains work one way while async operations work another. If we can’t easily explain simple async logic, what hope for more complicated logic?
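The ordering can be checked with a small runnable sketch of the timer example. The events array is an assumption added to record the order, and the delay is shortened to 10 ms to keep the run quick.

```javascript
var events = [];

events.push('A');
setTimeout(function () {
    events.push('B');
    console.log(events.join(' -> ')); // A -> C -> B
}, 10);
events.push('C');

// At this point only the synchronous part has run.
console.log(events.join(' -> ')); // A -> C
```

The source order reads A, B, C while the execution order is A, C, B, which is exactly the mismatch the reader has to resolve mentally.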
This async disconnect relates heavily to the multi-tasking fallacy. Multi-tasking does not suit humans. Most “multi-taskers” would agree that multi-tasking is really context switching and not true parallel operation.
True parallel, async operations in our body do happen: our heart beats by itself, our lungs operate with no intervention, our movements such as walking do not require conscious coordination. These async activities continue “without thinking”. While we may occasionally voluntarily control our breathing and movements, most of the time “thinking” is not involved.
Some async, unconscious human programs exist because we repeated them over and over. As babies, none of us could walk when born. We had to learn to crawl first, then walk. After much trial and error, walking was finally mastered.
We, as adults, use these same techniques to master, say, juggling. We throw balls in the air over and over until we get some semblance of juggling, and keep going until our autonomic nervous system “gets” the juggling movements for each trick we practice.
Similarly we study JavaScript async patterns until we “get” them.
When encountering an async pattern in code, it takes some thought to properly read and interpret the async representation into some internal, understandable structure.
A problem arises in determining whether some code executes asynchronously or not. The async nature of setTimeout() above should be obvious even to JavaScript beginners. Others, such as readFile(...) vs. readFileSync(...), are easy to interpret.
But what about the general case with a function in a parameter list? How can we quickly determine if that call is async or not? Can static analysis determine this?
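A minimal sketch of why the call site alone does not answer the question: both calls below pass a function as a parameter, yet one callback runs before the next statement and the other runs after the current script finishes. The order array is an assumption added for illustration.

```javascript
var order = [];

// Synchronous: forEach invokes the callback before returning.
[1].forEach(function () {
    order.push('forEach callback');
});
order.push('after forEach');

// Asynchronous: setTimeout returns first and runs the callback later.
setTimeout(function () {
    order.push('setTimeout callback');
}, 0);
order.push('after setTimeout');

console.log(order);
// [ 'forEach callback', 'after forEach', 'after setTimeout' ]
```

Nothing in the syntax distinguishes the two; only knowledge of forEach and setTimeout does.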
What is the sequence of calls in the example below?
doA(function () {
    doB();
    doC(function () {
        doD();
    });
    doE();
});
doF();
Experienced developers have no problem identifying the order of execution when all these functions are known to be synchronous. How often do your eyes jump here and there while deducing this order? What cognitive load does this place on your brain? If these functions were scattered in other parts of this module or, worse, in other modules, how long would this flow detection process take?
Now assume this example contains some asynchronous functions. What order now exists for these functions? What kind of chaotic thoughts become necessary to debug the resulting entangled async function calls? How badly does this async activity overload our cognitive capabilities?
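To make the difference concrete, the sketch below runs the same call structure twice: once with every function synchronous, and once assuming doA and doC defer their callbacks with setTimeout. Which functions are asynchronous is purely an assumption here, since the example does not say. The run helper and the log array are hypothetical.

```javascript
// Runs the call structure from the example and records the call order.
// `invoke` decides how doA and doC call their callbacks.
function run(invoke) {
    var log = [];
    function doA(cb) { log.push('A'); invoke(cb); }
    function doB() { log.push('B'); }
    function doC(cb) { log.push('C'); invoke(cb); }
    function doD() { log.push('D'); }
    function doE() { log.push('E'); }
    function doF() { log.push('F'); }

    doA(function () {
        doB();
        doC(function () {
            doD();
        });
        doE();
    });
    doF();
    return log;
}

// All synchronous: callbacks run immediately.
var syncLog = run(function (cb) { cb(); });
console.log(syncLog.join('')); // ABCDEF

// doA and doC asynchronous: callbacks are deferred with setTimeout.
var asyncLog = run(function (cb) { setTimeout(cb, 0); });
setTimeout(function () {
    console.log(asyncLog.join('')); // AFBCED
}, 50);
```

The source text is identical in both runs; only the behavior of two functions defined elsewhere changes the order.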
Assuming identifying async functions were possible, what additional complexity values apply to this operation?
Unfortunately no easy or obvious way appears to identify async functions.
ES6 offers major improvements in managing JavaScript async operations. Generators and Promises promise (pun intended) to ease the burden on our synchronous brains. They also open possibilities for extending static analysis so that it can suggest clearer code.
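As a small sketch of how Promises let the code read closer to its execution order, a timer can be wrapped in a hypothetical delay() helper. The helper and the steps array are assumptions, not part of the original.

```javascript
// Hypothetical helper: a Promise that resolves after `ms` milliseconds.
function delay(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}

var steps = [];

steps.push('A');
delay(10)
    .then(function () {
        steps.push('B'); // runs after the timer expires
    })
    .then(function () {
        steps.push('C'); // runs only after B has completed
        console.log(steps.join(' -> ')); // A -> B -> C
    });
```

Here the top-to-bottom reading order matches the order in which the steps run, although code placed after the chain would still execute before B.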
Nested Loops
The admonition against nested loops and conditional tests exists because reasoning about the inner logic requires understanding the outer logic. That inner logic requires our working memory to stack and unstack as we continue to read and understand the code.
Shouldn’t these inner logic structures have an additional complexity increment?
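One possible answer, sketched below, is to weight each control structure by its nesting depth. This weighting is an illustrative assumption, not part of McCabe's definition, and the outlines are hand-written summaries of the getWeight() and sumOfNonPrimes() routines from the counter example rather than the output of a parser.

```javascript
// Each node is one control structure; `children` holds structures nested in it.
// Score = 1 (base) + sum over all structures of (1 + nesting depth).
function weightedComplexity(structures) {
    function score(nodes, depth) {
        return nodes.reduce(function (total, node) {
            return total + 1 + depth + score(node.children || [], depth + 1);
        }, 0);
    }
    return 1 + score(structures, 0);
}

// Hand-written outline of getWeight(): four sequential ifs.
var getWeightOutline = [
    { kind: 'if' }, { kind: 'if' }, { kind: 'if' }, { kind: 'if' }
];

// Hand-written outline of sumOfNonPrimes(): for > (if, for > if).
var sumOfNonPrimesOutline = [
    { kind: 'for', children: [
        { kind: 'if' },
        { kind: 'for', children: [{ kind: 'if' }] }
    ] }
];

console.log(weightedComplexity(getWeightOutline));      // 5
console.log(weightedComplexity(sumOfNonPrimesOutline)); // 9
```

Under this weighting the two routines no longer tie at 5, which matches the intuition that the nested routine is harder to read.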
Try…Catch
Strangely enough, the McCabe Cyclomatic Complexity ignores try...catch! Fortran does not use this try…catch pattern. Is this effort worth another complexity point? Does properly interpreting a try...catch require at least the same effort as interpreting a simple if?
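A minimal sketch of the point: the function below has two distinct outcomes, just like an if/else, yet under the counting rules listed earlier it scores 1. The function name and the fallback parameter are hypothetical.

```javascript
// Two paths (parse succeeds, parse throws) but no if, loop, case, AND or OR.
function parseOrDefault(text, fallback) {
    try {
        return JSON.parse(text);
    } catch (err) {
        return fallback;
    }
}

console.log(parseOrDefault('{"a": 1}', null)); // { a: 1 }
console.log(parseOrDefault('not json', null)); // null
```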
Closure
While closure is commonly used, a developer must inspect each use for correctness. Node developers usually have a good understanding of closure. Still, they must examine the overall context of each closure.
The common classic example of a mistake with closure:
for (var i = 0; i < 5; i++) {
    setTimeout(function () {
        console.log(i);
    }, i);
}
The output, of course, is a string of 5’s: every callback shares the same i, which has already reached 5 by the time the timers fire.
Frequently books on JavaScript either ignore or merely mention closure. Other books, such as Kyle Simpson’s “Scope and Closures” extensively discuss this important topic.
A developer not familiar with the above closure example would be hard-pressed to determine the cause of the “mysterious” output. And the solution would require some initial study:
for (var j = 0; j < 5; j++) {
    (function (j) {
        setTimeout(function () {
            console.log(j);
        }, j);
    })(j);
}
Now that a developer understands proper IIFE usage and similar solutions, remember that each IIFE statement requires effort to judge correctness.
Problems of functions in loops are so common that most static analysis tools flag a function in a loop as a possible source of error.
ES6 introduces let to help plug the variable-scope hole that has been a source of surprises and bugs over the years. Before ES6, JavaScript could surprise us because it offered only function and global scope to control the lifetime of a variable.
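A short sketch of the let version of the loop, assuming an ES6-capable runtime. The printed array is an assumption added so the result can be inspected.

```javascript
var printed = [];

// `let` creates a fresh binding of k for every iteration,
// so each callback sees its own value without an IIFE.
for (let k = 0; k < 5; k++) {
    setTimeout(function () {
        printed.push(k);
    }, k);
}

setTimeout(function () {
    console.log(printed.join(' ')); // 0 1 2 3 4
}, 50);
```

The loop body is the same as the buggy var version; only the declaration keyword changes, which is easy to miss in a code review.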