Complexity for JavaScript

The control of complexity presents the core problem of software development. The huge variety of decisions a developer faces on a day-to-day basis cries out for methods of controlling and containing complexity.

Cyclomatic complexity, as defined by Thomas McCabe in 1976, has long provided the primary tool for measuring complexity. However, node.js/JavaScript contains programming and semantic structures that confound this traditional complexity definition.

Specifically, the McCabe Cyclomatic Complexity measures the following structural components of software:

  • Increment for every if or other alternate construct.
  • Increment for every iterator, for, while, do...while or other repetitive construct.
  • Add the number of case statements.
  • Add 1 for each AND or OR in a conditional statement.
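
As a sketch, here is how those rules add up on a small, contrived function (all names hypothetical), starting from a base of 1 for the function itself:

```javascript
function classify(n, strict) {
    if (n < 0 && strict) {        // +1 for the if, +1 for the &&
        return 'invalid';
    }
    for (var i = 0; i < n; i++) { // +1 for the loop
        // ... work per element ...
    }
    switch (n % 2) {              // +2: one per case statement
        case 0: return 'even';
        case 1: return 'odd';
    }
}
// McCabe complexity: 1 (base) + 1 (if) + 1 (&&) + 1 (for) + 2 (cases) = 6
```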

Counter Example: Complexity vs Readability

The following routines each have a McCabe complexity of 5:

function sumOfNonPrimes(limit) {
    var sum = 0, i = 0, j = 0;
    OUTER: for (i = 0; i < limit; ++i) {
        if (i <= 2) {
            continue;
        }
        for (j = 2; j < i; ++j) {
            if (i % j === 0) {
                continue OUTER;
            }
        }
        sum += i;
    }
    return sum;
}

function getWeight(i) {
    if (i <= 0) { return 'no weight'; }
    if (i <= 10) { return 'light'; }
    if (i <= 100) { return 'medium'; }
    if (i <= 1000) { return 'heavy'; }
    return 'very heavy';
}

Almost all programmers would consider getWeight() to be much simpler than sumOfNonPrimes. (Can you spot the bug in sumOfNonPrimes?)

The quality of understandability can be easily spotted by humans, but not easily quantified. Exactly what makes sumOfNonPrimes() hard to understand? The label? The different loop control? The use of multiple vars? The unfamiliarity of calculating primes? The lack of unit tests to illustrate the various use cases and edge cases?

Missing Elements of the McCabe Metric

The McCabe metric was derived from Fortran programs long ago. Modern languages such as JavaScript have evolved and deserve special consideration for additional control and logic structures. (Some would say that Lisp has all these and more…but that’s another topic!) JavaScript extends the Java programming structures even more. Fortran as a basis for complexity sorely needs updating.

Specifically, these missing McCabe elements include:

  • Data complexity. McCabe measures only structural complexity.
  • Nested and non-nested loops contribute the same weight. Nested conditionals and loops deserve special consideration.
  • A simple switch statement can brand a method with a high complexity value.
  • Asynchronous callbacks get ignored.
  • Try…catch may contain async constructs.
  • Closure can pose hazards.
  • Recursive routines, both sync and async, get ignored. These can be tricky to understand without experience.
  • Cohesion gets ignored – both efferent and afferent.
  • Dynamic scoping of this gets ignored.
  • Potential problems with EcmaScript scope get ignored.
  • Do the magic strings common in node/JavaScript cause problems?
  • Violations of the Law of Demeter.
  • The quirks of the EcmaScript standard. A good JavaScript developer must be aware of all of these even if only the good parts get used. Developers must be able to understand other developers’ code.
  • Long initialization routines can contain seemingly complex statements:
    if (vo.getPropName('abc') !== undefined) { var1 = vo.get('abc'); }
    if (vo.getPropName('xyz') !== undefined) { var2 = vo.get('xyz'); }
     ...
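
One of these gaps, the dynamic binding of this, takes only a few lines to demonstrate. In this contrived example the same function body behaves differently depending on the call site, yet its McCabe complexity is only 1:

```javascript
var counter = {
    count: 0,
    inc: function () {
        // `this` is determined by the caller, not by this definition.
        this.count += 1;
    }
};

counter.inc();                              // `this` is counter: count becomes 1

var other = { count: 100, inc: counter.inc };
other.inc();                                // same function, different `this`:
                                            // other.count becomes 101
```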
    

Non-McCabe Cases of Complexity

Some code may yield low complexity yet remain difficult to understand. The sumOfNonPrimes() above has a low complexity of 5, but is obviously not easy to understand.

Understandability does not easily quantify into metrics. Christopher Alexander of Patterns fame describes this lack of metrics as the “quality without a name,” with a major impact on both architecture and software design. Indeed, the entire Clean Code movement uses this “quality without a name” as a core principle.

Data Complexity

Code that uses non-trivial access methods has a certain complexity about it. An example with a McCabe complexity of 1:

result.name = employee[record[name.first]] + ' ' + employee[record[name.last]];

While not complicated, the above requires at least the same effort to understand as a simple if statement. A developer must be aware of the employee data structure to understand the nested data access. Should a more comprehensive metric account for this?
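
A runnable variant of that statement, with made-up data shapes, makes the hidden knowledge explicit. A reader must hold three structures in mind to follow one line:

```javascript
// Hypothetical data shapes a reader must know to follow one statement:
var name = { first: 'firstName', last: 'lastName' };  // field-name lookup table
var record = { firstName: 'fn', lastName: 'ln' };     // maps field names to employee keys
var employee = { fn: 'Ada', ln: 'Lovelace' };         // the actual values

var result = {};
// McCabe complexity 1, yet three levels of indirection to understand:
result.name = employee[record[name.first]] + ' ' + employee[record[name.last]];
```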

The Chaining Pattern

A chain pattern can save typing and create more concise code that almost reads like a sentence. It can also guide how to split functions, creating smaller, more specific functions as opposed to larger, more monolithic ones.

A simple object for use with a chain pattern could be:

        var obj = {
            value: 1,
            increment: function () {
                this.value += 1;
                return this;
            },
            add: function (val) {
                this.value += val;
                return this;
            },
            shout: function () {
                console.log(this.value);
                return this;
            }
        };

An example of using this instance in a chain pattern:

        obj.increment().add(3).shout(); // 5 as output

Debugging the chain pattern can be problematic when the underlying object has changed. An error may have occurred, but there is just too much happening to easily discover the problem. Assuming all unit tests have passed, where exactly is that problem?

obj defines all the functions called in the chain. Notice, however, that console.log() itself lives outside the definition of obj.

The real problem arises when referring to object instances not in our local object. The Law of Demeter applies.

Violations of the Law of Demeter

This general class of data complexity violates the Law of Demeter, a principle with no simple resolution. This law seeks to impose a principle of least knowledge on code. A simple rule of “only talk to your friends” could suffice as a general motto.

Any developer would agree that code that cannot call external libraries or references could not provide any reasonably useful utility. The Law of Demeter must therefore have application in a loosened manner.

An example of delegating to other objects in a reasonable manner:

var city = a.getB().getC().getD().doSomething().getCity();

Assume that getB, getC and getD access external objects. In order to get the city, the call delegates through objects B, C and D. The code must know about the data structure of each of these objects in order to retrieve the value held by getCity().

Notice the coupling introduced into the above city expression. Our module now depends upon perhaps three or more external modules. Our code may or may not define these other modules.

How much control does our code have over these objects? If these objects come from a library, we likely have little if any control.

And how could we write a unit test for this city expression? How many mocks would we create for this single statement? How could this code remain robust for long-term maintenance with all these mocks?

How easy is it to reason about this city expression? If we need to verify correctness, must we examine each dependent module in turn? Suppose city has an incorrect value? How much code must we step through with a debugger to isolate the error?
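
One partial mitigation, though by no means a full solution, pushes the chain knowledge into the nearest friend. A sketch with hypothetical objects:

```javascript
// Hypothetical objects: instead of callers writing a.getB().getC().getCity(),
// object `a` answers the question itself, so callers only "talk to a friend".
var c = { getCity: function () { return 'Lisbon'; } };
var b = { getC: function () { return c; } };
var a = {
    getB: function () { return b; },
    // Delegation: knowledge of B and C now lives in exactly one place.
    getCity: function () { return this.getB().getC().getCity(); }
};

var city = a.getCity();   // callers never see B or C
```

Callers now depend only on a, and a unit test needs only one mock.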

Alas, no simple solution exists for this example! No simple heuristics exist for when to apply the law and when not to. If you have such a solution, you will certainly receive invitations to present a paper and a podium to unveil your ideas.

The application to complexity must surely shine through this explanation! Violations of the Law of Demeter do not increase McCabe complexity!

For a real-life example, consider the following with a complexity === 1:

        function foobar() {
            return new PaymentRequestBOInstantiator.ForNew()
                .data( requestDTO )
                .bo( preapprovalBO )
                .bo( appBO )
                .dao( paymentRequestDAO )
                .dao( paymentExecDAO )
                .dao( payRequestParameterDAO )
                .dao( receiverInvoiceDAO )
                .dao( receiverParameterDAO )
                .engine( paymentEngine )
                .engine( utilityEngine )
                .container( PaymentDomainObjectContainer.getInstance() )
                .container( AccountDomainObjectContainer.getInstance() )
                .container( CurrencyDomainObjectContainer.getInstance() )
                .config( configBean )
                .validator( validator )
                .listeners( ipnNotifierListener, emailPayExecParallel,
                            emailPayExecChainedSender, emailPayExecChainedReceiver )
                .newExceptionableInstance();
        }

YES! This code was found in customer-facing code! (The entire application containing this code has since been removed and recoded.)

“The Train Wreck Pattern” provides an alternate name for this pattern.

Asynchronous Callbacks

Without a doubt asynchronous code constitutes a major cause for confusion. Even reasoning about async operations can be tricky to explain to people. As an example, stop and explain the following to yourself out loud:

        // A
        setTimeout(function () {
            // B
        }, 1000);
        // C

Please stop and explain the above.

Did you say something like: “Perform A, then set a timer to wait 1000 milliseconds to run B, then run C”? Or did you phrase it a bit better: “Perform A, then set a timer, perform C; then when the timer expires, B gets placed on the event queue and executes 1000 milliseconds or later.”

Notice your hesitation in formulating the “right” way to explain an async operation as it relates to the overall scope of code. Also notice that whatever explanation this code engenders, it does not cleanly match our brain’s understanding of this operation.

What incongruence between linear code and async code! The elephant in the middle of the room is this major disconnect. Our brains work one way while async operations work another. If we can’t easily explain simple async logic, what hope for more complicated logic?

This async disconnect relates heavily to the multi-tasking fallacy. Multi-tasking does not suit humans. Most “multi-taskers” would agree that multi-tasking is really context switching and not true parallel operation.

True parallel, async operations in our bodies do happen: our heart beats by itself, our lungs operate with no intervention, and movements such as walking do not require conscious coordination. These async activities continue “without thinking”. While we may occasionally control our breathing and movements voluntarily, most of the time “thinking” is not involved.

Some async, unconscious human programs exist because we repeated them over and over. As babies, none of us could walk when born. We had to learn to crawl first, then walk. After much trial and error, walking was finally mastered.

We, as adults, use these same techniques to master, say, juggling. Throw balls in the air over and over until we get some semblance of juggling. Over and over until our autonomic nervous system “gets” the juggling movements for each trick we practice.

Similarly we study JavaScript async patterns until we “get” them.

When encountering an async pattern in code, it takes some thought to properly read and interpret the async representation into some internal, understandable structure.

A problem arises in exactly how to determine whether some code executes asynchronously or not. The async nature of setTimeout() above should be obvious even to JavaScript beginners. Others, such as readFile(...) vs. readFileSync(...), are easy to interpret.

But what about the general case with a function in a parameter list? How can we quickly determine if that call is async or not? Can static analysis determine this problem?

What is the sequence of calls in the example below?

    doA(function () {
        doB();
        doC(function () {
            doD();
        });
        doE();
    });
    doF();

Experienced developers have no problem identifying the order of execution when all these functions are known to be synchronous. How often do your eyes jump here and there while deducing this order? What cognitive load does this place on your brain? If these functions were scattered in other parts of this module or, worse, in other modules, how long would this flow detection take?

Now assume this example contains some asynchronous functions. What order now exists for these functions? What kind of chaotic thoughts become necessary to debug the resulting entangled async calls? How unnaturally does this async activity overload our cognitive capabilities?
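
To make the disconnect concrete, here is a sketch (hypothetical stand-in functions) where only doC runs asynchronously:

```javascript
var order = [];

function doA(cb) { order.push('A'); cb(); }
function doB()   { order.push('B'); }
function doC(cb) {
    // The only async step: its work is deferred to the event loop.
    setTimeout(function () { order.push('C'); cb(); }, 0);
}
function doD()   { order.push('D'); }
function doE()   { order.push('E'); }
function doF()   { order.push('F'); }

doA(function () {
    doB();
    doC(function () {
        doD();
    });
    doE();
});
doF();
// Synchronously we observe A, B, E, F -- C and D arrive only later,
// after the current call stack unwinds.
```

Nothing in the call site `doC(function () { ... })` reveals this reordering; the reader must already know doC is async.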

Assuming identifying async functions were possible, what additional complexity values apply to this operation?

Unfortunately, no easy or obvious way exists to identify async functions.

ES6 offers major improvements in managing JavaScript async operations. The generator and Promise enhancements promise (pun intended) to ease the burden on our synchronous brains. They also offer possibilities for extending static analysis to provide suggestions for clarity.
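
As a sketch of the direction ES6 enables (with stand-in steps, not a prescription), nested callbacks can flatten into a Promise chain that reads top to bottom, closer to how our linear brains expect:

```javascript
// A stand-in async step; each call returns a Promise.
function step(name, log) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            log.push(name);
            resolve();
        }, 0);
    });
}

var log = [];
// The chain reads in execution order, with no rightward pyramid.
var done = step('A', log)
    .then(function () { return step('B', log); })
    .then(function () { return step('C', log); });
```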

Nested Loops

The admonition against nested loops and conditional tests exists because reasoning about the inner logic requires understanding the outer logic. That inner logic requires our working memory to stack and unstack as we continue to read and understand the code.

Shouldn’t these inner logic structures have an additional complexity increment?
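
As an illustration, both contrived functions below score the same McCabe complexity of 3 (base 1 plus two loops), yet the nested version demands more of our working memory:

```javascript
// Two loops, flat: complexity 3, readable one loop at a time.
function countBoth(a, b) {
    var n = 0;
    for (var i = 0; i < a; i++) { n += 1; }
    for (var j = 0; j < b; j++) { n += 1; }
    return n;
}

// Two loops, nested: also complexity 3, but the inner loop must be
// understood in the context of the outer loop's state.
function countPairs(a, b) {
    var n = 0;
    for (var i = 0; i < a; i++) {
        for (var j = 0; j < b; j++) { n += 1; }
    }
    return n;
}
```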

Try…Catch

Strangely enough, the McCabe Cyclomatic Complexity ignores try…catch! Fortran does not use this try…catch pattern. Is this effort worth another complexity point? Does properly interpreting a try…catch require at least the same effort as interpreting a simple if?
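
A small sketch (hypothetical helper) of the hidden branch that McCabe never counts:

```javascript
// McCabe scores this function at complexity 1, yet a reader must
// clearly consider two paths: success and failure.
function parseOrDefault(text, fallback) {
    try {
        return JSON.parse(text);    // success path
    } catch (e) {
        return fallback;            // failure path, invisible to McCabe
    }
}
```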

Closure

While closure is commonly used, a developer must inspect for proper usage. Node developers usually have a good understanding of closure. Still, they must examine the overall context for each use of closure.

The common classic example of a mistake with closure:

    for (var i = 0; i < 5; i++) {
        setTimeout(function () {
            console.log(i);
        }, 1000);
    }

The output, of course, is a string of 5’s.

Frequently books on JavaScript either ignore or merely mention closure. Other books, such as Kyle Simpson’s “Scope and Closures” extensively discuss this important topic.

A developer not familiar with the above closure example would be hard-pressed to determine the cause of the “mysterious” output. And the solution would require some initial study:

    for (var j = 0; j < 5; j++) {
        (function (j) {
            setTimeout(function () {
                console.log(j);
            }, j);
        })(j);
    }

Now that a developer understands proper IIFE usage and similar solutions, remember that each IIFE statement requires effort to judge correctness.

Problems of functions in loops are so common that most static analysis tools flag a function in a loop as a possible source of error.

ES6 introduces let to assist in plugging the variable-scope hole that has been a source of surprises and bugs over the years. JavaScript can surprise us because it historically offered only function and global scope to control the lifetime of a variable.
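
A sketch of the difference, capturing the closures synchronously so the effect is visible (contrived example):

```javascript
// With var: one shared binding, so every closure sees the final value.
var withVar = [];
for (var i = 0; i < 3; i++) {
    withVar.push(function () { return i; });
}
// withVar functions all return 3.

// With let: a fresh binding per iteration, no IIFE needed.
var withLet = [];
for (let j = 0; j < 3; j++) {
    withLet.push(function () { return j; });
}
// withLet functions return 0, 1, 2.
```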

About Cecil McGregor

As a software developer with many years of experience, I offer some of the many insights learned through the school of hard knocks. My passion is writing software! And I like to write superior software. I try to follow current trends and techniques as I apply them to my everyday work.
