Falafel, Source Rewriting, and a Magical Assert

A couple of weeks ago, I wrote about how to do some basic static analysis of JavaScript source code using Esprima. This post is a follow-up in which I demonstrate how to write a program that transforms JavaScript source code using falafel, an Esprima-based library.

Hello, Falafel

Falafel is an AST traversal library like estraverse. If you've read my previous article, this is familiar territory. Instead of taking the AST as an argument, falafel takes the source code as a string.

var falafel = require('falafel');

// The callback runs once for every node in the parsed program.
falafel('console.log("hello world");', function(node){
  console.log('Entered node', node.type);
});

In addition to traversal, Falafel also adds some interesting methods to the nodes being traversed:

update(newCode:string) - replaces the source of the node with new code - a little like the DOM replaceChild method.
source():string - returns the source of the node, including any modifications already made to its child nodes.
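
As a rough mental model (a simplified sketch, not falafel's actual code), you can think of each node as owning a character range of the source text. The makeEditor helper and the explicit start/end offsets below are hypothetical names I made up for illustration.

```javascript
// Simplified model of range-based source rewriting.
// makeEditor is a hypothetical helper, not part of falafel.
function makeEditor(src) {
  var chunks = src.split(''); // one entry per original character
  return {
    // Text currently occupying the range [start, end).
    source: function (start, end) {
      return chunks.slice(start, end).join('');
    },
    // Put the new text at the start of the range and blank out the rest.
    update: function (start, end, newCode) {
      chunks[start] = newCode;
      for (var i = start + 1; i < end; i++) chunks[i] = '';
    },
    toString: function () {
      return chunks.join('');
    }
  };
}

var editor = makeEditor('console.log("hi");');
// Characters 12 to 15 hold the string literal "hi", quotes included.
var literal = editor.source(12, 16);
editor.update(12, 16, '"hello"');
var rewritten = editor.toString();
```

Because every edit stays inside its own range, rewriting one node does not shift the offsets of the others.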

Armed with these new weapons, let's write something. Let's say you want to see a timestamp with all your console.log statements. You could use a logging library, but you'd have to rewrite all your console.log statements. Or, you could write a program to automatically rewrite your source code!

So, for this problem, given the input

console.log('hello');

we want the output - the rewritten program - to be

console.log(new Date() + ':', 'hello');

First, we need to detect all console.log statements in the program. We'll do that using this function

function isConsoleLog(node){
  return node.type === 'CallExpression' &&
    node.callee.type === 'MemberExpression' &&
    node.callee.object.type === 'Identifier' &&
    node.callee.object.name === 'console' &&
    node.callee.property.type === 'Identifier' &&
    node.callee.property.name === 'log';
}
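
To see what this check accepts, here is a small experiment on hand-built nodes shaped like the ones a parser produces, so it runs without any library. One caveat it brings out: console[log](...) also has an Identifier as its property, so the check above matches it. The isConsoleLogStrict variant is my own assumption about how you might tighten it, using the computed flag that MemberExpression nodes carry.

```javascript
function isConsoleLog(node) {
  return node.type === 'CallExpression' &&
    node.callee.type === 'MemberExpression' &&
    node.callee.object.type === 'Identifier' &&
    node.callee.object.name === 'console' &&
    node.callee.property.type === 'Identifier' &&
    node.callee.property.name === 'log';
}

// Hypothetical stricter variant: reject computed access such as console[log].
function isConsoleLogStrict(node) {
  return isConsoleLog(node) && !node.callee.computed;
}

// Builds a hand-written node for console.log('hello') or console[log]('hello').
function consoleCall(computed) {
  return {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      computed: computed,
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' }
    },
    arguments: [{ type: 'Literal', value: 'hello' }]
  };
}

var dotCall = consoleCall(false);     // console.log('hello')
var computedCall = consoleCall(true); // console[log]('hello'), log is a variable

var looseMatchesComputed = isConsoleLog(computedCall);
var strictMatchesComputed = isConsoleLogStrict(computedCall);
var strictMatchesDot = isConsoleLogStrict(dotCall);
```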

Then, it's just a matter of constructing the source we want and giving it to update().

code = falafel(code, function(node){
  if (isConsoleLog(node)){
    node.update('console.log(new Date() + ":", ' + 
      node.arguments.map(function(arg){
        return arg.source();
      }).join(', ') + ')');
  }
});

Notice that we use the source() method to get at each argument's original source code - unchanged.
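
One edge case worth a look: a bare console.log() has no arguments, so the string concatenation above would leave a dangling comma. Here is a sketch of one way around it, where addTimestamp is a hypothetical helper name that takes the source text of each argument.

```javascript
// Build the replacement call from the source text of each argument.
// addTimestamp is a hypothetical helper, not part of falafel.
function addTimestamp(argSources) {
  // Treat the timestamp as just another argument, then join once.
  var args = ['new Date() + ":"'].concat(argSources);
  return 'console.log(' + args.join(', ') + ')';
}

var withArg = addTimestamp(["'hello'"]);
var noArgs = addTimestamp([]);
```

Inside the falafel callback, you would pass it node.arguments.map(function(arg){ return arg.source(); }) and hand the result to node.update().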

Full Source

Better Assertions

Now that your feet are wet, let's get into something more involved - let's implement better assertions by using source rewriting.

An assertion is a statement in the program containing a sole boolean expression which is checked during its execution. If the expression is false at that point of program execution, the program will fail with an assertion error. Assertions are most commonly used in automated tests, but can also be used in production code. In fact, using assertions in your production code may have these benefits:

Erroneous conditions fail early, so they have minimal negative impact on the state of the application.
Post-mortem debugging becomes easier because the failure happens closer to the root cause.

Since JavaScript does not have an assert statement, we resort to implementing an assert() function instead

assert(n === 0);

which may be implemented as

function assert(condition){
  if (!condition){
    throw new Error('Assertion failed.');
  }
}

If the assertion fails at this point, we'd get an error like this

Error: Assertion failed.
at assert (simple_assert.js:6:11)
at Object.<anonymous> (simple_assert.js:2:1)
...

This output leaves something to be desired:

1. Currently we have to track down the code that generated the assert statement using the line numbers in the stack trace. It would be helpful to be able to see the source of the assert statement directly in the error message. Insist is a Node module that does this using some AST hackery.
2. It would be helpful to see the state of the program at the time the error occurred, especially the variables and sub-expressions referenced in the assert statement. Various assertion libraries handle this by giving you access to comparison functions. For example, instead of assert(n === 0) you'd write assert.equal(n, 0), which would yield an error message like:
```
 Assertion Error: Expected 1 to equal 0.
```
But there are still downsides to this approach:
- although you see the values on each side of the comparison, you don't see the names of the variables in the comparison
- you have to learn and depend on their API, which can be large
- you'll encounter things you want to check which are not in the API or are not comparisons at all, at which point you'd have to fall back to the simple assert form
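
To make the first downside concrete, here is a minimal sketch of such a comparison helper. assertEqual is a hypothetical stand-in I wrote for illustration, not the API of any particular library.

```javascript
// assertEqual is a hypothetical stand-in for a library's comparison helper.
function assertEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error('Expected ' + actual + ' to equal ' + expected + '.');
  }
}

var n = 1;
var message = null;
try {
  assertEqual(n, 0);
} catch (e) {
  message = e.message;
}
// The message carries the values 1 and 0,
// but nothing in it says that the 1 came from n.
```

The helper only ever receives values, so the name n is already gone by the time the message is built.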

The Goal

If we add source code rewriting to the toolbox, new possibilities open up. Let's consider how we may want to rewrite an assert statement like

assert(n === 0);

First, to implement just the assertion behavior, we can rewrite the above to

if (!(n === 0)){
  throw new Error('Assertion failed.');
}

Next, we can embed the source of the assert statement into the error message - this satisfies requirement #1

if (!(n === 0)){
  throw new Error('assert(n === 0) failed.');
}

For requirement #2, we want to display the state of the program, but which variables should we display? At the minimum, we should display the values of sub-expressions in the assert statement itself. For our example, the sub-expression we care about is the variable n, so we'll display its value on a separate line within the error message.

if (!(n === 0)){
  throw new Error(
    'assert(n === 0) failed.\n' + 
    '  n = ' + n);
}
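
To check that this target form really produces the message we are after, you can run it by hand with n set to a failing value. This is just the snippet above wrapped in a try/catch, with message as a scratch variable of my own.

```javascript
var n = 1;
var message = null;
try {
  // The hand-written target of the rewrite.
  if (!(n === 0)) {
    throw new Error(
      'assert(n === 0) failed.\n' +
      '  n = ' + n);
  }
} catch (e) {
  message = e.message;
}
// message now holds two lines: the assert source, then "  n = 1".
```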

Let's separate the above into 3 checkpoints:

1. Basic assertion.
2. Display assert statement source code in error message.
3. Display variables and sub-expressions in the assert statement.

Okay, now that we know where we are headed, let's get coding. If you don't want me to ruin the ending for you, stop reading, go implement it yourself and then come back. I'll wait.

Checkpoint #1

First, to detect the assert statement, we'll have an isAssert function

function isAssert(node){
  return node.type === 'CallExpression' &&
    node.callee.type === 'Identifier' &&
    node.callee.name === 'assert';
}

Rewriting it as an if statement then is as easy as this

code = falafel(code, function(node){
  if (isAssert(node)){
    var predicate = node.arguments[0];
    node.update(
      'if (!(' + predicate.source() + ')){ ' +
      'throw new Error("Assertion failed.");' + 
      ' }');
  }
})

For this program as input

var n = 1;
assert(n === 0);

we should get back the rewritten program as

var n = 1;
if (!(n === 0)){ throw new Error("Assertion failed."); };

And if we run it we'll get

Error: Assertion failed.
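
A caveat: replacing a call expression with an if statement only gives valid code when the call stands alone as a statement. Something like debug && assert(ok) would turn into a syntax error. Here is a sketch of a guard, relying on the parent reference that falafel attaches to each node. isAssertStatement is a hypothetical name, and the nodes are hand-built stand-ins so the snippet runs on its own.

```javascript
function isAssert(node) {
  return node.type === 'CallExpression' &&
    node.callee.type === 'Identifier' &&
    node.callee.name === 'assert';
}

// Hypothetical guard: only rewrite assert(...) when it is a whole statement
// and actually has a predicate to test.
function isAssertStatement(node) {
  return isAssert(node) &&
    node.arguments.length > 0 &&
    !!node.parent &&
    node.parent.type === 'ExpressionStatement';
}

// Hand-built stand-in for a parsed assert(ok) call under a given parent.
function assertCall(parentType) {
  return {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: 'assert' },
    arguments: [{ type: 'Identifier', name: 'ok' }],
    parent: { type: parentType }
  };
}

var standalone = assertCall('ExpressionStatement'); // assert(ok);
var nested = assertCall('LogicalExpression');       // debug && assert(ok);

var rewriteStandalone = isAssertStatement(standalone);
var rewriteNested = isAssertStatement(nested);
```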

Full Source

Checkpoint #2

Embedding the source of the assert statement into the error message isn't much more involved.

code = falafel(code, function(node){
  if (isAssert(node)){
    var predicate = node.arguments[0];
    node.update(
      'if (!(' + predicate.source() + ')){ ' +
      'throw new Error("' + predicate.source() + ' failed.");' + 
      ' }');
  }
})

For the same example as checkpoint #1, we should now get back

var n = 1;
if (!(n === 0)){ throw new Error("n === 0 failed."); };

And the error now looks like

Error: n === 0 failed.
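
One thing to watch out for: pasting the predicate source between double quotes breaks as soon as the predicate itself contains a double quote. A sketch of one way to handle it is to let JSON.stringify produce the string literal. failureCheck is a hypothetical helper that returns the replacement source.

```javascript
// failureCheck is a hypothetical helper, not part of falafel.
function failureCheck(predicateSrc) {
  // JSON.stringify yields a double-quoted, properly escaped string literal.
  var message = JSON.stringify(predicateSrc + ' failed.');
  return 'if (!(' + predicateSrc + ')){ throw new Error(' + message + '); }';
}

var simple = failureCheck('n === 0');
var generated = failureCheck('name === "tom"');

// Run the generated code with name bound to a value that fails the check.
var caught = null;
try {
  new Function('name', generated)('bob');
} catch (e) {
  caught = e.message;
}
```

For n === 0 this produces the same replacement text as before, so nothing changes for the simple case.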

Full Source

Checkpoint #3

Displaying variables and sub-expressions in the assert statement's predicate will involve another AST traversal. Let's consider a few cases:

1. The predicate is a binary expression like n === 0. In this case, we want to show the sub-expressions on both sides, but not literals - because those are already apparent in the source.
2. The predicate is a function call like isNaN(n). In this case we'll want to drill down into the argument(s) of the function and display their values.
3. The predicate is a method call like _.isArray(arr). As with case #2, we'll want to display the value of the argument(s).

After several attempts, I've come up with this code which balances cases covered and simplicity

function subExpressions(predicate){
  var exprs = [];
  estraverse.traverse(predicate, {
    enter: function(node){
      if (
        predicate !== node &&   // exclude the original expression itself
        node.type !== 'Literal' // don't want string/number/etc literals
      ){
        exprs.push(node);
      }
      // Skip the nodes under MemberExpression
      // because its property name, as an identifier,
      // will not be recognizable as a variable
      if (node.type === 'MemberExpression') this.skip();
    }
  });
  return exprs;
}
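
If you want to poke at these rules without installing anything, here is a dependency-free approximation of the same traversal. It is a sketch under assumptions: children are found by scanning object properties rather than through estraverse, the nodes are hand-built, and the text property on each node is a hypothetical stand-in for what source() would return.

```javascript
// Dependency-free approximation of the traversal above.
function subExpressions(predicate) {
  var exprs = [];
  function visit(node) {
    if (node !== predicate && node.type !== 'Literal') exprs.push(node);
    if (node.type === 'MemberExpression') return; // same rule as this.skip()
    Object.keys(node).forEach(function (key) {
      if (key === 'parent') return; // never walk back up the tree
      [].concat(node[key]).forEach(function (child) {
        // Anything with a string type is treated as a child node.
        if (child && typeof child.type === 'string') visit(child);
      });
    });
  }
  visit(predicate);
  return exprs;
}

// Hand-built node for obj.<prop>.
function member(prop) {
  return {
    type: 'MemberExpression', text: 'obj.' + prop,
    object: { type: 'Identifier', name: 'obj', text: 'obj' },
    property: { type: 'Identifier', name: prop, text: prop }
  };
}

// Hand-built tree for: obj.name === obj.getName()
var predicate = {
  type: 'BinaryExpression', operator: '===', text: 'obj.name === obj.getName()',
  left: member('name'),
  right: {
    type: 'CallExpression', text: 'obj.getName()',
    callee: member('getName'),
    arguments: []
  }
};

var found = subExpressions(predicate).map(function (node) {
  return node.text;
});
```

Here found lists obj.name, obj.getName() and obj.getName, in that order. Neither obj nor the property names show up on their own, because the walk stops at each MemberExpression.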

Then, it's just a matter of formatting the desired JavaScript correctly.

code = falafel(code, function(node){
  if (isAssert(node)){
    var predicate = node.arguments[0];
    var predicateSrc = predicate.source();
    var exprs = subExpressions(predicate);
    node.update(
      'if (!(' + predicateSrc + ')){ ' +
        'throw new Error("' + predicateSrc + ' failed."' + 
          ' + ' + 
          (exprs.map(function(expr){
            var src = expr.source();
            return '"\\n  ' + src + ' = " + ' + src;
          }).join(' + ') || '""') +
        ');' + 
      ' }');
  }
})

Okay, admittedly I've made a bit of a mess here. If you run this on the same example program, you'll now get

Error: n === 0 failed.
  n = 1

Now we are getting somewhere! Here's another example

var obj = {
  name: 'tom',
  getName: function(){
    return 'blah'
  }
}
assert(obj != null)
assert(obj.name === obj.getName())

If you convert this source and run it, you'll get this error:

Error: obj.name === obj.getName() failed.
  obj.name = tom
  obj.getName() = blah
  obj.getName = function (){
    return 'blah'
  }

So helpful!
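
One thing to keep in mind: the generated code evaluates each sub-expression a second time while building the message, so a call like obj.getName() runs again when the assertion fails. Here is a small hand-written experiment showing this. The calls counter is my own addition, and I left obj.getName out of the message to keep it short.

```javascript
var calls = 0;
var obj = {
  name: 'tom',
  getName: function () {
    calls += 1; // count how often the method actually runs
    return 'blah';
  }
};

var message = null;
try {
  // Shape of the rewritten assert(obj.name === obj.getName()).
  if (!(obj.name === obj.getName())) {
    throw new Error('obj.name === obj.getName() failed.' +
      '\n  obj.name = ' + obj.name +
      '\n  obj.getName() = ' + obj.getName());
  }
} catch (e) {
  message = e.message;
}
// calls is 2: once for the check, once more for the message.
```

That is harmless for pure getters, but it is worth knowing about if the expressions in your asserts have side effects.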

Full Source

Homework

Not so fast! Did you think you could get away without any homework this time?

In postmortem debugging, any bit of information you can get your hands on may prove valuable for finding the root cause of the problem. So, it seems like a good idea to display the local variables when the assertion fails. In fact, it may be valuable to display all the variables accessible at that point - in other words the entire variable scope chain. Your mission, should you choose to accept it, is to make the code do exactly this. Good luck!