Changing Code Styles By Source Rewriting

Recently I switched the Testem codebase's JavaScript style from semicolonless to semicolonful. This post shows how I did that, in the hope that it will be helpful to others. It doesn't endorse one style over the other; you can use the same technique to switch your codebase from semicolonful to semicolonless. In fact, whichever side you are on, maybe you should switch for a couple of days to develop empathy for folks whose opinions differ from yours. You can always switch back!

General Procedure

To tackle this problem, I again turned to my handy source rewriting tool - falafel. Falafel is awesome because you can safely rewrite the parts of the code you want to rewrite without touching any of the code that you don't want to change. Source rewriting is usually a trial-and-error process - at least for me. First you parse some code and look at its AST, then you decide which types of nodes you want to modify, and then you modify them using the .update() method provided by falafel. Please read my falafel introductory article for a proper introduction, which I won't repeat here.

When working on a large-scale automated code rewrite like this, you risk breaking stuff, which is why having a test suite is helpful. My procedure turned out to be:

1. Write/edit the code transform script.
2. Apply the script to the codebase.
3. Run the tests.
4. If tests fail
   - Diagnose problem.
   - git reset --hard HEAD
   - Go to 1.
5. If tests pass
   - Review the code for any visible problems.
   - If there are visible problems
     - Come up with fix.
     - git reset --hard HEAD
     - Go to 1.
   - If everything looks good, job is done.
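The "apply the script to the codebase" step can be sketched roughly as below. This is only a minimal illustration, not the script I actually ran: rewriteFiles is a hypothetical helper name, and transform stands for any function that takes source text and returns the rewritten text (or an object with a toString() method, which is what falafel returns).

```javascript
var fs = require('fs');

// Hypothetical helper (not part of falafel): apply `transform` to each file
// in `files` and write the result back only when the contents changed.
// Returns the list of paths that were actually rewritten.
function rewriteFiles(files, transform) {
  return files.filter(function(file) {
    var before = fs.readFileSync(file, 'utf8');
    // String() accepts plain strings as well as objects with toString().
    var after = String(transform(before));
    if (after === before) {
      return false;
    }
    fs.writeFileSync(file, after);
    return true;
  });
}
```

Because untouched files are never written, the diff after each run only shows files the transform really changed, which makes the review step easier before deciding whether to git reset.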

Requirements

I had two requirements:

Ensure semicolons are always used, and used properly.
Ensure that there is a space between the closing parenthesis ) and the next opening curly bracket { for functions, loops, and if statements.

Adding In The Semicolons

To add semicolons, first I tried simply

var newCode = falafel(code, function(node) {
  if (node.type === 'ExpressionStatement' ||
    node.type === 'ReturnStatement' ||
    node.type === 'VariableDeclaration') {
    ensureSemiColon(node);
  }
});

function ensureSemiColon(node) {
  var src = node.source();
  if (src[src.length - 1] !== ';') {
    node.update(node.source() + ';');
  }
}

This worked for the most part, but it broke both kinds of for loop. In the case of for-in loops, this happened:

for (var key; in object){
  ...
}

Oops!

In the case of regular for loops, it was this:

for (var i = 0;; i < arr.length; i++){
  ...
}
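To see why both loops broke: as far as I can tell, the VariableDeclaration node inside a for header covers only the declaration text (var key or var i = 0), not the separator that follows it, so appending a semicolon lands in the middle of the header. The sketch below reproduces the effect with plain string splicing instead of falafel; spliceSource and naiveAddSemicolon are hypothetical names made up for this illustration.

```javascript
// Hypothetical stand-in for the effect of node.update() on the final output:
// replace the characters in [start, end) of `code` with `replacement`.
function spliceSource(code, start, end, replacement) {
  return code.slice(0, start) + replacement + code.slice(end);
}

// Append ';' to the first occurrence of `declaration` inside `code`,
// mimicking the naive ensureSemiColon applied to a VariableDeclaration node.
function naiveAddSemicolon(code, declaration) {
  var start = code.indexOf(declaration);
  return spliceSource(code, start, start + declaration.length, declaration + ';');
}

var forIn = naiveAddSemicolon('for (var key in object){}', 'var key');
var forLoop = naiveAddSemicolon('for (var i = 0; i < arr.length; i++){}', 'var i = 0');

console.log(forIn);   // for (var key; in object){}
console.log(forLoop); // for (var i = 0;; i < arr.length; i++){}
```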

So I revised my code to

var newCode = falafel(code, function(node) {
  if (node.type === 'ExpressionStatement' ||
    node.type === 'ReturnStatement') {
    ensureSemiColon(node);
  } else if (node.type === 'VariableDeclaration') {
    if (node.parent.type !== 'ForInStatement' &&
      node.parent.type !== 'ForStatement') {
      ensureSemiColon(node);
    }
  }
});

This passed the tests and all looked good.

Adding That Space For Padding

For this problem, I first tried simply adding a space in front of all block statements.

var newCode = falafel(code, function(node) {
  if (node.type === 'BlockStatement') {
    node.update(' ' + node.source());
  }
});

While this didn't break the tests, it was overkill because any code that already had a space for padding now had two. I ended up using just regex matching:

var newCode = falafel(code, function(node) {
  if (node.type === 'ForStatement' ||
    node.type === 'ForInStatement' ||
    node.type === 'IfStatement' ||
    node.type === 'FunctionDeclaration' ||
    node.type === 'FunctionExpression') {
    var src = node.source();
    // No g flag here: pad only the first "){" in this node's source.
    src = src.replace(/\)\{/, ') {');
    src = src.replace(/\}else/g, '} else');
    src = src.replace(/else\{/g, 'else {');
    node.update(src);
  }
});

Dumb. Simple. Obvious. Works!
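The reason the regex version avoids the double-space problem is that it only matches places where the space is missing. Here is a small sketch of the same three replacements pulled out into a plain function (padBraces is a hypothetical name for illustration; in the real script the replacements run inside the falafel callback on node.source()):

```javascript
// Hypothetical helper: the three replacements applied to a plain string.
function padBraces(src) {
  return src
    .replace(/\)\{/, ') {')        // first "){" only
    .replace(/\}else/g, '} else')  // every "}else"
    .replace(/else\{/g, 'else {'); // every "else{"
}

var once = padBraces('if (ok){a()}else{b()}');
var twice = padBraces(once);

console.log(once);           // if (ok) {a()} else {b()}
console.log(once === twice); // true
```

Running it a second time changes nothing, because code that already has the padding no longer matches any of the patterns.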

That's it! Here is the full source for the code I used to rewrite Testem's codebase, and here is the pull request.