A Mock for {set,clear}{Timeout,Interval}

Posted by Oliver on April 20, 2008

Here’s a potential JSSpec spec for Sequentially.trickle.map:

describe('Sequentially.trickle.map', {
  'should apply to all the elements': function() {
    Sequentially.trickle.map(
      ['a', 'b', 'c'],
      function(x) { return x + 1 },
      1,
      function(result) {
        value_of(result.join(',')).should_be('a1,b1,c1');
      });
  }
});

This doesn’t work. The problem is that Sequentially.trickle.map is asynchronous (it defers most of its computation, including the invocation of the callback, via setTimeout). This means that should_be isn’t called until after the spec has returned. If it succeeds, this isn’t a problem, but if it fails, JSSpec can’t associate the failure with the spec that caused it; worse, JSSpec will already have marked that spec successful.

Here’s the version that I actually used:

describe('Sequentially.trickle.map', {
  'should apply to all the elements': function() {
    withMockTimers(function() {
      Sequentially.trickle.map(
        ['a', 'b', 'c'],
        function(x) { return x + 1 },
        1,
        function(result) {
          value_of(result.join(',')).should_be('a1,b1,c1');
        });
    });
  }
});

withMockTimers temporarily replaces setTimeout and friends with its own deferred execution system, so that it can make sure to call them all before it returns. Get it here.
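
For readers who want a feel for the technique, here is a minimal sketch of how such a timer mock could work. It is not the withMockTimers source: the name withMockTimersSketch and the optional globalObject parameter are made up for illustration, it handles only setTimeout and clearTimeout (not the interval functions), and it ignores delays, so callbacks run in the order they were submitted rather than in due-time order.

```javascript
// Hypothetical sketch of a timer mock; NOT the actual withMockTimers source.
// `globalObject` is an assumed parameter that lets a test pass in a stand-in
// for the real global object.
function withMockTimersSketch(fn, globalObject) {
  var g = globalObject || globalThis;
  var saved = { setTimeout: g.setTimeout, clearTimeout: g.clearTimeout };
  var queue = [];   // pending {id, callback} records, in submission order
  var nextId = 1;

  g.setTimeout = function(callback /*, delay is ignored */) {
    var id = nextId++;
    queue.push({ id: id, callback: callback });
    return id;
  };
  g.clearTimeout = function(id) {
    queue = queue.filter(function(entry) { return entry.id !== id; });
  };

  try {
    fn();
    // Drain the queue before returning; callbacks may schedule more callbacks.
    while (queue.length)
      queue.shift().callback();
  } finally {
    // Always put the real timer functions back, even if a callback throws.
    g.setTimeout = saved.setTimeout;
    g.clearTimeout = saved.clearTimeout;
  }
}
```

Because the queue is drained inside the call, any assertion made from a deferred callback runs while the spec function is still on the stack, which is the property the spec above relies on.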

Limitations

This approach has its limits. It doesn’t mock new Date to pretend that more time has passed, so whether it works or not will depend on how your code uses the timers (if it keeps an interval running or re-submitting a timeout until an amount of time measured on new Date has passed, it will probably get a “script running slowly” error). And, I don’t know how kosher it is to replace setTimeout — this let me test against Firefox 2.0 and Safari 3.1; I haven’t tried on Opera and MSIE. Nonetheless, it got me what I wanted here — unit tests for the new methods in Sequentially.

(I’ve got a more involved implementation that patches the test suite to run callbacks within an emulation of the dynamic scope of the original test function, but it’s tricky, and I haven’t got it integrated with JSSpec — or any other browser JavaScript implementation — yet.)

Conquering the Busy Cursor with Sequentially

Posted by Oliver on April 20, 2008

What’s wrong with this function? (Hint: it’s meant to execute periodically on a JavaScript page.)

function updateExpirationText() {
  var now = new Date;
  products.forEach(function(item) {
    var expiresDate = item.expiresDate || Date.parse(item.expires),
        remaining = expiresDate - now,
        text = remaining < 0 ? 'expired' : msToDuration(remaining);
    $('#item-' + item.id + ' .time-remaining').text(text);
  });
}

It’s a trick question. Maybe nothing’s wrong. But if products can get very long, or if the msToDuration is very slow, you’ve locked up the UI for a long time. At best, this makes for sluggish response; at worst, the page that contains this will trigger a “script running slowly” error, and the user will likely abort all the JavaScript on the page.

If this computation only needs to run once, and when (or before) the page loads, you can do it on the server. But often a computation depends on some aspect of the client state, that isn’t known when the page is requested. In this example, the computation depends on the current time (and the current time keeps changing). In another case, the computation might depend upon the values of some controls or other widgets on the page — if we’ve gone all AJAXy, and want to show the user an instant response, even if that means some client-side computation.

Here’s an alternative to the function above that doesn’t lock up the page. It uses Sequentially.trickle.forEach, a new function in Sequentially. This function applies its second argument to some span of the first argument (for up to 250ms, in this case) and then sleeps for a frame (via setTimeout) before waking up to walk over the next span, until all is done. This gives time back to the browser (and to other setTimeout and setInterval threads), and avoids the “script running slowly” error. Note how small the change is: "products.forEach(" becomes "Sequentially.trickle.forEach(products,", and the closing line gains the 250 argument.

function updateExpirationText() {
  var now = new Date;
  Sequentially.trickle.forEach(products, function(item) {
    var expiresDate = item.expiresDate || Date.parse(item.expires),
        remaining = expiresDate - now,
        text = remaining < 0 ? 'expired' : msToDuration(remaining);
    $('#item-' + item.id + ' .time-remaining').text(text);
  }, 250);
}
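
The slicing behavior described above can be sketched roughly as follows. This is a guess at the general shape, not the Sequentially source: trickleForEachSketch and its schedule parameter are hypothetical names, and the real function may differ in detail (for example, it might run its first span immediately rather than deferring it).

```javascript
// Hypothetical sketch of time-sliced iteration; NOT the Sequentially source.
// `schedule` is an assumed injection point so the sketch can be exercised
// without real timers; by default it defers with setTimeout.
function trickleForEachSketch(array, fn, ms, k, schedule) {
  schedule = schedule || function(next) { setTimeout(next, 10); };
  var index = 0;

  function runSlice() {
    var sliceStart = Date.now();
    // Process at least one item, then keep going until the budget is spent.
    while (index < array.length) {
      fn(array[index], index);
      index++;
      if (Date.now() - sliceStart >= ms) break;
    }
    if (index < array.length)
      schedule(runSlice);   // yield to the browser, then resume
    else if (k)
      k();                  // all done: call the continuation
  }

  schedule(runSlice);
}
```

Processing at least one item per slice guarantees progress even when a single item takes longer than the time budget.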

Sometimes you need to run some code after the iteration is done. In other words, sometimes you need to transform a function that looks like this:

  var startTime = new Date;
  array.forEach(function(item) { ... });
  console.info(new Date - startTime, 'elapsed');

(Here, the code that runs after the iteration just reports how long the iteration took.)

You can do that with a continuation function (or callback), the same as you would with an AJAX request:

  var startTime = new Date;
  Sequentially.trickle.forEach(array, function(item) { ... }, 250, k);
  function k() {
    console.info(new Date - startTime, 'elapsed');
  }

JavaScript being lexically scoped, you can refer to all the same variables from a nested function (k).

There’s a Sequentially.trickle.map too. Since the return value can’t contain the function application results yet when the function returns, you have to get them back from the callback. Before (the synchronous version):

  var results = array.map(function(item) { ... });
  console.info('Results:', results);

and after (asynchronous):

  Sequentially.trickle.map(array, function(item) { ... }, 250, k);
  function k(results) {
    console.info('Results:', results);
  }

DB Content Rails Plugin

Posted by Oliver on April 17, 2008

The DB Content Rails plugin adds tasks to save and restore database content.

Usage

-- dump the development database to db/archive/development-content.sql.gz
$ rake db:content:dump

-- load the dumped database, and apply any necessary migrations
$ rake db:content:load

-- dump the production database to db/archive/production-content.sql.gz
$ RAILS_ENV=production rake db:content:dump

-- save the development database to db/archive/{timestamp}.sql.gz
$ rake db:content:save

-- save the (compressed) database to my-data.sql.gz
$ rake db:content:save FILE=my-data.sql.gz

-- save the (uncompressed) database to my-data.sql
$ rake db:content:save FILE=my-data.sql

-- load the database from my-data.sql
$ rake db:content:load FILE=my-data.sql

Tasks

rake db:content:archive

Saves a timestamped database to db/archive/{timestamp}.sql.gz.

rake db:content:dump

Dumps the database to FILE or db/{RAILS_ENV}-content.sql.gz. If FILE ends in .gz, the file is compressed.

rake db:content:load

Loads the database from FILE or db/{RAILS_ENV}-content.sql.gz, and migrates it to the current schema version. If FILE ends in .gz, the file is piped through gunzip.

Installation

git clone git://github.com/osteele/db_content.git vendor/plugins/db_content

If you’re running off Edge Rails (or, presumably, Rails > 2.0.2), you should be able to do this instead:

script/plugin install git://github.com/osteele/db_content

Limitations

The plugin works only with MySQL databases. (It adds methods to the Mysql adaptor; see the source.) The gzip option probably only works on *nix (MacOS, Linux, etc.).

JCON: Ruby Gem for JSON type conformance

Posted by Oliver on April 17, 2008

JCON (the JavaScript Conformance gem) tests JSON values against ECMAScript 4.0-style type definitions (PDF) such as string?, (int, boolean), or [string, (int, boolean), {x:double, y:double}?].

Usage

type = JCON::parse "[string, int]"
type.contains?(['a', 1])     # => true
type.contains?(['a', 'b'])   # => false
type.contains?(['a', 1, 2])  # => true

JCON also defines an RSpec matcher, conforms_to_js:

[1, 'xyzzy'].should conform_to_js('[int, string]')
[1, 2, 'xyzzy'].should_not conform_to_js('[int, string]')  # 2 isn't a string
{:x => 1}.should conform_to_js('{x: int}')

Use JCON together with the JavaScript Fu Rails plugin to test the argument values to functions in generated JavaScript:

# this will succeed if e.g. response contains a script tag that includes
#   fn("id", {x:1, y:2}, true)
response.should call_js('fn') do |args|
  args[0].should conform_to_js('string')
  args[1].should conform_to_js('{x:int, y:int}')
  args[2].should conform_to_js('boolean')
  # or:
  args.should conform_to_js('[string, {x:int, y:int}, boolean]')
end

Whence

Github for the sources.

Rubyforge for docs.

gem install jcon to install.

License and version

MIT License, of course.

JCON is at version 0.1 because it’s just a few days old and I had to guess about the ECMAScript 4.0 type syntax from the examples in the overview. I can’t imagine that I got everything right.

Three Small JavaScript Libraries

Posted by Oliver on April 15, 2008

Three small libraries that I carry with me from project to project:

Fluently — Construction Kit for Chainable Methods

With Fluently, you can do this:

    var o = Fluently.make(function(define) {
      define('fn1', function() {console.info('called fn1')});
      define('fn2', function() {console.info('called fn2')});
      define('fn3', function() {return 3});
    });

to define an object with chained methods, that can be invoked thus:

  o.fn1().fn2() // calls fn1 and then fn2
  o.fn2().fn1() // calls fn2 and then fn1
  o.fn1().fn3() // returns 3 (an explicit 'return' breaks the chain)
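
To make the chaining rule concrete, here is a minimal sketch of how a define-style builder could produce chainable methods. It is not the Fluently source; makeFluentSketch is a hypothetical name, and the rule "return the object unless the function returned something" is my reading of the examples above.

```javascript
// Hypothetical sketch of chainable-method construction; NOT the Fluently source.
function makeFluentSketch(definer) {
  var object = {};

  function define(name, fn) {
    object[name] = function() {
      var result = fn.apply(object, arguments);
      // No explicit return value: hand back the object so calls can chain.
      // An explicit return value is passed through and breaks the chain.
      return result === undefined ? object : result;
    };
  }

  definer(define);
  return object;
}
```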

You can also define modifiers, and aliases:

    var o = Fluently.make(function(define) {
      define('fn1', function() {console.info('called fn1')});
      define('fn2', function() {console.info('called fn2')});
      define.empty('and');
      define.alias('fn3', 'fn1');
      define.modifier('not');
    });
 
  o.fn3(); // same as o.fn1()
  o.fn1().and.fn2() // same as o.fn1().fn2()
  o.fn1().and.not.fn2() // options.not is set when fn2 is called

I used this to build a mock and spec construction kit. I don’t use Fluently to define the mocks; I use it to define the methods that define the mocks. Doing all this in one library made my head hurt, so I factored this part of it out.

Git Fluently from here.

MOP JS

MOP JS defines utilities for JavaScript metaprogramming. You don’t think you need it until you try asynchronous programming, where some methods don’t have enough information to operate until the response to another method’s asynchronous request has returned.

  MOP.delegate(target, propertyName, methods)

For each name in methods, defines a method on target with this
name, that delegates to the method of the propertyName property
of target with the same name.
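
As an illustration of the delegation just described, here is a small sketch. It is not the MOP JS source; delegateSketch is a hypothetical name, and looking up the delegate at call time (rather than when the methods are defined) is an assumption.

```javascript
// Hypothetical sketch of method delegation; NOT the MOP JS source.
function delegateSketch(target, propertyName, methods) {
  methods.forEach(function(name) {
    target[name] = function() {
      // Look the delegate up at call time, so that reassigning
      // target[propertyName] later still works.
      var delegate = this[propertyName];
      return delegate[name].apply(delegate, arguments);
    };
  });
}
```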

  new MOP.MethodReplacer(object, methods)

When a new MethodReplacer is constructed, it replaces each method
on object by the method in methods with the same key value, if
such a method exists. A MethodReplacer has a single method,
restore, which restores each method to its pre-replacement
value.
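
A sketch of the replace-and-restore idea follows. It is not the MOP JS source; MethodReplacerSketch is a hypothetical name, and I am assuming that "if such a method exists" means a replacement is installed only where the object already has a method of that name.

```javascript
// Hypothetical sketch of temporary method replacement; NOT the MOP JS source.
function MethodReplacerSketch(object, methods) {
  var saved = {};

  Object.keys(methods).forEach(function(name) {
    if (typeof object[name] !== 'function') return;   // nothing to replace
    saved[name] = {
      own: Object.prototype.hasOwnProperty.call(object, name),
      value: object[name]
    };
    object[name] = methods[name];
  });

  this.restore = function() {
    Object.keys(saved).forEach(function(name) {
      if (saved[name].own)
        object[name] = saved[name].value;
      else
        delete object[name];   // uncover the inherited method again
    });
  };
}
```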

  new MOP.QueueBall(object, methodNames)

When a new QueueBall is constructed, it replaces each method named
by methodNames with a method that enqueues the method call (the
name of the method and its arguments). A QueueBall has a single
method, replayMethodCalls, which plays back the method calls and
restores the methods.
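
The record-and-replay behavior can be sketched like this. It is not the MOP JS source; QueueBallSketch is a hypothetical name, and for brevity it restores methods by plain assignment, which is a simplification when the originals were inherited.

```javascript
// Hypothetical sketch of call queueing and replay; NOT the MOP JS source.
function QueueBallSketch(object, methodNames) {
  var queue = [];
  var saved = {};

  methodNames.forEach(function(name) {
    saved[name] = object[name];
    // While the queue ball is active, calls are recorded, not executed.
    object[name] = function() {
      queue.push({ name: name, args: Array.prototype.slice.call(arguments) });
    };
  });

  this.replayMethodCalls = function() {
    // Restore first, so that the replayed calls reach the real methods.
    methodNames.forEach(function(name) { object[name] = saved[name]; });
    while (queue.length) {
      var call = queue.shift();
      object[call.name].apply(object, call.args);
    }
  };
}
```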

  MOP.withMethodOverridesCallback(object, methods, fn)

Calls fn on object, within a dynamic scope within which the
methods in methods have temporarily replaced the like-named
methods on object. The scope is terminated by the argument to
the call to fn; this argument should be treated as a
continuation, and restores the methods.

  MOP.withDeferredMethods(object, methodNames, fn)

Calls fn on object, within a dynamic scope within which the
methods in methodNames have been enqueued. The scope is
terminated by the argument to the call to fn; this argument
should be treated as a continuation, and ends the queue, replaying
the methods.

See the specs for examples; git MOP JS from here.

Collections JS

Finally, the Collections library defines framework-independent JavaScript collection methods, for use in browser JavaScript and in ActionScript / OpenLaszlo. There are many libraries like this; this one is mine.

The Array and String methods extend the class prototype; the Hash methods use a proxying wrapper to avoid prototype pollution. The methods with the same names as the JavaScript 1.6+ extensions have the same spec as those; the ones with the same names as Prototype extensions have the same spec as those in the Prototype library; and there are a few odds and ends such as String#capitalize.

I use this when I don’t want the overhead of Prototype, or want to use these functions in an environment that Prototype doesn’t run on, such as OpenLaszlo. It has some overlap with Functional, but isn’t nearly so radical — this can be an advantage.

Git Collections JS from here.

JavaScript Fu Rails Plugin

Posted by Oliver on April 14, 2008

JavaScript Fu extends Rails with a few facilities to better integrate JavaScript into Rails development:

1. The notes and statistics rake tasks encompass JavaScript files in the public/javascripts directory:

$ rake notes
public/javascripts/controls.js:
  * [782] [TODO] improve sanity check
$ rake stats
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
[...]
| JavaScript           |  7287 |  6322 |       0 |       0 |   0 |     0 |
[...]

2. The call_js RSpec matcher asserts that a string or response contains a script tag, that contains JavaScript that calls the named function or method:

response.should call_js('fn')
response.should call_js('fn(true)')
response.should call_js('gApp.setup')

If you pass a block to call_js, it’s called back with the argument list, parsed as though it were a JSON array:

# matches <script>fn(1, 'aString', {x:10,y:20})</script>
response.should call_js('fn') do |args|
  args[0].should == 1
  args[1].should == 'aString'
  args[2].should == {:x => 10, :y => 20}
end

Use this with jcon to test for type conformance, using ECMAScript 4.0 type definitions. (Well, you can’t use it with jcon yet, because I haven’t released it — this is just a teaser. But you can peek.)

response.should call_js('fn') do |args|
  args[0].should conform_to_js('[Array, (int, boolean)]')
  args[1].should conform_to_js('{x: double, y: double}')
  # or just:
  args.should conform_to_js('[[Array, (int, boolean)], {x: double, y: double}]')
end

3. The page.onload page generator method generates code that executes the content
of the block upon the completion of page load:

page.onload do
  page.call 'alert', 'page loaded!'
end

These lines generate one of these (depending on whether the jRails plugin has been loaded):

Event.observe("window", "load", function() { alert("page loaded!"); });
$(document).ready(function() { alert("page loaded!"); });

Gitting It

JavaScript Fu is hosted on git. If you have git installed, you can clone it into your Rails directory thus:

git clone git://github.com/osteele/javascript_fu.git vendor/plugins/javascript_fu

If you’re running off Edge Rails (or, presumably, Rails > 2.0.2), you should be able to do this instead:

script/plugin install git://github.com/osteele/javascript_fu

Otherwise, you can simply download the tarball from here.

Update: changed the conform_to_js example so that it actually works with the (albeit unreleased) plugin.

FlashBridge: proxying Flash <-> OpenLaszlo

Posted by Oliver on April 13, 2008

I’ve updated my OpenLaszlo utility grab-bag to make browser <-> applet communication even easier. How easy?

Proxies

Put this in your browser JavaScript:

var gObject = {
  f: function() { console.info('gObject.f', arguments) },
  g: function() { console.info('gObject.g', arguments) }
};

And this in an OpenLaszlo applet:

var gObject = FlashBridge.createRemoteProxy('gObject', ['f', 'g']);
gObject.f(1, 2);
gObject.g(3);

When you run the applet code, it prints this to the browser console:

gObject.f [1,2]
gObject.g [3]

That’s right, Flash is invoking the function calls, but they’re executing in the browser.

Now switch these around — put the first block in the applet, and the second block in the browser JavaScript — and it still runs the same way, except that it’s the browser that invokes the functions, and they run in the applet (and print to the OpenLaszlo debug console, if the applet was compiled with debugging on).

(By the way, the full sources for the examples are here.)

Return Values

Callbacks, or continuations for return values, make it easy for the applet to operate on the return value from a call into the browser, even though these calls are asynchronous.

Put this in the browser:

var gBrowserObject = {
  add: function(a, b) {
    logCall('gBrowserObject.add', arguments);
    return a+b;
  },
  error: function(msg) {
    logCall('gBrowserObject.error', arguments);
    throw msg;
  }
};

And this in the applet:

gBrowserObject.add(1, 2).onreturn(function(value) {
  console.info('1 + 2 -> ' + value);
});
gBrowserObject.error('error msg').onexception(function(value) {
  console.info('error !> ' + value);
});

The argument to onreturn is called (asynchronously) with the return value. The argument to onexception is called with the message from the exception, if an exception occurred.

Callbacks, unlike proxies, only work in one direction: for calls from the applet to the browser. That’s not for a technical reason; I’ve just only needed it in one direction so far.

Call Storage

Browser code can call into the applet even if the applet hasn’t initialized yet, and vice versa.

To implement this, each side of the bridge stores calls (and return value handlers) in a mailbox until it hears back that the other side has loaded. Once this happens, the mailboxes are flushed and the remote call methods switch to direct invocation.
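
A minimal sketch of one side of such a mailbox is below. It is not the FlashBridge source; makeRemoteCallerSketch, call, and remoteLoaded are hypothetical names, and send stands in for whatever transport actually crosses the browser/applet boundary.

```javascript
// Hypothetical sketch of call storage; NOT the FlashBridge source.
// `send(name, args)` is an assumed transport function.
function makeRemoteCallerSketch(send) {
  var mailbox = [];        // calls made before the other side was ready
  var remoteReady = false;

  return {
    call: function(name, args) {
      if (remoteReady)
        send(name, args);            // direct invocation
      else
        mailbox.push([name, args]);  // hold until the other side reports in
    },
    // Invoked when the other side announces that it has loaded.
    remoteLoaded: function() {
      remoteReady = true;
      while (mailbox.length)
        send.apply(null, mailbox.shift());
    }
  };
}
```

Flushing in first-in, first-out order preserves the order in which the calls were originally made.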

This works around a couple of race conditions. First, the applet won’t generally have run its initialization code by the time the browser receives its load event, so a naive implementation of the bridge wouldn’t allow the browser to make calls into the applet until the browser had heard back that the applet had loaded, which is hard to detect. (It isn’t enough to wait for the object’s onload event, because this can trigger before the first frame of the movie plays, so the applet may still not have initialized enough to receive messages.) Conversely, depending on your page organization and initialization raindance, the applet might load before the page side has registered, so the applet couldn’t call into the page until an unknown time.

Security Implications

FlashBridge, by default, allows the browser to call anything sitting in the applet, and vice versa. This increases the attack surface of your application, because it allows an embedded Flash applet to invoke any part of it. This means that an XSS can tunnel through your applet to gain access to any site with a crossdomain.xml file that allows your applet to connect to it — something that XSS on a pure JavaScript page can’t do.

If you prefer not to audit your application against this, you can call FlashBridge.secure to prevent it from accepting arbitrary calls, and then FlashBridge.register to register callins.

There’s no lockdown facility in the other direction — to lock down the browser JavaScript against calls from the Flash application. That’s because it’s trivial for a Flash application to invoke arbitrary JavaScript in the browser context — in fact, that’s how the applet -> browser communication is implemented, and if that were secured at the FlashBridge layer, the vulnerability would still be accessible one layer down.

Gitting It

All this is in the LzOsUtils project on GitHub, with examples here. Download it via the Download button, clone it via git clone git://github.com/osteele/lzosutils.git, or add it as a submodule to an existing git repo via git submodule add git://github.com/osteele/lzosutils.git.

FlashBridge is written for OpenLaszlo, but would probably run in straight Flash too. And it uses my own funky alternative to ExternalInterface for calling from Flash to the browser (since the built-in API is seriously broken), but it could be ported to run on top of Dojo or something pretty easily.

FizzBuzz Station

Posted by Oliver on February 28, 2008

Uh oh! I overthought fizzbuzz:

Continue reading…

Synchronizing Client Models

Posted by Oliver on February 27, 2008

You’re implementing a client-server application. The client is in JavaScript. It contains a model class, Person. The model is backed by a server-side Person model, and a REST controller at /person. Periodically, the client updates the server’s model, but there can be client-side instances that don’t yet exist on the server, such as when a model is first created and the server hasn’t yet gotten the message.

I’ve written this code a few times now, in JavaScript and in ActionScript. If you write it the obvious way, you run into an interesting set of race conditions. Here’s the code, and the race conditions, and some ad-hoc solutions. In the next post, I’ll introduce a metaobject pattern, queue ball, that I’ve used to solve these race conditions in a more principled and re-usable fashion.

Note: As of 2008-02-28, none of this code has been tested. It’s all extracted from code that’s like the code here, but I haven’t copied and pasted these specific examples into an execution environment, which probably means they fail.

Getting Personal

Here’s the model, with some server proxy mojo mixed in:1

// creates a client-only instance
function Person(attributes) {
  this.attributes = attributes||{};
  // if a server mirror exists, this.id is set to its id 
}
 
// creates a client instance that is mirrored by a new server instance
Person.create = function(attributes) {
  var person = new Person(attributes);
  person.create();
  return person;
}
 
Person.prototype = {
  // creates a server instance for this client instance
  create: function() {
    jQuery.post('/person/create', this.attributes, function(data) {
      this.id = data.id;
    }.bind(this)); 
  },
 
  //  updates attributes of this instance, and, if it exists, its server mirror
  update: function(attributes) {
    Hash.merge(this.attributes, attributes);
    this.id && jQuery.post('/person/update/' + this.id, attributes);
  },
 
  // deletes this instance's server mirror
  remove: function() {
    this.id && jQuery.post('/person/delete', {id:this.id});
    delete this.id;
  }
}

This implementation uses jQuery for transport, and assumes a Hash.merge method from some collection library (say, Prototype’s). It creates a class by setting prototype directly, and it doesn’t detect or recover from XHR errors. All these choices are just to have something concrete to write about; they don’t affect the substance of this article.

A Day at the Races

Do you see the race conditions? There’s at least three: create+update, create+delete, and update+update.

Race Condition 1: Create then Update

function createThenUpdate() {
  var aPerson = Person.create();
  aPerson.update({name:'Edgar Dijkstra'});
}

The problem with createThenUpdate is that aPerson won’t have an id by the time update is called, so update won’t send the new values to the server. The call to create is synchronous, but the communication with the server, and therefore the call to the callback (that sets aPerson.id) is asynchronous, and therefore won’t occur until Person.create returns.

In detail:

  1. createThenUpdate calls Person.create
  2. Person.create calls new Person, and then aPerson.create
  3. aPerson.create calls jQuery.post
  4. jQuery.post calls XMLHttpRequest.send (not shown)
  5. XMLHttpRequest.send, jQuery.post, aPerson.create, and Person.create return
  6. createThenUpdate calls aPerson.update (which finds no id, and sends nothing)
  7. [time passes]
  8. Client sends HTTP Request to server
  9. [more time passes]
  10. Client receives HTTP Response
  11. Callback in aPerson.create sets aPerson.id
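
The sequence above can be reproduced without a server. In this sketch, fakePost is a hypothetical stand-in for jQuery.post that records each request and holds its response until deliverResponses is called; PersonSketch is a cut-down copy of the model, using Object.assign where the article assumes Hash.merge.

```javascript
// Hypothetical stand-ins for illustration; none of this is jQuery's API.
var pendingResponses = [];
var requestLog = [];

function fakePost(url, data, callback) {
  requestLog.push(url);
  if (callback)
    pendingResponses.push(function() { callback({ id: 42 }); });
}

function deliverResponses() {
  while (pendingResponses.length)
    pendingResponses.shift()();
}

function PersonSketch(attributes) {
  this.attributes = attributes || {};
}
PersonSketch.prototype.create = function() {
  var self = this;
  fakePost('/person/create', this.attributes, function(data) {
    self.id = data.id;
  });
};
PersonSketch.prototype.update = function(attributes) {
  Object.assign(this.attributes, attributes);
  if (this.id)
    fakePost('/person/update/' + this.id, attributes);
};

var aPerson = new PersonSketch();
aPerson.create();
aPerson.update({ name: 'Edgar Dijkstra' });  // steps 1-6: id is not set yet
deliverResponses();                           // steps 7-11: id arrives too late
console.log(requestLog);                      // only '/person/create' was sent
```

The client-side attributes do contain the new name; it is only the server that never hears about it.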

Solution 1: Explicit Callbacks

One solution to this problem is to thread the code through callbacks (in effect, performing CPS conversion by hand). aPerson.create calls a callback function once its internal callback function is called, so Person.create takes a callback parameter too, and so on up the call chain. (In this case, the buck stops here.)

Let’s add a callback parameter to Person.create, that is called once the HTTP response to /person/create is received.

Person.create = function(attributes, callback) {
  var person = new Person(attributes);
  person.create(callback);
  return person;
}
 
Person.prototype = {
  // creates a server instance for this client instance
  create: function(callback) {
    jQuery.post('/person/create', this.attributes, function(data) {
      this.id = data.id;
      callback && callback();
    }.bind(this)); 
  }
}

Then we can rewrite createThenUpdate thus:

function createThenUpdate() {
  var aPerson = Person.create({}, function() {
    aPerson.update({name:'Edgar Dijkstra'});
  });
}

Adding the UI

It was easy to spot the race condition in createThenUpdate, and easy to fix it, because the calls to create and update were in consecutive statements, within the same function. In the real world, they’re at the bottom of different call chains, as in this jQuery code that binds some model actions to an HTML view:2

$('#person create-button').click(function() {
  $(this).disable(); // avoid double-creation
  $('#person update-button').enable();
  gCurrentModel.create();
});
$('#person update-button').click(function() {
  gCurrentModel.update($('#person').serialize());
});

Click “create”, edit a field, and then click “update”. Sometimes the update will hit the server, sometimes it won’t: it depends on whether the response to the /person/create request has returned by the time you click the second button. We’ve just created an AJAX version of the 500-mile bug.

Let’s thread the callbacks through this code, in order to avoid enabling the “update” button until the callback is called:

$('#person create-button').click(function() {
  $(this).disable(); // avoid double-creation
  gCurrentModel.create(function() { $('#person update-button').enable() });
});
$('#person update-button').click(/* unchanged */);

This is awful! First, it requires you to weave callbacks through both your view and your model code.3 But worse, it’s a leaky abstraction. The view layer has to know about an arbitrary (from the outside) limitation — that you can’t call update until create has called its callback — of the model layer.

Solution 2: Implicit Callbacks

Another solution is to use a library such as Narrative JavaScript or JavaScript Strands, that does the CPS conversion (adds the callbacks) for you. I like this approach a lot, but I do a lot of work in contexts where those compilers aren’t applicable4, and many folks (often including, for these reasons and others, me) prefer to work in pure JavaScript. I therefore won’t go further down that path here.

Solution 3: Action Queue

Finally, we can add a queue to the model. With the modification below, calling update while the model is waiting for an id no longer drops server updates; it simply queues them for playback once the response to /person/create is received.

Person.prototype = {
  _updateQueue: null,
 
  create: function() {
    this._updateQueue = [];
    jQuery.post('/person/create', this.attributes, function(data) {
      this.id = data.id;
      while (this._updateQueue.length)
        this._sendUpdate(this._updateQueue.shift());
      delete this._updateQueue;
    }.bind(this));
  },
 
  // the caller must treat `attributes` as deep-frozen once
  // this method has been called
  update: function(attributes) {
    Hash.merge(this.attributes, attributes);
    if (this.id)
      this._sendUpdate(attributes);
    else if (this._updateQueue)
      this._updateQueue.push(attributes);
  },
 
  _sendUpdate: function(attributes) {
    jQuery.post('/person/update/' + this.id, attributes);
  }
}

We can use a “method algebra” to optimize this a bit: it doesn’t matter how many times update is called while waiting for the create response; it only needs to send an update once. (The algebra is that there’s an operation +: update × update → update that can combine consecutive updates: update1 + update2 = update3.)

Person.prototype = {
  _pendingUpdates: null,
 
  create: function() {
    this._pendingUpdates = {};
    jQuery.post('/person/create', this.attributes, function(data) {
      this.id = data.id;
      if (this._pendingUpdates) {
        this._sendUpdate(this._pendingUpdates);
        delete this._pendingUpdates;
      }
    }.bind(this));
  },
 
  update: function(attributes) {
    Hash.merge(this.attributes, attributes);
    if (this.id)
      this._sendUpdate(attributes);
    else if (this._pendingUpdates)
      Hash.merge(this._pendingUpdates, attributes);
  },
 
  _sendUpdate: function(attributes) {
    jQuery.post('/person/update/' + this.id, attributes);
  }
}

I’m going to back off from this optimization, though. The reason is that it only works if the two calls to update are consecutive — when there are no intervening calls that also send messages that operate on the same instance. With a more full-featured API (with more actions that send messages to the server), this won’t generally be true.

For example, let’s extend Person with a setPermissions method. If we could ignore race conditions, this method might look like this:

Person.prototype = {
  _pendingUpdates: null,
 
  setPermissions: function(permissions) {
    this.permissions = permissions;
    this.id && jQuery.post('/person/set_permissions', {id:this.id, permissions:permissions});
  }
}

This naive implementation is vulnerable to a create+setPermissions race condition analogous to the create+update race condition that we just fixed, though. We can fix them both by generalizing the post-create queue, so that it can contain arbitrary actions, not just update records:

Person.prototype = {
  _pendingActions: null,
 
  create: function() {
    this._pendingActions = [];
    jQuery.post('/person/create', this.attributes, function(data) {
      this.id = data.id;
      while (this._pendingActions.length) {
        var action = this._pendingActions.shift();
        this[action.methodName].apply(this, action.arguments);
      }
      delete this._pendingActions;
    }.bind(this));
  },
 
  update: function(attributes) {
    Hash.merge(this.attributes, attributes);
    if (this.id)
      this._sendUpdate(attributes);
    else if (this._pendingActions)
      this._pendingActions.push({methodName:'_sendUpdate', arguments:[attributes]});
  },
 
  _sendUpdate: function(attributes) {
    jQuery.post('/person/update/' + this.id, attributes);
  },
 
  setPermissions: function(permissions) {
    this.permissions = permissions;
    if (this.id)
      this._sendSetPermissions(permissions);
    else if (this._pendingActions)
      this._pendingActions.push({methodName:'_sendSetPermissions', arguments:[permissions]});
  },
 
  _sendSetPermissions: function(permissions) {
    jQuery.post('/person/set_permissions', {id:this.id, permissions:permissions});
  }
}

Race Condition 2: Create then Delete

function createThenDelete() {
  var aPerson = Person.create();
  aPerson.delete();
}

By now, you should be able to spot the problem here. The reasoning is exactly the same as for update: when delete is called, aPerson won’t yet have an id.

We could fix this with a callback:

function createThenDelete() {
  var aPerson = Person.create({}, function() {
    aPerson.delete();
  });
}

This has the attendant disadvantages of having to bake knowledge about the client-server protocol into Person’s clients, and having to thread callbacks through the UI. After all, it’s rare that we would create a Person simply to delete it; the more common case is that the creation and deletion would be at the bottom of different call chains — often initiated from outside the application, in response to user actions — such that it’s difficult to thread the first as a callback of the second. And note that, as with create+update, we can’t simply ignore the delete unless the server creation has responded: if we do this, we’ll occasionally drop a delete on the floor, because it was called after the create was sent, but before the response.

The best local solution is to build on the action queue solution above — by simply adding another method to the queue.

Person.prototype = {
  delete: function() {
    if (this._pendingActions)
      this._pendingActions.push({methodName:'_sendDelete'});
    else if (this.id)
      this._sendDelete();
  },
 
  _sendDelete: function() {
    jQuery.post('/person/delete', {id:this.id});
    delete this.id;
  }
}

This works, but it should make you uncomfortable. We’re adding (almost) the same conditional to every single method.
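To watch the queue handle create-then-delete without a server, here’s a minimal sketch that runs on its own. Everything in it is an assumption made for illustration: fakePost and flush stand in for jQuery.post and the server’s response, SketchPerson is a cut-down stand-in for Person, the id 42 is made up, and the method is called remove (as in footnote 1) rather than delete.

```javascript
// Hypothetical stand-in for jQuery.post: records each request, and holds its
// callback until flush() plays the part of the server responding.
var sent = [];      // urls, in the order they were posted
var waiting = [];   // callbacks for requests the "server" hasn't answered yet
function fakePost(url, data, callback) {
  sent.push(url);
  waiting.push(callback || function() {});
}
function flush() {
  // Copy the list first: callbacks may post again while we're answering.
  var callbacks = waiting.splice(0, waiting.length);
  callbacks.forEach(function(callback) { callback(42); }); // 42: made-up id
}

function SketchPerson() {
  this.id = null;
  this._pendingActions = [];   // exists only while the create is in flight
  var self = this;
  fakePost('/person/create', {}, function(id) {
    self.id = id;
    // Replay everything that was requested before the id arrived.
    var actions = self._pendingActions;
    delete self._pendingActions;
    actions.forEach(function(action) {
      self[action.methodName].apply(self, action.arguments);
    });
  });
}

SketchPerson.prototype = {
  remove: function() {
    if (this._pendingActions)
      this._pendingActions.push({methodName: '_sendDelete', arguments: []});
    else if (this.id)
      this._sendDelete();
  },
  _sendDelete: function() {
    fakePost('/person/delete', {id: this.id});
    delete this.id;
  }
};

var person = new SketchPerson();
person.remove();                          // called before the create responds
var sentBeforeResponse = sent.slice(0);   // only the create has gone out
flush();                                  // create responds; delete is replayed
```

The delete isn’t dropped and isn’t sent early: it goes out only once the create has come back with an id.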

Race Condition 3: Overlapping Updates

function updateThenUpdate(aPerson) {
  aPerson.update({name:'Edsger Djikstra'});
  aPerson.update({name:'Edsger Dijkstra'});
}

From looking at updateThenUpdate, it looks like the first call to update will occur before the second. And it does! (Duh.) And it looks like the misspelled name in the first call will be replaced by the correct name in the second call. And it will! (Well…on the client…read on.) Because: the first call to XMLHttpRequest.send (with the misspelled name) occurs before the second call to XMLHttpRequest.send (with the correction), and the client therefore sends the message with the misspelled name before it sends the message with the correction. But our run of good luck stops here. There is, unfortunately, no guarantee about the order in which the server will receive these messages. Generally, the first message will be received before the second. Sometimes, they will arrive in the other order, and the misspelling will overwrite the correction.
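If that’s hard to picture, here’s a toy simulation of it. Nothing here touches the network: the arrivesAt numbers are invented, and applyInArrivalOrder is a hypothetical stand-in for a server that applies each update as it shows up.

```javascript
// A toy "server" that applies updates in the order they ARRIVE,
// which is not necessarily the order in which they were sent.
function applyInArrivalOrder(messages) {
  var record = {};
  messages
    .slice(0)   // sort a copy, so the caller's array keeps its send order
    .sort(function(a, b) { return a.arrivesAt - b.arrivesAt; })
    .forEach(function(message) { record.name = message.name; });
  return record;
}

// Each array is in send order; the arrival times are made up.
var usualCase = [
  {arrivesAt: 50, name: 'Djikstra'},   // the misspelling, sent first
  {arrivesAt: 60, name: 'Dijkstra'}    // the correction, sent second
];
var unluckyCase = [
  {arrivesAt: 90, name: 'Djikstra'},   // sent first, but delayed in transit
  {arrivesAt: 60, name: 'Dijkstra'}
];

var usualResult = applyInArrivalOrder(usualCase).name;      // correction wins
var unluckyResult = applyInArrivalOrder(unluckyCase).name;  // misspelling wins
```

The client did the same thing both times; only the transit times differ.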

There are two ways to fix this problem: by sequencing messages, or by holding outgoing messages (holding each outgoing message until the previous one returns). Sequencing messages is the higher-performance solution (it doesn’t hold up messages), but requires more work and involves switching both the client and the server from a straight REST API, which may not be possible5.

For simplicity, we’ll look at the second solution: holding outgoing messages. This solution has the advantage that the general-purpose solution to the other race conditions (presented in the next article) happens to implement it too. (In this article, we’ll implement it with an explicit Serialized object instead.) Message sequencing doesn’t help with those other cases at all: the problem with them is that the second message is never sent, not that it’s sent out of order.

Here’s a quick-and-dirty implementation of the hold outgoing messages solution. The following code defines Serialized.post as a drop-in replacement for jQuery.post, that refuses to post data until the previous post has completed (successfully, or with an error).6

var Serialized = {
  queue: [], // arguments for pending posts
  defer: false, // true while a post is in flight
  post: function(url, data, callback, type) {
    if (this.defer) {
      this.queue.push(Array.prototype.slice.call(arguments, 0));
      return;
    }
    this.defer = true;
    jQuery.ajax({url:url, type:'POST', data:data, dataType:type, success:callback, complete:complete.bind(this)});
    function complete() {
      // clear the flag first; otherwise the next post would just be re-queued
      this.defer = false;
      if (this.queue.length)
        this.post.apply(this, this.queue.shift());
    }
  }
};
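To check the ordering without a browser, here’s a sketch of the same hold-outgoing-messages idea with the transport passed in as a parameter. makeSerialized and stubSend are names I’ve made up for this sketch; send(url, data, done) is a hypothetical transport that is assumed to call done() when the request completes, successfully or not.

```javascript
// The hold-outgoing-messages idea, with the transport injected so that the
// sketch runs without jQuery.
function makeSerialized(send) {
  var queue = [];     // [url, data] pairs waiting their turn
  var busy = false;   // true while a request is in flight
  function post(url, data) {
    if (busy) {
      queue.push([url, data]);
      return;
    }
    busy = true;
    send(url, data, function done() {
      busy = false;   // clear the flag before sending the next request
      if (queue.length)
        post.apply(null, queue.shift());
    });
  }
  return {post: post};
}

// A stub transport that logs each request and lets us complete it by hand.
var log = [];
var completions = [];
function stubSend(url, data, done) {
  log.push(data.name);
  completions.push(done);
}

var serialized = makeSerialized(stubSend);
serialized.post('/person/update/1', {name: 'Djikstra'});
serialized.post('/person/update/1', {name: 'Dijkstra'});
var sentAfterTwoPosts = log.slice(0);     // only the first has gone out
completions.shift()();                    // the first request completes
var sentAfterCompletion = log.slice(0);   // now the second has gone out too
```

The second update never leaves the client until the first has come back, so the server can’t see them in the wrong order (subject to the caveat in footnote 6).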

Next Up: Queue Ball

I’d like to factor all those conditionals out of the Person methods. Then I’d like to extract the queue code from create, so that I can use it on update (to solve the update+update problem). Finally, there are some general-purpose techniques here, so I’d like to extract the whole mess from Person, where I can apply it to any model (or to code that has some of the same concerns, even if it’s not synchronized model code). But this post is already long enough, so I’ll just close with the promise to write that up, so that I have to do it.


1 Would you rather have code with a cleaner separation of concerns? Here it is. You’ll find that it doesn’t make the race conditions go away, but that it doesn’t change the set of techniques for solving them. (It does make the “explicit callbacks” solution even worse.) I’ve therefore stuck with the double-duty Person implementation in the body of this article, to make the code easier to follow.

function Person(attributes) {
  this.attributes = attributes || {};
  this.proxy = null;
}
 
Person.prototype = {
  create: function() {
    this.proxy = new PersonProxy();
    this.proxy.create(this.attributes);
  },
 
  update: function(attributes) {
    Hash.merge(this.attributes, attributes);
    this.proxy && this.proxy.update(attributes);
  },
 
  remove: function() {
    this.proxy.remove();
    delete this.proxy;
  }
}
 
function PersonProxy() {
  this.id = null;
}
 
PersonProxy.prototype = {
  create: function(attributes) {
    // assumes that the server responds with the new person's id
    jQuery.post('/person/create', attributes, function(id) { this.id = id }.bind(this));
  },
 
  update: function(attributes) {
    this.id && jQuery.post('/person/update/' + this.id, attributes);
  },
 
  remove: function() {
    this.id && jQuery.post('/person/delete', {id:this.id});
    delete this.id;
  }
}

2 This implementation somewhat mixes the model with the view. It’s not the clearest code. It would be cleaner if it used listeners and reactive programming techniques — but the fact that it’s so explicit makes it easier to follow what’s going on.

3 I’ve used this approach, and it wipes the floor with using listeners or delegates or other unthreaded callbacks, where you have to store state in objects in order to match listeners with their context, but it’s still a pain to maintain.

4 CPS conversion introduces a lot of function allocations and invocations. I’ve been scared to try a system that introduces them globally, instead of letting me judiciously thread a few callbacks in by hand, when developing for a slow ECMAScript implementation such as Flash < 9 or MSIE. (I even use my own libraries sparingly in such a situation.)

5 XMPP preserves message order, by sending all the messages over a single stream. One could also add a sequence number to each message. The receiver (in this case, the server) should buffer messages that arrive out of order, so that it can process them in the order in which they occur. This is how a streaming protocol such as TCP is implemented: by adding sequence numbers and buffering on top of an unordered protocol such as IP. HTTP is implemented on top of TCP, but only uses TCP to preserve the order of packets within a message, so multiple HTTP requests (and responses) can get out of order again. It seems that keepalive might fix the problem, and that load balancers might re-introduce it, and that affinity might fix it again, but only if you can guarantee that your load balancer is properly configured. But I’m getting out of my depth here.
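For what it’s worth, the receiver’s half of the sequence-number scheme is short. Here’s a sketch; the message shape ({seq: n, body: ...}), the choice to start numbering at 0, and the name makeReorderBuffer are all assumptions of mine. It’s written in JavaScript to match the rest of the article, although in practice this logic would live on the server.

```javascript
// Receiver-side reordering: hold messages that arrive early, and hand them
// to `process` strictly in sequence-number order.
function makeReorderBuffer(process) {
  var nextSeq = 0;   // the sequence number we're waiting for
  var held = {};     // early arrivals, keyed by sequence number
  return function receive(message) {
    held[message.seq] = message;
    // Release as long a run of consecutive messages as we now have.
    while (held.hasOwnProperty(nextSeq)) {
      var ready = held[nextSeq];
      delete held[nextSeq];
      nextSeq += 1;
      process(ready.body);
    }
  };
}

var processed = [];
var receive = makeReorderBuffer(function(body) { processed.push(body); });
receive({seq: 1, body: 'second'});   // arrives early, so it's held
var processedAfterEarlyArrival = processed.slice(0);
receive({seq: 0, body: 'first'});    // releases both, in order
receive({seq: 2, body: 'third'});
```

A real implementation would also have to decide what to do when a message never arrives; this sketch just waits forever.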

6 This code assumes that a request will never take longer than the client timeout setting to reach the server. Otherwise, complete could be called before the server receives the first message, the client would send the next message, and the server would process them out of order. That’s one reason I called this implementation quick-and-dirty.

More Monads on the Cheap: Inlined fromMaybe

Posted by Oliver on February 26, 2008

This article is about how to deal with null values. It follows up on this one. It’s intended for code stylists: people who care a lot about the difference between one line of code and two, or keeping control statements and temporary variables to a minimum. (A code stylist is kind of like the dual of a software architect, although one person can be both.) It’s not about code golf — although you might learn some strokes to use on that — but about keeping the structure of your code, even at the expression level, close to the way you think about the problem, if you think like me.

If you’re not a code stylist (and I’m not saying that being a code stylist, any more than being a prose stylist, is either a good or a bad thing), you might find it baffling that someone would put so much time into such a simple topic. I won’t try to convince you otherwise. In that case, you might want to check back next week, when I’ll move back up to the bigger picture. (Specifically, some fun stuff involving how to use meta-object programming to solve race conditions in client-server models.)


A nullable, or optional, type is one that might have a value of a certain basis type, but might be null. For example, a nullable array is either an array or null. Even if you don’t use a language with type declarations, you probably use a language with types. If a variable or field (JavaScript property) is expected to hold only arrays, it has type array; if it sometimes ends up holding null as well, it has type nullable array instead.

Haskell has a function fromMaybe that turns a nullable type into a non-nullable type, by replacing it with a default value when it’s null. What would this look like in a more conventional language, and where would you use it?

I’m using JavaScript as an example language here, but the techniques here apply to Ruby and, to a lesser extent, Python as well.

The First Problem Set

Here’s your assignment. It has three parts. In all of them, products is a list of products. In JavaScript, this list is represented by an instance of Array.

First, if products is non-empty, display its first item; otherwise, do nothing. This is easy enough:

if (products.length) {display(products[0])}

Or, for a more Lisp- or Ruby-like style, with the advantage that it can be nested in an expression:

products.length && display(products[0])

Second, apply a preload() function to each item in products. This is easy too:1

products.forEach(preload)

Finally, extract the id from each product, and pass the list of ids to a function preloadAll.2

preloadAll(products.pluck('id'))

Raising the Bar

Let’s make this problem harder. This time, products might be an array, but it might be null.

“Hey!” you (ought to) protest. “That’s a stupid design. You’re giving me poorly typed data, and this introduces complexity and its attendant costs (development time, code size, test cases) to deal with it.”

Well, yes. But this is the real world. Maybe you’re reading an attribute from a deserialized XML element. XML schemas allow for this kind of abbreviation, and using it makes documents more concise (and therefore both lower bandwidth and easier to inspect for debugging), so you’ll probably see this at some point. Maybe you’re reading a property from a JSON object, where the server omits null lists (for the same reasons as for XML: message size and debuggability). Or maybe you’re reading products from a library that represents empty lists by null, whether for performance reasons (to avoid making empty lists), or for backwards compatibility, or just out of laziness. I’ve seen all of these cases, a number of times.

Or maybe you used the technique in Monads on the Cheap to write something like (order||{}).products. Now that you’ve propagated a null order into a null products (to avoid wrapping an if statement around the code that dealt with order), you’ve got to pay the piper. You followed my advice and I dug you into a hole; now I’d better toss you a rope ladder.

Solution 1: Fixing the input on entry

You could fix products before you use it: insert products = products || [] at the top of your function to change a nullable array into a non-null array, by replacing null by a default value. If products is a local variable (as opposed to a function parameter), you could even do this where it’s initialized: replace var products = order.products, say, by var products = order.products || [].3 So the three solutions above become simply:

// products may be null
products = products || [];
// products instanceof Array
if (products.length) {display(products[0])}
 
products = products || [];
products.forEach(preload)
 
products = products || [];
preloadAll(products.pluck('id'))

Raising the Bar Again

Where products is a local variable, “fixing the input” really is the best solution. However, it’s not the most general solution (for reasons I’ll get to). So I’ll move the bar again.

This time, instead of the variable products, it’s the expression offer.products that evaluates to the nullable array. What’s the smallest change required to adapt our code to null values, in this scenario?

Solution 2: Introducing a temporary variable

This one looks absurdly easy too. Changing a line of code to accommodate nullable arrays involves a simple program transformation. Replace offer.products by products, and insert var products = offer.products || [] on the line before. Here are the before cases, where offer.products is not allowed to be null:

// requires offer.products instanceof Array
if (offer.products.length) {display(offer.products[0])}
offer.products.forEach(preload)
preloadAll(offer.products.pluck('id'))

And here are the after cases, where offer.products is allowed to be null:

// accepts null products
var products = offer.products || [];
if (products.length) {display(products[0])}
 
var products = offer.products || [];
products.forEach(preload)
 
var products = offer.products || [];
preloadAll(products.pluck('id'))

Non-local Transformations

There’s something funny about the “temporary variable” program transformation. offer.products is an expression — you can nest it in another expression: as the argument to a function, before a property accessor, or as part of a conditional. var products = offer.products||[]; ...; ...(products)... is a statement sequence. In fact, it’s a statement sequence with a hole — it doesn’t strictly embed the original code, but it isn’t strictly embedded by it, either; instead, it’s woven in.

These differences — that this transformation changes the syntactic type of the code that you’re applying it to (from an expression to a statement), and that you have to weave it into the existing code — make it non-local.4 Here’s what I mean by this:

To apply this transformation — to change code that expects an array into code that accepts a null — we look for an occurrence of offer.products; we replace it by products; and then we travel up the syntax tree (we look at the expression that contains offer.products, and then the expression that contains that) until we find a statement. Finally we insert var products = offer.products||[] before that statement.

Admittedly, there hasn’t been much to this in the statements so far. We’ve simply replaced the first snippet below (with one line of code) by the second snippet (with two lines). And the lines are adjacent, so it’s easy enough to read them as a unit.

// requires products instanceof Array
preloadAll(offer.products.pluck('id'))
// accepts null products
var products = offer.products || [];
preloadAll(products.pluck('id'))

It gets worse, though. Let’s make offer nullable too, and add some more computation. (I’m trying to get offer.products far enough inside of an expression that we can get a feel for where the problems arise.)

In the first block below (which doesn’t deal with nullable arrays), offer is either an object with a products property (which is always an Array), or null. If it’s not null, we load its products. We then set loaded to true if there was an offer, and if any of its products were already loaded. (preloadAll returns true in this case.) Simple enough:

// accepts null offer; requires products instanceof Array
var loaded = offer && preloadAll(offer.products.pluck('id'));

Now, how do we change this if not only offer, but offer.products, are nullable? We apply the transformation above, inserting the statement var products = ... and changing offer.products to products. But where do we insert the statement? It has to go before the call to preloadAll, but after the test for whether offer is null.5 But in the listing above, there isn’t any such location!

To create one, we have to split the initialization expression in half, in order to get the test for offer and the use of offer.products into separate statements, so that there will be room for a statement (not added yet) between them:

// accepts null offer; requires products instanceof Array
var loaded = false;
if (offer)
  loaded = preloadAll(offer.products.pluck('id'));

And now we can hoist offer.products out of the second new statement, without moving it before the first:

// accepts null offer, offer.products
var loaded = false;
if (offer) {
  var products = offer.products || [];
  if (preloadAll(products.pluck('id')))
    loaded = true;
}

This is awful! Not only did it go from one line to five, but loaded changed from a non-mutable variable with an initializer into a mutable variable with two different assignments, such that its initialization is split across the inside and the outside of a conditional. This is the kind of code that, if I let it take over 5% of my program, takes up 50% of my debugging time.

You might think these problems are because of the distinction between statement and expression in Algol-style languages (C, C++, Java, JavaScript). This is partly right, but it’s only somewhat better in Lisp-style languages (Smalltalk, Lisp itself, Ruby). Here’s a straight translation of the JavaScript code into Ruby:

# before: accepts offer == nil; requires offer.products.is_a? Array
loaded = offer && preloadAll(offer.products.map(&:id))
# after: accepts offer == nil, offer.products == nil
loaded = false
if offer
  products = offer.products || []
  loaded = preloadAll(products.map(&:id))
end

Now let’s use the fact that Ruby statements are expressions to re-write the second case:

# also accepts offer == nil, offer.products == nil
loaded = offer && preloadAll(begin products = offer.products || []; products.map(&:id) end)

Sure, this is back down to one line. And it avoids the cascading rewrite of the first transformation (where changing the innermost expression into a statement required changing the expression that contains it into a statement). However, it’s far from idiomatic Ruby.

Worse, like the JavaScript snippet (and this is another problem with temporary variables), it introduces a products into the surrounding environment, or overwrites the existing value of products if there’s already a variable with that name there — a subtle source of bugs, especially when you apply this transformation more than once.
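Here’s a sketch of how that bites in JavaScript when the transformation is applied twice in one function. The summarize function and its arguments are hypothetical; the point is only that var is function-scoped, so the second var products is the same variable as the first.

```javascript
function summarize(order, offer) {
  // First application of the transformation, for order.products:
  var products = order.products || [];
  var orderedCount = products.length;

  var loaded = false;
  if (offer) {
    // Second application, for offer.products. `var` is function-scoped, so
    // this is the SAME variable as above, not a fresh one for this block.
    var products = offer.products || [];
    loaded = products.length > 0;
  }

  // Anything down here that meant "the order's products" now sees the offer's.
  return {orderedCount: orderedCount, productsAtEnd: products, loaded: loaded};
}

var summary = summarize({products: ['a', 'b']}, {products: null});
```

The order had two products, but by the end of the function products is the offer’s (empty) list, and nothing warned us about it.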

Solution 2: Ifs and Ands

You could use conditional statements to transform the original solutions from these:

// requires non-null products
if (offer.products.length) {display(offer.products[0])}
offer.products.forEach(preload)
preloadAll(offer.products.pluck('id'))

into these:

// accepts null products
if (offer.products && offer.products.length) {display(offer.products[0])}
if (offer.products) offer.products.forEach(preload)
if (offer.products) preloadAll(offer.products.pluck('id'))

The first line (if (offer.products.length) {...}) already had a conditional, so we stuck the new test into the existing conditional. The other two lines didn’t, so we wrapped the original code in new conditionals to hold the new test.

Scalability

The “ifs and ands” solution works, but it doesn’t scale. (“Doesn’t scale” is Enterprise for “I don’t like it.” In this case, I’ll rationalize that by defining “scale” as “grows linearly and compositionally with the number of variables and/or the complexity of the syntactic context”.)

First, like the “temporary variable” solution, it’s non-local — it involves changing an expression into a statement, and therefore the expression that contains that expression into a statement, and so on up the line.

It’s also non-linear (in the sense of linear logic6, not linear algebra). Where an expression occurs once in the original code, it occurs twice in the new code. It evaluates offer.products three times instead of twice in the first case, and twice instead of once in the other two.

To see why this is bad, imagine if instead of offer.products it were offer.getProducts, or customer.offer.products, or ((customer||{}).offer||{}).products. Or imagine if it were an expression that had side effects — then the first example wouldn’t have worked anyway, but we would have just broken the other two.
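To make the double evaluation visible, here’s a sketch with a hypothetical getProducts method that counts how many times it’s called. makeOffer and the preload stub are made up for the example.

```javascript
// A hypothetical offer whose products come from a method that counts its calls.
function makeOffer(products) {
  return {
    calls: 0,
    getProducts: function() {
      this.calls += 1;
      return products;
    }
  };
}

var preloaded = [];
function preload(product) { preloaded.push(product); }

// Before: assumes a non-null result; evaluates the expression once.
var before = makeOffer(['a', 'b']);
before.getProducts().forEach(preload);

// After "ifs and ands": null-safe, but the expression is now evaluated twice.
var after = makeOffer(['a', 'b']);
if (after.getProducts()) after.getProducts().forEach(preload);
```

With a cheap property read, the extra evaluation is only noise. With a method that hits a cache, logs, or consumes something, it’s a change in behavior.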

To get another taste of how the expressions replicate with this strategy, take a look at what happens when there’s more than one of them. What if there are two such properties, offer.products and offer.merchants, and we only want to execute our code if they’re both non-empty? Here’s the case for non-nullable arrays:

// offer.products and offer.merchants are non-null
if (offer.products.length && offer.merchants.length) {...}

This code transforms into this:

// offer.products and offer.merchants may each be null
if (offer.products && offer.products.length &&
    offer.merchants && offer.merchants.length) {...}

Or let’s say we wanted to sum the lengths of two properties, offer.ordered and offer.saved. The code for the non-nullable case is simply offer.ordered.length + offer.saved.length. The nullable case balloons into a statement sequence:

// offer.ordered and offer.saved may each be null
var count = 0;
if (offer.ordered) count += offer.ordered.length;
if (offer.saved) count += offer.ordered.length;

Or we could use the ternary operator, but still at the cost of typing (and evaluating) each nullable subexpression twice:

// offer.ordered and offer.saved can each be null
(offer.ordered ? offer.ordered.length : 0) + (offer.saved ? offer.saved.length : 0)

The problem with all of these is that we’ve had to write out each variable name twice, inviting defects. In fact, I made a paste-o in one of the examples above. I could fix it, but I bet I’d make it again if I later changed the code to include offer.wishlist in the count.

Solution 3: Inlined fromMaybe

Here’s an alternative. To change code that expects a non-nullable array to a nullable array, change array to array||[]. This is a local transformation: it changes an expression into an expression (not a statement), so you don’t need to re-write the code that contains it. It’s also a linear transformation (again, in the sense of linear logic, not linear algebra): an expression that only occurs once before the transformation, only occurs once after it.

The original solutions transform thus:

// offer.products can be null
if ((offer.products||[]).length) {...}
(offer.products||[]).forEach(...)
if ((offer.products||[]).length && (offer.merchants||[]).length) {...}

Note that each transformation is local: no new control structures are introduced, so there’s no cascade of expression -> statement transformations up the syntax tree. We can see that by the fact that the troublesome loaded case remains largely intact.

// offer and offer.products can each be null
var loaded = offer && preloadAll((offer.products||[]).pluck('id'));

Here’s the summation code:

// offer.ordered and offer.saved can each be null
count = (offer.ordered||[]).length + (offer.saved||[]).length;
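Here’s a runnable sketch of the loaded and summation cases, to check that they behave for null and non-null inputs alike. The ids helper uses Array.prototype.map in place of Prototype’s pluck, and preloadAll is a stub whose return value (true if it was given any ids) is an assumption of mine, not the behavior described earlier.

```javascript
// pluck comes from Prototype; this sketch uses Array.prototype.map instead.
function ids(products) {
  return products.map(function(product) { return product.id; });
}

// Stub for preloadAll: records what it was asked for, and returns true if
// it was given any ids at all (an assumption made for this sketch).
var requested = [];
function preloadAll(idList) {
  requested.push(idList);
  return idList.length > 0;
}

// The loaded case: still one expression, for a null offer or null products.
function load(offer) {
  return offer && preloadAll(ids(offer.products || []));
}

// The summation case: each nullable expression appears exactly once.
function countItems(offer) {
  return (offer.ordered || []).length + (offer.saved || []).length;
}

var loadedForNullOffer = load(null);                  // short-circuits
var loadedForNullProducts = load({products: null});   // preloads nothing
var loadedForOneProduct = load({products: [{id: 7}]});
var countWithBoth = countItems({ordered: ['a', 'b'], saved: ['c']});
var countWithNullOrdered = countItems({ordered: null, saved: ['c']});
var countWithNeither = countItems({ordered: null, saved: null});
```

Note that load(null) evaluates to null rather than false, exactly as the one-line version in the text does; both test false.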

Beyond Arrays and JavaScript

This technique works in any language where arbitrary values can be used in a boolean context (that is, practically every language except Java) and where null is considered false, and for any type whose values test true. This includes Object and Array in JavaScript, additionally Number and String in Ruby (since 0 and “” are considered true in that language), and, well, the moral equivalent of Object types in Python, since Python collections implement __nonzero__() or __len__().

But actually we can go ahead and use the technique even with types that contain a false value, where we want to replace that false value by a default anyway (either the same false value, or a different value that tests as true). For example, even though JavaScript "" tests false, we can use inputString || "" to coerce a nullable string to a non-null string, since null and "" are the only two values that it will change to "".
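The flip side is the case where a false value is legitimate and the default is something else. Here’s a sketch of the difference; fromMaybe is a hypothetical helper of my own (JavaScript has no such built-in) that replaces only null and undefined, which is closer to what Haskell’s fromMaybe does with Nothing.

```javascript
// Hypothetical helper: replaces only null and undefined.
// (value == null is true for exactly those two values.)
function fromMaybe(defaultValue, value) {
  return value == null ? defaultValue : value;
}

// Fine: the default is the type's own false value, so nothing is lost.
var emptyStaysEmpty = '' || '';
var nullBecomesEmpty = null || '';
var zeroStaysZero = 0 || 0;

// Not fine: a legitimate 0 is replaced by the non-zero default...
var pageSizeWithOr = 0 || 10;
// ...unless the test is for null rather than for falseness.
var pageSizeWithHelper = fromMaybe(10, 0);
var pageSizeFromNull = fromMaybe(10, null);
```

So value || defaultValue is safe when the default is the false value itself, or when you really do want every false value replaced; otherwise it needs an explicit null test.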

Here are some examples that go beyond arrays. First, using the ternary operator. (Which isn’t so bad here, since the expression is in a variable already — bear with me and pretend the expressions are more complex):

var count = products ? products.length : 0; // the original example: an array
var value = inputString ? parseInt(inputString, 10) : 0; // string
var option = options ? options.key : 'default'; // Object used as dictionary
var result = fn ? fn(argument) : defaultValue; // Function
sprite.moveTo(x ? x : 0, y ? y : 0); // number

And now, using the inlined fromMaybe technique, in JavaScript, Ruby, and Python:

// JavaScript
var count = (products || []).length;
var value = parseInt(inputString || '0', 10);
var option = (options || {key:'default'}).key;
var result = (fn || Function.K(defaultValue))(argument);
sprite.moveTo(x || 0, y || 0);
# Ruby
count = (products || []).length
value = (inputString || '0').to_i
option = (options || {:key => 'default'})[:key]
sprite.move_to(x || 0, y || 0)
# Python
count = len(products or [])
value = int(inputString or '0')
option = (options or {'key': 'default'})['key']
sprite.moveTo(x or 0, y or 0)
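To try the JavaScript versions without loading anything, here’s a sketch that wraps them in a function and runs them on all-null and all-present inputs. K is a local stand-in for Function.K (assumed here to return a function that always returns its argument), and defaults and the sample values are made up for the example.

```javascript
// Local stand-in for Function.K: a function that ignores its arguments and
// always returns x.
function K(x) {
  return function() { return x; };
}

// Each line is one inlined-fromMaybe expression from the list above.
function defaults(products, inputString, options, fn) {
  return {
    count: (products || []).length,
    value: parseInt(inputString || '0', 10),
    option: (options || {key: 'default'}).key,
    result: (fn || K('fallback'))('argument')
  };
}

var allNull = defaults(null, null, null, null);
var allPresent = defaults(['a'], '12', {key: 'custom'},
                          function(x) { return x + '!'; });
```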

The Real Thing

For reference, here’s how these examples look in Haskell.

let count = length (fromMaybe [] products)
let value = read (fromMaybe "0" inputString)
let option = lookup "key" (fromMaybe [("key", "default")] options)
moveTo sprite (fromMaybe 0 x) (fromMaybe 0 y)

This is fairly unidiomatic Haskell. You can do a lot better, by modifying the functions instead of the values:

let count = maybe 0 length products

Scala also has a nullable type (Option), with a getOrElse method.

val count = (products getOrElse List()).length

Although, as with Haskell, you’d write this differently in idiomatic Scala:

val count = products.map(_.length) getOrElse 0


1 forEach was added in JavaScript 1.6, and works in Firefox. You can get cross-browser implementations from Dean Edwards or the Mozilla Developer Center; or with frameworks such as jQuery (in the jQuery collection plugin), Prototype (where it’s called each), or MochiKit (where it’s a top-level function).

2 anArray.pluck is from Prototype. In pure JavaScript 1.6 (or another library that extends JavaScript with the 1.6 collection functions), this would be products.map(function(product) { return product.id }). Or in Functional, it’s map('_.id', products).

3 In conjunction with monads on the cheap, in the scenario where products might be null because order might be null, the code looks like this: var products = (order||{}).products || []. In fact, this is simply an extension of monads, where the default value is the empty array, instead of {}.
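A quick sketch of that combined idiom, wrapped in a hypothetical productsOf function so that it can be tried on a null order, an order without products, and an ordinary order:

```javascript
// Null order, or an order with no products, both come out as an empty array.
function productsOf(order) {
  return (order || {}).products || [];
}

var fromNullOrder = productsOf(null);
var fromBareOrder = productsOf({});
var fromFullOrder = productsOf({products: ['a', 'b']});
```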

4 This disruption is in addition to the fact that now you’ve got to come up with a variable name (usually easy), and make sure that if you do this to two different pieces of code in the same scope you use two different variables (harder), and hold a larger set of variable names in your head when you’re reading this code a year later (hardest).

5 In this particular case, we could instead use the cheap monads idiom (offer||{}).products. But not every embedding expression is a test for nullity.

6 Linear logic is just a system where you can’t replace one occurrence of an expression by two. I didn’t link to the Wikipedia page in the text because it’s written for logicians, not programmers, and makes it look scary-complicated, but here it is. Failing linearity is what goes wrong in C macros.