This week I learned a little bit of Go. I was fascinated by the power and simplicity goroutines and channels.

With these ideas fresh in my mind, I decided to reproduce Mark C. Chu-Carroll’s Go prime sieve in JavaScript with goroutines and channels as my toolset instead of conventional JavaScript idioms. Find my translation on GitHub.

I chose JavaScript because it shares many traits with Go, such as multiple threads of control, but doesn’t have a notion of channels.

Background: goroutines and channels

A goroutine is an independent concurrent thread of control.

A channel is a mechanism for two concurrently executing functions to communicate by passing a value. A channel has two methods, read and write, which are synchronous by default. That is, a write blocks until there is a read to consume it, and a read blocks until there is a write to consume.

JavaScript implementation

Goroutines are rather trivial to implement, a simple setTimeout will schedule an independent thread of control.

function go (fn) {
  setTimeout(fn, 0);
};

Synchronous channels will require a queue of reader and writer callbacks. I decided to call the reader callbacks before the writers, but I don’t think it matters.

var chan = function () {
  this.readers = []; // [cb, ...]
  this.writers = []; // [[value, cb], ...]
};

chan.prototype.read = function (cb) {
  if (this.writers.length) {
    // consume a writer
    var writer = this.writers.shift();
    cb(writer[0]);
    writer[1](writer[0]);
  } else {
    // queue the reader
    this.readers.push(cb);
  }
};

chan.prototype.write = function (value, cb) {
  if (this.readers.length) {
    // consumer a reader
    var reader = this.readers.shift();
    reader(value);
    cb(value);
  } else {
    // queue the writer
    this.writers.push([value, cb]);
  }
};

I also pass the writer’s value to its callback because it could be useful.

The prime sieve

Mark’s Go sieve begins with an integer generator.

func generate_integers() chan int {
    ch := make(chan int);
    go func(){
        for i := 2; ; i++ {
            ch <- i;
        }
     }();
    return ch;
}

The problem here is that the channel write (ch <- i) blocks inside of a loop. In JavaScript, we "block" by passing a callback that is called once the procedure can continue. Here, the producer function passes itself as the callback to write.

function integers () {
  var ch = new chan();
  go(function () {
    var producer = function (i) {
      ch.write(i + 1, producer);
    };
    ch.write(2, producer);
  });
  return ch;
};

This Go function excludes multiples of a given prime from the given channel.

func filter_multiples(in chan int, prime int) chan int {
   out := make(chan int);
   go func() {
      for {
         if i := <- in; i % prime != 0 {
             out <- i;
         }
      }
    }();
   return out;
}

We can rewrite this with another recursive callback, but this time we only have to block when we do a write.

function filter_multiples (ch, prime) {
  var out = new chan();
  go(function () {
      var consumer = function (i) {
        if (i % prime != 0) {
          out.write(i, function () {
            ch.read(consumer);
          });
        } else {
          ch.read(consumer);
        }
      }
      ch.read(consumer);
  });
  return out;
};

The sieve in Go will chain a series of channels to exclude multiples of all the primes we have seen.

func sieve() chan int {
   out := make(chan int);
   go func() {
      ch := generate_integers();
      for {
	     prime := <- ch;
	     out <- prime;
	     ch = filter_multiples(ch, prime);
      }
   }();
   return out;
}

We achieve the same in JavaScript with another recursive callback.

function sieve () {
   var out = new chan();
   go(function () {
      var ch = integers();
      function iteration () {
        ch.read(function (prime) {
          out.write(prime, function () {
            ch = filter_multiples(ch, prime);
            iteration();
          });
        });
      };
      iteration();
   });
   return out;
};

Mark's program simply reads from the sieve channel and prints out each prime number.

func main() {
  primes := sieve();
  for {
    fmt.Println(<-primes);
  }
}

Mine uses another recursive callback to do the same.

function main () {
  var primes = sieve();
  function iteration() {
    primes.read(function (i) {
      sys.puts(i);
      iteration();
    });
  };
  iteration();
}

main();

Demo

Go-flavored JavaScript: Prime Sieve

Discussion

Note that the call to main() returns (practically) immediately, since the goroutine in sieve(), which hasn't executed, has not yet written anything to the channel, so the read inside iteration just pushes the callback onto a read queue and returns. Once the main thread of control stops, the event loop begins running the goroutines, which drive the rest of the program. (Please correct me if I'm wrong.)

That functions return quickly is generally desirable, in JavaScript or Go, because a caller should not have to wait for a function to do some "hard work" before it returns. Instead, the hard work should occur in a separate thread of control and passed back to the caller by some layer of indirection.

JavaScript programmers typically solve this problem with callbacks which receive result of the hard work. The Go language offers synchronous channels between independent threads of control as a more sophisticated solution. These tools are easily ported to idiomatic JavaScript based on callbacks. Although unconventional, these tools are simple and may be very powerful when composed.

I hope this toy example borrowed from Go inspires JavaScript programmers to consider the unconventional.

 

John Gruber published a helpful JavaScript Bookmarklet Builder which I used to generate my IPA TTS bookmarklet.

A few users reported that Chrome choked on the bookmarklet. I found that the bookmarklet builder does not produce the output that Chrome expects. I was using Chrome 15.0.874.54 beta.

For example, take this simple script.

(function () {
  // what is 1+1?
  alert(["1+1=",(1+1)].join(''));
})()

Gruber’s script produces this bookmarklet: Example 1

javascript:(function%20()%20{alert([%221+1=%22,(1+1)].join(%27%27));})()

If you drag this example to your bookmark bar in Chrome, it will produce this error when invoked.

Uncaught SyntaxError: Unexpected token %

What happened? If you edit the bookmark, you will see that Chrome double-escaped the string,

javascript:(function%2520()%2520%7Balert(%5B%25221+1=%2522,(1+1)%5D.join(%2527%2527));%7D)()

I believe this happened because the string contained some non-escaped characters such as parentheses.

Gruber points out in a footnote that his script intentionally leaves some characters unescaped for the purpose of readability.

It’s unclear to me what characters must be escaped in a bookmarklet URL. Some sources suggest that other punctuation characters, such as brackets and semicolons, ought to be escaped, too, but I can see no practical reason to do so. If you want to be really conservative and escape just about everything, change this line:

uri_escape_utf8($bookmarklet, qq('" \x00-\x1f\x7f-\xff));

to:

uri_escape_utf8($bookmarklet);

Personally, I prefer to keep the bookmarklet URL itself as readable as possible.

With this change, we get this bookmarklet: Example 2

javascript:%28function%20%28%29%20%7Balert%28%5B%221%2B1%3D%22%2C%281%2B1%29%5D.join%28%27%27%29%29%3B%7D%29%28%29

It may not be as readable as “Example 1″, but it works.

Find my modifications to Gruber’s JavaScript Bookmarklet Builder on GitHub

Let’s say we have a Google App Engine model with two fields, “x” and “y”, and all objects are distinct.

We’d like to paginate with the sort order criteria “x ascending, y ascending”. For example, that index might look like this.

While paginating, we need to know whether there is a “next page”, because we don’t want to return an empty page unless we have no objects.

To do this, we’ll look-ahead for the next item in our search. So, if the page size is 2, we’ll just fetch 3 items. That extra object will give us the necessary (x, y) coordinates we need to resume the search, that is, to fetch the next page. This object is called the “bookmark”.

Given the (x, y) bookmark, how do we fetch the next page? In index terms, we might construct a prefix string “/$x/$y/”, and return the next rows after that in our index. Unfortunately App Engine doesn’t give us a low-level interface like that, so we have to use filters.

What kind of filters? We want all objects greater than or equal to our given “x”. But, for objects with the same value as “x”, we have to skip those less than our given “y”, because we have already seen them. Clearly we will need filters on both “x” and “y”, but woe, App Engine insists that no more than one distinct field can have a filter expression per query.[1]

Ryan Barrett observed that for a search with n sorted fields, you will need n queries to resume the search from a bookmark. In other words, you need one query for each field.

What does this look like? In our example, we have two sorted fields. To resume the search, the first query will fix “x” and filter on “y”, with “y ascending”. The second query will filter on “x” with “x ascending, y ascending”.

What would that look like on our example index?

Each shaded region is one query, that is, one trip to the datastore. The green objects are the current page. The orange object is the next bookmark. The blue objects are the old pages.

The first query is on a fixed “x” and filters “y”. The second query resumes where the first left off, and filters on “x”. Since the sort orders are consistent with the filters, we obtain a consistent view of the index.

On the fourth search, we happen to get lucky and find our bookmark on the first query, so we can skip the second query.

In general

If you’re sorting on n fields {f0, f2, … fn-1} you will need n queries {q0, q1, … qn-1}.

Query qi will filter fn-i and fix fields {f0, … fn-i-1}.

For n=2, query q0 filters f1 and fixes f0. Query q1 filters f0.

Why don’t you use cursors you slob?

Cursors are great! Except, they don’t immediately answer the question, “Do we have a next page?” If you don’t care then by all means use cursors, they’re much easier to use than bookmarks.

Cursors in App Engine can’t look-ahead. Instead, you have to fetch one object with the query’s next cursor, rather than fetching an extra object in the original query.

On the other hand, cursors are allowed to perform queries which users cannot. Cursors can perform the “prefix string” trick described above. So, a cursor can do in one query what takes us two or more.

So which should you use? That probably depends on how many fields you are sorting on. For a single field, I prefer the look-ahead technique, because it only requires one trip to the datastore. For two fields, either technique requires (at most) two trips, although cursors are usually simpler. For more than two fields, cursors easily win.

[1] This requirement exists to prevent users from requesting filters which the datastore cannot satisfy by scanning consecutive rows in an index. For example, the filters “x > 2 and y > 3″ can specify non-consecutive rows of an index that sorts “x” and “y” ascendingly.

Tagged with: