Tuesday, January 27, 2009

Looping in Distributed Programming - Redux - Breaks

In Doug's previous article on distributed loops he wrote about three types types of loop - Independent Loops, Accumulative Loops and Dependent Loops. His article goes deeper, but the basic determining factor is the dependency between each loop iteration. This factor is important to us as it directly translates into the level of parallelism we can exploit, and by implication the level of distribution.

As we've worked outwards we've also also started to find other important categorizations - generally under the (broad) banner of an Accumulative Loop. Our main new category centers on what would be called break in C-based languages - or it's often maligned cousins, continue and goto [1, 2].

Break introduces a control dependency between each iteration. In order to know if iteration [n+1] should execute, we first need to know if iteration [n] has decided to break the loop.

This is a bit of a pain, especially in cases where the loop is otherwise independent. The case we're regularly hitting is a simple search where you're only interested a single result.

For example, you might have a function that allows route-planning - maybe planning a series of flights, or bus trips. Rather than calculate some kind of optimum, in the first case you just want to throw together a quick example that the user then tweaks and edits. This might be useful to check if the route is possible at all, before diving into optimization. In this case you're only interested in the first match (if any).

In these cases I can theoretically evaluate a match against each individual entity in parallel. This issue is that after getting my first match, any subsequent iterations are superfluous.

So what are the options. I can:
  1. Ignore the redundancy and iterate through the whole set - in parallel and distributed.

  2. Build in a control dependency and break the loop when I find a match. This means each iteration must be sequential.

  3. Allow a hybrid - Iterate in batches and allow breaks at the batch level.

Clearly the tradeoff here is the amount of redundant processing against the increase in parallelism. If you assume a simple, evenly distributed search space, on average option #1 will do twice the amount of work of option #2. If you greatly increase the size of the search space, this could end up being a lot of redundant work. However, as the search space increases the available parallelism increases also. Naturally it depends, but arguably you get a better "return" on the parallelism side as this has a number of other benefits - e.g. scale, response time and so forth. If you had infinite resources you would certainly adopt approach #1 and work as parallel as possible.

For option #3 you can make the best of the tradeoff if your batches line up well with your distribution model - for example, if you're CPU bound this might be that the batch size lines up with the number of cores available. You're able to exploit the highest level of parallelism that resources allow, whilst considerably reducing the amount of redundant processing.

It's semantically tricky to come up with a construct that makes this construct explicit in our new distribution language. Calling it "break" is the worst as it's overloading an existing term with different, but subtly similar connotations. What we're really after is a succinct way of introducing a "break-loop-at-your-next-convenience" statement. I've settled on "retire" for the moment, but it still doesn't seem to hit the nail on the head (suggestions welcome).

In terms of the technical implementation - Right now we're semantically catering for #3 by introducing control breaks, but under the hood we're implementing #1 by completely ignoring them. #1 is effectively the degenerate case of #3. Once our distribution technology improves we'll be able to exploit these cues better.

jon@jodoro.com

[1] Continue is in the same class generally, but doesn't apply in the specific case being discussed in this article - if you assume an otherwise independent loop, continue can be constructed using if/else semantics.
[2] e.g. http://www.google.com.au/search?q=break+continue+considered+harmful. Some developers put break in the same camp as continue and goto.

No comments: