INTRODUCTION
Hashlife is a radically powerful way of computing Conway's Game of
Life. It was invented by Bill Gosper who originally described it in
his 1984 paper "Exploiting Regularities in Large Cellular Spaces",
Physica D (Nonlinear Phenomena) volume 10 pp. 75-80.
Three weeks ago I wrote about
Life In A Register which was a tangent from work I was doing on hashlife.
I have not made much progress on it since then, so I thought I would post what I have
done so far here, because it is the most interesting part. It's a literate-programming
description of the algorithm, optimised for clarity of exposition, not speed.
This post is (mostly) mechanically generated from the
code I have written so far.
Hashlife uses several cunning techniques:
The universe is represented as a quadtree of nested squares that
are powers of two cells on a side. There is an intricate and
beautiful recursive function for computing life using this data
structure.
It avoids constructing multiple copies of equivalent portions of
the quadtree that are duplicated at different places or times by
looking up previously-constructed squares in the eponymous hash
table. (This technique is known as memoizing or hash-consing.) It also
remembers the results of Life computations so that it does not need
to repeat them.
Because of its aggressive memoization it is extremely effective at
computing Life patterns that have some kind of repetition in space
or time. Hashlife can take the same length of time to compute from
generation N to generation 2N for any large enough N - it has
logarithmic complexity. Its disadvantage is that it is not very
fast at handling random Life soup.
There are a few hashlife resources on the web. There is
David
Bell's implementation, and
Tomas Rokicki's implementation is part of the "Golly" Game of Life
simulator. He also wrote about
it for Dr Dobb's: "An Algorithm for Compressing Space and Time".
For an example of hashlife's capabilities, it can compute 2^130
generations of the Metacatacryst producing a pattern of 2^232
cells.
I first read about hashlife when working on my previous (more
conventional) Life implementation.
CORE DATA STRUCTURE AND ALGORITHM
First, a little bit of notation. Because hashlife can handle
exceptionally large computations, we represent generation counts,
population counts, display co-ordinates, etc. as multiprecision
integers. Powers of two are particularly important, so because of
the cost of creating mpz_t objects we pre-allocate a few of them.
With any luck this upper bound will be overkill.
I'll also use this array for notation in the comments, so
pow2[0] == 1, pow2[1] == 2, pow2[8] == 256, etc.
#include <stdbool.h>
#include <gmp.h>
typedef unsigned char byte;
#define LEVEL_MAX 1024
static mpz_t pow2[LEVEL_MAX];
static void
init_pow2(void) {
mpz_init_set_ui(pow2[0], 1);
for(unsigned i = 1; i < LEVEL_MAX; i++) {
mpz_init(pow2[i]);
mpz_mul(pow2[i], pow2[i-1], pow2[i-1]);
}
}
Hashlife divides the universe up into a quadtree. A square at level
N is pow2[N] cells on a side, and divided into four level N-1
quarters with half as many cells on a side, i.e. pow2[N-1]. We do
not need to store the level explicitly since we can keep track of
it by counting depth of recursion.
The code is written consistently to handle sub-squares in reading
order named by compass points, as shown in this declaration. The
quarter pointers are never NULL except in level 0 squares. The calc
field can be NULL or a pointer to a level N-1 square; it is filled
in lazily. We'll have much more to say about it below.
typedef struct square {
struct square *nw, *ne,
*sw, *se,
*calc;
mpz_t *pop;
} *square;
There are two level 0 squares, each being one cell that is live or
dead. We don't represent this state explicitly, but instead just use
the identities of the squares, so they don't need initialization.
Read "sq_0" as "square level 0".
sq_0[0] sq_0[1]
+ + + +
()
+ + + +
#define NUM_SQ_0 (1 << 1*1)
static struct square sq_0[NUM_SQ_0];
This function returns 0 for a dead cell and 1 for a live cell.
The pointer arithmetic is equivalent to (sq == &sq_0[1]).
static inline bool
alive(square sq) {
return(sq - sq_0);
}
The level 1 squares have a level 0 square (a cell) in each quarter,
so they are 2*2 cells in size and have pow2[2*2] = 16 possible
states. The state of each cell is represented by a pointer to
sq_0[0] or sq_0[1].
#define NUM_SQ_1 (1 << 2*2)
static struct square sq_1[NUM_SQ_1];
static void
init_sq_1(void) {
for(unsigned i = 0; i < NUM_SQ_1; i++) {
unsigned nw, ne, sw, se;
nw = (i & 1) >> 0;
ne = (i & 2) >> 1;
sw = (i & 4) >> 2;
se = (i & 8) >> 3;
square sq = &sq_1[i];
sq->nw = &sq_0[nw];
sq->ne = &sq_0[ne];
sq->sw = &sq_0[sw];
sq->se = &sq_0[se];
}
}
A level 2 square similarly comprises four level 1 squares, so it
has 16 cells. Level 2 squares are interesting because they are the
first squares that are big enough to calculate the next generation.
We can calculate the next state of one of the four inner cells
based on its eight surrounding cells as shown below:
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+
| | | | | | | | | |
+ +--+ + + + + +--+ + +--+--+--+ + + +--+--+--+
| |()|()| | | |()|()| | | () ()| | | |() () |
+ +--+ + + + + +--+ + + +--+ + + + + +--+ +
| () ()| | | |() () | | |()|()| | | |()|()| |
+--+--+--+ + + +--+--+--+ + +--+ + + + + +--+ +
| | | | | | | | | |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+
The state of a cell in the next generation is determined by its
current state and the number of live neighbours.
static bool
life_rule(square nw, square nn, square ne,
square ww, square cc, square ee,
square sw, square ss, square se) {
unsigned count;
count = alive(nw) + alive(nn) + alive(ne)
+ alive(ww) + alive(ee)
+ alive(sw) + alive(ss) + alive(se);
return(count == 2 ? alive(cc) : count == 3);
}
However we do not have enough information to calculate the next
states of the outer cells because our focus is currently too narrow
to see how the surrounding pattern might affect them.
+--+--+--+--+--+
| | |??|
+ + + +--+ +
| ()|()| |??|
+ + + +--+ +
| ()|() |??|
+ + +--+--+--+
| |
+--+--+--+--+
All we can say, given a level 2 square, is that we can compute the
state after one generation of a level 1 square aligned to the same
centre. We will see in a moment how to use this as the basis for
calculating the futures of bigger squares.
We pre-compute the result of the calculation on level 2 squares and
store it in the square's calc field for re-use later. In larger
squares this field is only filled in when necessary, but in the
level 2 case it is always filled in so that it acts as a sentinel
to save us from explicitly testing for the base case of the
recursion.
For reference when reading the double dereferencing,
here's a diagram of how the constituent level 1
sub-squares and level 0 sub-sub-squares
within a level 2 square are laid out.
+--+--+--+--+
|nw ne|nw ne|
nw + + + + + ne
|sw se|sw se|
+--+--+--+--+
|nw ne|nw ne|
sw + + + + + se
|sw se|sw se|
+--+--+--+--+
#define NUM_SQ_2 (1 << 4*4)
static struct square sq_2[NUM_SQ_2];
static void
init_sq_2(void) {
for(unsigned i = 0; i < NUM_SQ_2; i++) {
unsigned nw, ne, sw, se;
nw = (i & 0x000F) >> 0;
ne = (i & 0x00F0) >> 4;
sw = (i & 0x0F00) >> 8;
se = (i & 0xF000) >> 12;
square sq = &sq_2[i];
sq->nw = &sq_1[nw];
sq->ne = &sq_1[ne];
sq->sw = &sq_1[sw];
sq->se = &sq_1[se];
// Calculate Life on the four possible 3x3 groups:
unsigned calc = 0;
calc |= life_rule(sq->nw->nw, sq->nw->ne, sq->ne->nw,
sq->nw->sw, sq->nw->se, sq->ne->sw,
sq->sw->nw, sq->sw->ne, sq->se->nw) << 0;
calc |= life_rule(sq->nw->ne, sq->ne->nw, sq->ne->ne,
sq->nw->se, sq->ne->sw, sq->ne->se,
sq->sw->ne, sq->se->nw, sq->se->ne) << 1;
calc |= life_rule(sq->nw->sw, sq->nw->se, sq->ne->sw,
sq->sw->nw, sq->sw->ne, sq->se->nw,
sq->sw->sw, sq->sw->se, sq->se->sw) << 2;
calc |= life_rule(sq->nw->se, sq->ne->sw, sq->ne->se,
sq->sw->ne, sq->se->nw, sq->se->ne,
sq->sw->se, sq->se->sw, sq->se->se) << 3;
sq->calc = &sq_1[calc];
}
}
Given the pre-calculated results for level 2 squares, how can use
them to calculate the next generation of a level 3 square? We can't
just recurse on its four quarters because that leaves gaps between
the results:
+--+--+--+--+--+--+--+--+
| |
+ +--+--+ + +--+--+ +
| | | | | |
+ + + + + + + + +
| | | | | |
+ +--+--+ + +--+--+ +
| |
+ + + + + + + + +
| |
+ +--+--+ + +--+--+ +
| | | | | |
+ + + + + + + + +
| | | | | |
+ +--+--+ + +--+--+ +
| |
+--+--+--+--+--+--+--+--+
To solve this problem the level 2 sub-squares need to overlap so
that their smaller level 1 results abut. We achieve this by re-
arranging the level 1 sub-sub-squares to make five extra level 2
sub-squares. We then have a total of nine overlapping level 2
squares. (These diagrams are half-size.)
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
| | | | | | | | | |
+ + + + + + + + + + + + + + +
| nw| | | | n | | | |ne |
+--+--+ + + + +--+--+ + + + +--+--+
| | | | | |
+ + + + + + + + + + + + + + +
| | | | | |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
| | | | | |
+--+--+ + + + +--+--+ + + + +--+--+
| | | | | | | | | |
+ + w+ + + + + c + + + + +e + +
| | | | | | | | | |
+--+--+ + + + +--+--+ + + + +--+--+
| | | | | |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
| | | | | |
+ + + + + + + + + + + + + + +
| | | | | |
+--+--+ + + + +--+--+ + + + +--+--+
| sw| | | | s | | | |se |
+ + + + + + + + + + + + + + +
| | | | | | | | | |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
We're going to need these nine sub-squares in a couple of places so
let us define helper functions for finding them. The main find()
function (which I have not written yet) looks up a square in the
hash table based on its four constituent sub-squares, or constructs
a new square if it was not found. For reference, here's the
sub+subsub diagram again.
+--+--+--+--+
|nw ne|nw ne|
nw + + + + + ne
|sw se|sw se|
+--+--+--+--+
|nw ne|nw ne|
sw + + + + + se
|sw se|sw se|
+--+--+--+--+
static square find(square nw, square ne, square sw, square se);
static inline square find_nw(square sq) {
return(sq->nw);
}
static inline square find_nn(square sq) {
return(find(sq->nw->ne, sq->ne->nw,
sq->nw->se, sq->ne->sw));
}
static inline square find_ne(square sq) {
return(sq->ne);
}
static inline square find_ww(square sq) {
return(find(sq->nw->sw, sq->nw->se,
sq->sw->nw, sq->sw->ne));
}
static inline square find_cc(square sq) {
return(find(sq->nw->se, sq->ne->sw,
sq->sw->ne, sq->se->nw));
}
static inline square find_ee(square sq) {
return(find(sq->ne->sw, sq->ne->se,
sq->se->nw, sq->se->ne));
}
static inline square find_sw(square sq) {
return(sq->sw);
}
static inline square find_ss(square sq) {
return(find(sq->sw->ne, sq->se->nw,
sq->sw->se, sq->se->sw));
}
static inline square find_se(square sq) {
return(sq->se);
}
When we calculate the next generation of each of these nine
overlapping level 2 sub-squares, we get nine abutting level 1
squares like this:
+--+--+--+--+--+--+--+--+
| |
+ +--+--+--+--+--+--+ +
| | | | | |
+ + n w + n n + n e + +
| | | | | |
+ +--+--+--+--+--+--+ +
| | | | | |
+ + w w + c c + e e + +
| | | | | |
+ +--+--+--+--+--+--+ +
| | | | | |
+ + s w + s s + s e + +
| | | | | |
+ +--+--+--+--+--+--+ +
| |
+--+--+--+--+--+--+--+--+
We can calculate a second generation by combining these into four
overlapping level 2 squares and then working out their results.
This gives us four abutting level 1 squares as follows. I have
labelled each of these new level 1 squares with the four previous
level 1 squares that were used to make their parent level 2
squares. The repeated labels indicate where the overlap occurs.
+--+--+--+--+--+--+--+--+
| |
+ + + + + + + + +
| |
+ + +--+--+--+--+ + +
| |nw nn|nn ne| |
+ + + + + + + + +
| |ww cc|cc ee| |
+ + +--+--+--+--+ + +
| |ww cc|cc ee| |
+ + + + + + + + +
| |sw ss|ss se| |
+ + +--+--+--+--+ + +
| |
+ + + + + + + + +
| |
+--+--+--+--+--+--+--+--+
Finally we combine these level 1 squares to make another level 2
square. We could use this to calculate a third generation, but we
do not because after two generations the work we have done starting
from a level 3 square is twice as big in every dimension - both
space and time - as the level 2 calculation. This gives us a
perfect recursive structure, which is the key to the algorithm.
level 2 level 3 level N
input size 4 * 4 8 * 8 pow2[N] * pow2[N]
output size 2 * 2 4 * 4 pow2[N-1]*pow2[N-1]
generations 1 2 pow2[N-2]
In Life, the "speed of light", the maximum speed that a signal can
propagate, is one cell per generation. This is the rate at which
the behaviour of surrounding squares can affect more and more of
the cells around the edge of our current square as time passes.
Eventually after pow2[N-2] generations our focus is too narrow to
say anything about a border pow2[N-2] cells wide, and we only know
about the middle square pow2[N-1] cells across. (This is why a
square's calc field points to a level N-1 square, just like its
quarter fields.) Hence, the ziggurat (truncated pyramid) shape
formed by hashlife's recursion is the "past light cone" of the
result square, that is, it includes all the cells in past
generations that can possibly affect the state of the cells in the
result.
The calculation function itself is memoized, i.e. it avoids repeating
work by storing its result in the square. This makes the overlapping
recursive structure feasible, because without the memoization it would
lead to enormous amounts of recomputation. Another way of looking at
this is lazy evaluation of the calc field, i.e. call-by-need, where we
explicitly force evaluation of the calc field by calling calc2(). The
combination of maximal sharing of equivalent squares (enabled by find()'s
hash table), memoization of results, and divide-and-conquer
recursion is what gives hashlife its amazingly fast logarithmic
complexity for patterns that have repeating behaviour.
The memoization is where the pre-computed level 2 calc results act
as sentinels, since the test for a memoized result always succeeds
at level 2 and terminates the recursion.
static square
calc2(square sq) {
// Check for a memo or a sentinel.
if(sq->calc)
return(sq->calc);
// Find level N-1 sub-squares and calculate level N-2 results.
square
nw = find_nw(sq), nn = find_nn(sq), ne = find_ne(sq),
ww = find_ww(sq), cc = find_cc(sq), ee = find_ee(sq),
sw = find_sw(sq), ss = find_ss(sq), se = find_se(sq);
nw = calc2(nw); nn = calc2(nn); ne = calc2(ne);
ww = calc2(ww); cc = calc2(cc); ee = calc2(ee);
sw = calc2(sw); ss = calc2(ss); se = calc2(se);
// Construct intermediate overlapping level N-1 squares
// and calculate their level N-2 results.
nw = calc2(find(nw, nn, ww, cc));
ne = calc2(find(nn, ne, cc, ee));
sw = calc2(find(ww, cc, sw, ss));
se = calc2(find(cc, ee, ss, se));
// Memoize the result.
sq->calc = find(nw, ne, sw, se);
return(sq->calc);
}
What if we want to calculate a number of generations that isn't a
power of two? We keep exactly the same spatial geometry (relative
sizes and positions of input and output squares) as for the full
calculation, so that the recursion also has the same shape. The
simplest case of calculating fewer generations is zero, in which
case we just construct a sub-square centred on the original square.
In fact we already have code to do this in find_cc().
In the more general case the desired generation count is between 0
as worked out by find_cc(), and pow2[N-2] as worked out by calc2().
There are three kinds of recursion depending on whether the target
falls before, on, or after the midway generation pow2[N-3]. This
point in time corresponds to the age of the nine intermediate
results in calc2(), which is also a dividing point in its code
and in calc1() below.
recursion in first part or second part
0 < gen < pow2[N-3] calc1 find_cc
gen = pow2[N-3] calc2 find_cc
pow2[N-3] < gen < pow2[N-2] calc2 calc1
In calc1() we test the type of recursion for each part separately,
and the three cases arise from the way the two tests overlap when
cmp == 0. In the second part we have to adjust the generation count
so that it is relative to the midway point, and we must restore its
value before returning so that it appears unchanged to the caller.
(We manipulate its value in-place because that is more efficient
than allocating and freeing a local mpz_t with the required value.)
The following code assumes that 0 < gen < pow2[N-2]. Again the base
case of the recursion is not explicit. The level argument must not
be less than 3, in which case the assumption implies that gen must
be 1 which is equal to the halfway point. Therefore no more
recursion via calc1() occurs, and the calls to calc2() return
immediately because of the level 2 sentinels.
This function is not memoized, but instead relies on calc2() for
efficiency.
static square
calc1(square sq, unsigned level, mpz_t gen) {
// Number of generations to the midway point.
mpz_t *half = &pow2[level - 3];
int cmp = mpz_cmp(gen, *half);
level -= 1;
square
nw = find_nw(sq), nn = find_nn(sq), ne = find_ne(sq),
ww = find_ww(sq), cc = find_cc(sq), ee = find_ee(sq),
sw = find_sw(sq), ss = find_ss(sq), se = find_se(sq);
if(cmp < 0) {
nw = calc1(nw, level, gen);
nn = calc1(nn, level, gen);
ne = calc1(ne, level, gen);
ww = calc1(ww, level, gen);
cc = calc1(cc, level, gen);
ee = calc1(ee, level, gen);
sw = calc1(sw, level, gen);
ss = calc1(ss, level, gen);
se = calc1(se, level, gen);
} else {
nw = calc2(nw); nn = calc2(nn); ne = calc2(ne);
ww = calc2(ww); cc = calc2(cc); ee = calc2(ee);
sw = calc2(sw); ss = calc2(ss); se = calc2(se);
}
if(cmp > 0) {
mpz_sub(gen, gen, *half);
nw = calc1(find(nw, nn, ww, cc), level, gen);
ne = calc1(find(nn, ne, cc, ee), level, gen);
sw = calc1(find(ww, cc, sw, ss), level, gen);
se = calc1(find(cc, ee, ss, se), level, gen);
mpz_add(gen, gen, *half);
} else {
In the following code, if we followed the pattern and
called find_cc() where the code above calls calc1(), it
would require up to eight allocations. We can reduce this
maximum to just four by avoiding the creation of the
intermediate results, and instead directly creating the
results that find_cc() would return according to the
following diagram.
+--+--+--+--+--+--+
|nw | n | ne|
+ +--+--+--+--+ +
| |se sw|se sw| |
+--+ + + + +--+
| |ne nw|ne nw| |
+ww+--+- c -+--+ee+
| |se sw|se sw| |
+--+ + + + +--+
| |ne nw|ne nw| |
+ +--+--+--+--+ +
|sw | s | se|
+--+--+--+--+--+--+
nw = find(nw->se, nn->sw, ww->ne, cc->nw);
ne = find(nn->se, ne->sw, cc->ne, ee->nw);
sw = find(ww->se, cc->sw, sw->ne, ss->nw);
se = find(cc->se, ee->sw, ss->ne, se->nw);
}
return(find(nw, ne, sw, se));
}
UNWRITTEN SUPPORT CODE
The code above is the essence of hashlife.
The most important thing it omits is the definition of the find() function,
which serves as the square constructor. It has to implement hash-consing,
i.e. it only ever constructs one copy of a square with a given set of arguments,
to ensure maximal sharing. For serious use it must also implement
garbage collection. I don't know if I can leave that job to the
Boehm-Demers-Weiser conservative collector, but if I do then I'll have to
do something cunning with weak pointers in find()'s hash table.
Maybe I'll write that eventually...