fanf: (Default)

Nearly 10 years ago I developed a neat function for converting a Gregorian date into a count of days since some epoch. There was nothing particularly new about it; I just tweaked the numbers until the formula was as simple as possible. I've re-used this code a fair number of times since then without re-examining it. This evening I worked out that it can be distilled even more.

The function adds together some expressions that count the number of days indicated by the three components of the date, plus a constant to set the epoch.

The number of normal days before the year y (full four digit year) is just y*365.

The number of leap days up to and including the year y is y/4-y/100+y/400. This is the usual "add a leap day in every fourth year, but not in every hundredth year, but do in every 400th year" rule. I'm using integer division that discards any remainder, i.e. rounds down.

The previous two paragraphs are inconsistent about including or excluding the current year. This is because leap days occur inside a leap year, not at the end, so their contribution to the day count does not depend solely on the year. We can fix this by re-assigning January and February to the previous year using a small administrative fiddle, as follows (where m counts months starting from 1). Then a leap day falls just before the start of years that are multiples of four (etc.) and the expression in the previous paragraph magically does the right thing. The fiddle also requires an adjustment to the epoch constant.

        if (m > 2) m -= 2; else m += 10, y -= 1;

Months are the part that I spent most time fiddling with. Section 1.12 of Calendrical Calculations develops various equations for calendars (such as the Julian calendar) that spread their leap years evenly, like Bresenham's line drawing algorithm spreads out raster steps. The equations also work for month lengths in the Julian and Gregorian calendars, if you treat 30-day months as normal years and 31-day months as leap years, and ignore February. We want the number of days in a year before the start of a given month, which is given by equation 1.73:

        (L*y - L + D*L % C) / C + (y - 1) * N

where L is the number of leap years in a cycle, C is the total number of years in a cycle, D is the offset of the start of the cycle, N is the number of days in a normal year, and years are actually months. Yuck. It turns out that if we perform the right administrative fiddling, we can arrange that D=0, and we can substitute m=y-1. This gives us the greatly simplified:

        m*L/C + m*N = m * (N*C+L) / C

For Julian and Gregorian months, N=30, C=12, and L=7, giving m*367/12. This pretends that February has 30 days, but the pretence doesn't matter because we have moved February to the end of the year so we never count days after February 28th or 29th. The complete function is then as follows (with the epoch set so that 1st January, 1 A.D. is day 1).

    int A(int y, int m, int d) {
        if (m > 2) m -= 2; else m += 10, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*367/12 + d - 336;
    }
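
For example, tracing 1st January, 1 A.D. through A(): the fiddle maps it to month 11 of year 0, so the return value is 0*365 + 0 - 0 + 0 + 11*367/12 + 1 - 336 = 336 + 1 - 336 = 1, as required.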

In fact you don't have to treat a whole year as a cycle for the purpose of the month calculation. The most basic pattern is a five month cycle containing three long months, as in March-July and August-December. (January and February are the start of a third truncated cycle.) However, when I tried to make m*153/5 work in 1999, I could not eliminate the D offset to keep the equations simple. My problem was that I only allowed myself to count months starting from zero or one, and I was flailing around without the sage guidance of Messrs Dershowitz and Reingold. (I didn't use their equation 1.73 when I was developing this code, but it's a good example of what I was trying to avoid.)

It turns out that we can play around with the administrative fiddle to adjust D so that the cycle falls in the right place, so long as we compensate by adjusting the epoch constant. For the Julian and Gregorian five-month cycle D=4, i.e. we need to shift March (the start of the cycle) to month number 4, which gives us the following code.

    int B(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

I mentioned above that equation 1.73 works for Julian leap years, so we can use it for years as well as months. This combines the first two terms of the return expression (since 1461 = 4*365 + 1, y*1461/4 equals y*365 + y/4), leaving the Gregorian correction separate. No additional fiddling is necessary. We can't merge in the Gregorian correction as well, because evenly spreading the leap years gives intervals of 4 or 5, not 4 or 8.

    int C(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*1461/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

There's a related function for calculating the number of days in a month. It uses the remainders from the divisions to find the length of the current month, instead of using the results to count the preceding days. My old code re-used the same administrative fiddle:

    int D(int y, int m) {
        if (m > 2) m -= 2; else m += 10;
        return m*367%12 > 4 ? 31
             : m != 12      ? 30
             : y % 4        ? 28
             : y % 100      ? 29
             : y % 400      ? 28
                            : 29;
    }

In fact you don't need a fiddle in this case, because there's no need to move February to the end of the year. You can just pick values of C and L which make the pattern fall in the right place.

    int E(int y, int m) {
        return m*275%9 > 3 ? 31
             : m != 2      ? 30
             : y % 4       ? 28
             : y % 100     ? 29
             : y % 400     ? 28
                           : 29;
    }
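
A brute-force cross-check ties all of this together (a sanity test I'm sketching here, assuming the functions above are pasted into the same file): the day-count variants must agree with each other, the days-in-month functions must agree with each other, and the day number must go up by exactly one per day.

    #include <assert.h>
    #include <stdio.h>

    /* assumes A(), B(), C(), D() and E() from above are defined here */

    int main(void) {
        int prev = A(1, 1, 1) - 1;
        for (int y = 1; y <= 2400; y++)
            for (int m = 1; m <= 12; m++) {
                assert(D(y, m) == E(y, m));
                for (int d = 1; d <= E(y, m); d++) {
                    int n = A(y, m, d);
                    assert(n == B(y, m, d));
                    assert(n == C(y, m, d));
                    assert(n == prev + 1); /* counts up by one per day */
                    prev = n;
                }
            }
        printf("A, B, C, D, E consistent up to the end of 2400\n");
        return 0;
    }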
fanf: (Default)

This week I have been discussing the Game of Life with the amazingly enthusiastic Brice Due, after he commented on my original post about LIAR. He put me in touch with Tom Rokicki who, as well as writing probably the best implementation of Life, recently featured in New Scientist for his work on God's solution to Rubik's Cube.

Tom pointed me at a better bit-twiddling implementation of Life which comes in at 26 operations compared to my code's 34 - a significant improvement. However his code is ultra-compressed and really hard to understand. So I wrote a comprehensible version. This code has some rather clever features.

	left = bmp << 1;
	right = bmp >> 1;
	side1 = left ^ right;
	side2 = left & right;
	mid1 = side1 ^ bmp;
	mid2 = side1 & bmp;
	mid2 = side2 | mid2;

In the first stage he uses a normal full adder to do the horizontal 3-to-2 compression which counts the number of cells in each row (bits mid1 and mid2). The first two ops of the full adder form a half adder which he uses to count the cells to either side of the middle cell (bits side1 and side2). Thus he gets four useful outputs from just five ops!

	upper1 = mid1 << 8;
	lower1 = mid1 >> 8;
	upper2 = mid2 << 8;
	lower2 = mid2 >> 8;

The side bits are kept in the middle row, whereas the mid bits are shifted to become the sums for the upper and lower rows.

#define REDUCE(g,f,a,b,c) do {	\
		uint64_t d, e;	\
		d = a ^ b;	\
		e = b ^ c;	\
		f = c ^ d;	\
		g = d | e;	\
	} while(0)

	REDUCE(sum12, sum13, upper1, side1, lower1);
	REDUCE(sum24, sum26, upper2, side2, lower2);

The second stage is to compress the three rows into separate sums of the low bits and the high bits. Tom does not use a normal adder to do this. It turns out that there are 24 adder-like boolean functions of three arguments, which produce two values that together encode the number of input bits. Only one of them requires four operations - the others require five or more. The encoding of the result is a bit strange:

    count   result
      0     f=0 g=0
      1     f=1 g=1
      2     f=0 g=1
      3     f=1 g=0

As well as saving two ops compared to using a full adder, this function makes the final twiddling particularly brief.

	tmp = sum12 ^ sum13;
	bmp = (bmp | sum13) & (tmp ^ sum24) & (tmp ^ sum26);
	return(bmp & 0x007E7E7E7E7E7E00);
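
Putting the pieces together, here is my reassembly of the fragments above into one function (the name life8x8() and the scaffolding are mine, and REDUCE is copied from above, so treat this as a sketch rather than Tom's exact code). The 64-bit word holds an 8x8 block of cells, one byte per row, and the final mask keeps only the 6x6 interior whose neighbourhoods lie entirely inside the block.

	#include <stdint.h>

	#define REDUCE(g,f,a,b,c) do {	\
			uint64_t d, e;	\
			d = a ^ b;	\
			e = b ^ c;	\
			f = c ^ d;	\
			g = d | e;	\
		} while(0)

	/* one generation for the 6x6 interior of an 8x8 block of cells
	   packed into a 64-bit word, one byte per row */
	uint64_t life8x8(uint64_t bmp) {
		uint64_t left, right, side1, side2, mid1, mid2;
		uint64_t upper1, lower1, upper2, lower2;
		uint64_t sum12, sum13, sum24, sum26, tmp;

		/* horizontal 3-to-2 compression of each row, plus the
		   half-adder count of the two cells either side */
		left = bmp << 1;
		right = bmp >> 1;
		side1 = left ^ right;
		side2 = left & right;
		mid1 = side1 ^ bmp;
		mid2 = side1 & bmp;
		mid2 = side2 | mid2;

		/* the row counts shifted up and down a row */
		upper1 = mid1 << 8;
		lower1 = mid1 >> 8;
		upper2 = mid2 << 8;
		lower2 = mid2 >> 8;

		/* 4-op adder-alikes combine the three rows' counts */
		REDUCE(sum12, sum13, upper1, side1, lower1);
		REDUCE(sum24, sum26, upper2, side2, lower2);

		/* the final twiddle picks out the counts that mean "alive" */
		tmp = sum12 ^ sum13;
		bmp = (bmp | sum13) & (tmp ^ sum24) & (tmp ^ sum26);
		return bmp & 0x007E7E7E7E7E7E00;
	}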

I wondered if it's possible to use the 4-op adder-alike in the first stage too. The problem with adding together all nine cells is overflowing the three bit result. This is OK when using a full adder because the overflow happens to confuse a very large count (therefore a dead result) with a very small count (again a dead result) so it makes no difference. The modified adder confuses a middling small count (live result) with a middling big count (dead result) so it fails. The dual-purpose half+full-adder trick neatly avoids overflow problems.

Despite successfully unpicking this code I still find it rather unintuitive, especially the final twiddle. I am in awe of people who can invent such cunning tricks.

fanf: (Default)

Thanks for all the interesting comments on my previous post.

The reason I'm investigating this is to work around false positives caused by SpamAssassin's obfuscation rules. These are intended to match deliberate misspellings of commonly spammed goods such as Viagra. The specific instance that caused the bug report was a Reading postcode being treated as an obfuscated Rolex.

Therefore I'm not particularly worried about missing out obscure special cases like GIR 0AA and the overseas territories AAAA 1ZZ. However it might be worth tightening up the outcode regex, based on the list of UK postcode areas, to reduce the chance of matching a bogus postcode.

Also, the Post Office's postcode FAQ mentions that only London uses the ANA and AANA outcode formats. (In fact it's only the E, EC, SW, W, WC areas.) I managed to find a list of postcode districts which includes these outcodes (Wikipedia omits them) and it shows that the third position rule is wrong: it says M does not appear there, but there is a postcode district London W1M. Rule Three also allows A and E which are not in fact used.

qr{\b
  ([BGLMNS][1-9][0-9]?
  |[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]
  |([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]
  )[ ]{0,2}
  ([0-9][ABD-HJLNP-UW-Z]{2})
\b}x
fanf: (Default)

A little contribution for anyone else who searches the web for this in the future.

The UK postcode consists of two parts. The first part is the Outward Postcode, or Outcode. This is separated by a single space from the second part which is the Inward Postcode, or Incode. The Outcode directs mail to the correct local area for delivery. The Incode is used to sort the mail at the local area delivery office.

The Outcode has 6 possible formats (as follows) and the Incode is consistently in numeric, alpha, alpha format.

  • AN NAA
  • ANN NAA
  • ANA NAA
  • AAN NAA
  • AANN NAA
  • AANA NAA

There are some restrictions on the letters:

  1. The letters [QVX] are not used in the first position.
  2. The letters [IJZ] are not used in the second position.
  3. The only letters to appear in the third position are [ABCDEFGHJKSTUW].
  4. The only letters to appear in the fourth position are [ABEHMNPRVWXY].
  5. The letters [CIKMOV] are not used in the second part.

This translates into a Perl extended regex as follows (with slightly relaxed whitespace):

qr{\b
    ([A-PR-UWYZ]\d[\dA-HJKSTUW]? # rules 1,3
    |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]? # rules 1,2,4
    )[\t ]{1,2}
    (\d[ABD-HJLNP-UW-Z]{2}) # rule 5
\b}x

Update: more here.

fanf: (silly)
I arrived in the office today to find that some fans had given me a birthday present, which rather amused me :-)
fanf: (Default)

Pat Stewart is retiring in December. She has been our user accounts manager for many years, as well as being infamous as the Keeper of the Ton of Bricks and Explainer of Rules to wayward undergraduates. We've just advertised her job and we're hoping to find a replacement with enough time for a couple of months overlap. The job description below is actually a fair description of what Pat does - for once it isn't just the usual platitudes. She is a hard act to follow.

User Policy and Account Manager

The University Computing Service has a vacancy for a User Policy and Account Manager. The successful applicant will create, develop and maintain the policies relating to the management of accounts for the use of University Computing Service computing resources, ensuring an orderly and well managed environment for University members to carry out their studies and research.

The successful applicant will have a good degree or equivalent and a sound broad-based background in IT with considerable wide-ranging experience including support for a variety of operating systems. Experience of working in Higher Education and a working knowledge of legislation relating to IT would be expected. The applicant will have experience of managing staff, and must be able to communicate effectively with users at all levels in a variety of situations.

The applicant must have good interpersonal skills and be able to act with discretion, empathy and tact. The applicant will need to be self-confident, motivated, well organised, meticulous, able to prioritise tasks when working under pressure, and have the ability to anticipate problems before they arise, and develop solutions to cope with them.

This is a permanent appointment at Grade 9, subject to a nine month satisfactory probation period, on a salary scale of £34,793 to £44,074 with initial salary depending on experience. The closing date for applications is Friday 5 September 2008. It is intended that interviews will be held during the week beginning 15 September 2008. Further details are available on the Computing Service web site.

fanf: (Default)

I've now read Dan Kaminsky's slides, which are mostly ranting "the sky is falling" and pointing out what assumes secure DNS. The actual attack is not described in any more detail than previous public sources. I still don't understand why resolvers accept the poison: Kaminsky seems to be suggesting that data in the additional section of a reply is overwriting cached answers, which RFC 2181 says must not happen. Anyway,

$ md5 <kaminsky; echo; cat kaminsky; echo
ef96f2d9e973a36e825793ddeff48ae5

Kaminsky says his attack allows him to overwrite data in a DNS cache.
The reason Matasano's glue / additional RR explanation is wrong can be
found in RFC 2181 section 5.4.1, which says that additional RRs must
not override authoritative data in the DNS cache. However, if you look
at RFC 1034 section 3.6.2 you'll see that the data for a CNAME target
goes in the answer section, not the additional section. Presumably
this means the resolver treats it as trustworthy and therefore
overwrites its "old" cached data with the CNAME target data. (However
RFC 2181 also says that a resolver shouldn't trust CNAME target data,
which implies this attack shouldn't work, though I can see why it does
if the CNAME target is in the same zone as the CNAME owner.)

So the attack is: (1) cause the victim to look up lots of random
invalid domain names adjacent to the goal name whose data you want to
overwrite; (2) spoof authoritative answers that contain two records: a
CNAME whose owner is your random invalid name and whose target is your
goal name, plus the data for the goal name that you want to insert
into the victim's cache; (3) when you win the spoofing race your data
for the goal name overwrites whatever the victim previously had
cached.

$ md5 <kaminsky2; echo; cat kaminsky2
d4b70e6abfa3e7d49e159d75b5fc277b

So there's another form of attack which is closer to the Matasano
description but still different in significant ways. This relies on
the fact that (again according to RFC 2181 section 5.4.1) data in the
authority section of a reply is given a lot of trust. The attack is:
(1) cause the victim to look up lots of random invalid domain names
adjacent to the goal name whose data you want to overwrite; (2) spoof
authoritative answers that have an authority section containing
replacement NS records for the target's zone which point to
nameservers you control; (3) when you win the spoofing race the victim
will start using your name servers for the target zone instead of the
proper nameservers; (4) your nameserver can reply in whatever way you
want for the target name, but you'll have to wait for it to expire
from the victim's cache. Therefore this attack is slower than the
CNAME attack.

To do

2008-07-31 00:14
fanf: (Default)

Some things to keep me busy for the next few months.

  • PPswitch:
    • Modify activity graphs for the new scanner setup. Add SpamAssassin load tracking.
    • Do a public weekly spam filtering statistics news item. Break down stats per department, especially so that the Old Fools know how much junk their anti-spam appliance isn't seeing.
    • Push per-user filtering options out to SMTP time. Add per-domain filtering options for other University mail servers.
  • Exim:
    • Finish the TLS logging improvements and unit tests for the new ratelimit code, then roll a 4.70 release.
    • Implement greylisting, probably for 4.71.
    • Revamp the hints database layer so it can support concurrent updates (e.g. Berkeley DB row-level locking) and clustered databases (e.g. memcached), probably for 4.72. This will improve the retry, ratelimit, and greylisting databases.
  • FreeBSD:
    • Update factor(6) with bug fixes from NetBSD. (bug found when factorizing my phone number!)
    • Fix bugs and feature omissions in unifdef(1) reported by Ben Hutchings and Anders Kaseorg.
fanf: (Default)

I have released a new version of selog. The only significant change is C++ support, requested by [livejournal.com profile] filecoreinuse.

One thing I would like to be able to do in C++ is call a selector, so that sel(fmt, ...) works like selog(sel, fmt, ...). (The Lua API supports something like this.) However as far as I can tell this would require variadic templates to implement efficiently (i.e. to inline the selog_on() check) and this is probably reaching too far into voodoo territory to be sensible.

fanf: (Default)
We're being hammered by loads of vicious email trojans, which mutate fast. I've resorted to adding manual blocks in Exim because ClamAV isn't keeping up.

Just now I was very puzzled that freshclam wasn't downloading the latest version of the virus database. It turns out that although I have told it to poll 100 times a day (about every 15 minutes), freshclam uses the DNS to check what is the latest version, and the TTL on the relevant DNS record is 30 minutes.
fanf: (Default)

So there's another form of attack which is closer to the Matasano description but still different in significant ways.

$ md5 <~/doc/kaminsky2
d4b70e6abfa3e7d49e159d75b5fc277b
fanf: (Default)

[livejournal.com profile] beezari posted a copy of the leaked Matasano explanation of Kaminsky's new DNS attack. I believe the explanation isn't quite right. In his interview in the WIRED Threat Level blog Kaminsky mentions that the attack relies on CNAMEs. This means that it does not depend on glue nor on additional section processing, which is what Matasano described. I believe the real explanation is...

$ md5 <~/doc/kaminsky
ef96f2d9e973a36e825793ddeff48ae5
fanf: (Default)
[livejournal.com profile] senji suggested ratelimiting email based on the MD5 checksum of any attachments, with the goal of slowing down an email virus attack. I think this might be feasible so I'm noting it here as a sort of public to-do list entry...
fanf: (dotat)

Today brings us the cheerful news that New Hall has been refounded with an endowment of £30 million and finally been given the proper name that it has been waiting for since 1954.

This led to a discussion on IRC about people who have colleges at both Cambridge and Oxford named after them. The list is:

God
  • Trinity College, Cambridge
  • Trinity Hall, Cambridge
  • God's House (now Christ's College), Cambridge
  • Trinity College, Oxford
Jesus
  • Christ's College, Cambridge
  • Corpus Christi College, Cambridge
  • Emmanuel College, Cambridge
  • Jesus College, Cambridge
  • Christ Church, Oxford
  • Corpus Christi College, Oxford
  • Jesus College, Oxford
the Virgin Mary modestly hides in the full names of several colleges
  • The College of the Blessed Virgin Mary, Saint John the Evangelist, and the glorious virgin Saint Radegund, within the City and University of Cambridge, commonly called Jesus College, in the University of Cambridge
  • The King's College of Our Lady and Saint Nicholas in Cambridge
  • The College of Corpus Christi and the Blessed Virgin Mary in Cambridge
  • Hall of the Annunciation of the Blessed Virgin Mary (now Gonville and Caius College), Cambridge
  • The College of the Blessed Mary and All Saints, Lincoln (in Oxford)
  • The Warden and Scholars of St Mary's College of Winchester in Oxford, commonly called New College in Oxford
  • The Provost and Scholars of the House of the Blessed Mary the Virgin in Oxford, commonly called Oriel College, of the Foundation of Edward the Second of famous memory, sometime King of England
Mary Magdalene
  • Magdalene College, Cambridge
  • Magdalen College, Oxford
St Peter
  • Peterhouse, Cambridge
  • St Peter's College, Oxford
St Catherine
  • St Catharine's College, Cambridge
  • St Catherine's College, Oxford
Edmund Rich of Abingdon
  • St Edmund's College, Cambridge
  • St Edmund Hall, Oxford
Lady Margaret Beaufort
  • Lady Margaret Hall, Oxford
  • Margaret Beaufort Institute of Theology, Cambridge
Sir Isaac Wolfson
  • Wolfson College, Cambridge
  • Wolfson College, Oxford

Cambridge's Pembroke College is named after Marie de St Pol, Countess of Pembroke (widow of the 1st Earl of Pembroke); Oxford's Pembroke College is named after William Herbert, 3rd Earl of Pembroke.

Cambridge has two colleges named after St John the Evangelist, and Oxford has one named after St John the Baptist.

Cambridge has two colleges endowed by Lady Margaret Beaufort (Christ's and John's) whereas Oxford has a college named after her. Edited to add four and a half years later: Cambridge also has the Margaret Beaufort Institute of Theology, which is a member of the theological federation so only loosely affiliated with the University and not much like a proper college, but still worth noting.

Cambridge's University College lasted 7 years before it was renamed Wolfson College. Oxford's University College is over 700 years old.

Both universities have benefited from money obtained from manufacturing cars: Cambridge via the Cripps foundation, and Oxford via the Nuffield foundation.

fanf: (Default)

I've just read a paper about "Foundation", which is an updated version of Plan 9's Venti archiving system. Foundation is designed to back up a computer nightly to a USB hard disk and an FTP server. Like Venti, it uses content-addressed storage to eliminate duplicate blocks; the main difference is that Foundation has been optimized to reduce seeks on a single disk, where Venti assumed a fast RAID could hide latency. Foundation uses a Bloom filter to speed up index lookups when eliminating duplicate blocks, and it writes data blocks in an append-only log, so it hits two things that I have been thinking about recently.

Towards the end of the paper are a few wonderful paragraphs that address the problem of reclaiming backup space wasted by bulky temporary data - the paper uses the example of on-disk copies of DVDs to save physical space when travelling. This motivates the desire to support deleting a nightly snapshot. They say:

Conceptually, deleting a snapshot resembles garbage collection in programming languages or log cleaning in LFS. First, the CAS layer writes a new version of the system catalog that no longer points to the snapshot. Then, the system reclaims the space used by blocks that are no longer reachable from any other catalog entry. A more recent snapshot, for example, may still point to some block in the deleted snapshot.

Interestingly, the structure of Foundation's log makes identifying unreferenced blocks particularly efficient: as a natural result of the log being append-only, all pointers within the log point "backwards". Garbage collection can thus proceed in a single, sequential pass through the log using an algorithm developed by Armstrong and Virding for garbage collecting immutable data structures in the Erlang programming language.

The algorithm works as follows. Starting at the most recent log entry and scanning backwards, it maintains a list of "live" blocks initialized from the pointers in the system catalog. Each time it encounters a live block, it deletes that block from its list. If the block is a metadata block that contains pointers to other blocks, it adds these pointers to its list. If the algorithm encounters a block that is not in its list, then there are no live pointers to that block later in the log, and since all pointers point backwards, the algorithm can reclaim the block's space immediately. The system can also use this algorithm incrementally: starting from the end of the log, it can scan backward until "enough" space has been reclaimed, and then stop.

The expensive part of the Erlang algorithm is maintaining the list of live blocks. If references to many blocks occur much later in the log than the blocks themselves, this list could grow too large to fit in memory. We note, however, that a conservative version of the collector could use a Bloom filter to store the list of live blocks. Although false positives in the filter would prevent the algorithm from reclaiming some legitimate garbage, its memory usage would be fixed at the size of the Bloom filter.
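
The backward sweep is simple enough to sketch in C (this is my own illustration of the algorithm as described, not code from the paper; the crude linear-scan live set here is exactly the thing the paper suggests replacing with a Bloom filter):

    #include <stdbool.h>
    #include <stddef.h>

    struct block {
        bool is_metadata;       /* does this block contain pointers? */
        size_t nptrs;
        struct block **ptrs;    /* blocks it points to, all earlier in the log */
    };

    #define MAXLIVE 1024        /* no overflow handling in this sketch */
    static struct block *live[MAXLIVE];
    static size_t nlive;

    static void live_add(struct block *b) {
        for (size_t i = 0; i < nlive; i++)
            if (live[i] == b)
                return;
        live[nlive++] = b;
    }

    static bool live_remove(struct block *b) {
        for (size_t i = 0; i < nlive; i++)
            if (live[i] == b) {
                live[i] = live[--nlive];
                return true;
            }
        return false;
    }

    /* log[] holds the blocks in append order; catalog[] holds the roots */
    void sweep(struct block **log, size_t nlog,
               struct block **catalog, size_t ncat,
               void (*reclaim)(struct block *)) {
        for (size_t i = 0; i < ncat; i++)
            live_add(catalog[i]);
        for (size_t i = nlog; i-- > 0; ) {      /* newest to oldest */
            struct block *b = log[i];
            if (live_remove(b)) {
                if (b->is_metadata)             /* live: follow its pointers */
                    for (size_t j = 0; j < b->nptrs; j++)
                        live_add(b->ptrs[j]);
            } else {
                reclaim(b);  /* nothing later points to it, so it is garbage */
            }
        }
    }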

This reminds me of a recent post on Phil Wadler's blog where he highlights some comments on the proposal to add functional programming to the ACM curriculum. The paragraphs I quoted above are an excellent example of FP knowledge being useful in all parts of computer science.

Bike

2008-06-04 10:48
fanf: (Default)
I need a bike shop that will sell me something like the following. Preferably somewhere with helpful and knowledgeable staff who can help me choose which model and which size. (I am deeply unimpressed by Station Cycles whose staff are so useless they couldn't sell me a bike when I was ready to drop 600 quid on one there and then.)
  • child seat
  • step-through frame
  • panniers
  • hub gears
  • hub brakes
  • hub dynamo
  • chain case
ETA: Yes I know this describes a Dutch bike. Yes I can find out which shops stock them. What I want to know is which shops have staff that are helpful and knowledgeable about this kind of bike - i.e. not fen mountain bikes and other BSOs.
fanf: (Default)
I gave a tech talk at Google's London offices yesterday afternoon, based on my old blog posts about "how not to design an MTA" though with a more positive spin. You can look at the slides and notes now, and I believe there will be video available RSN.
fanf: (Default)

Oh god. I nearly puked when I got a reply from a Notes user just now. Between the top-posted reply and the text from my original message were my message's headers, formatted like this:

aaah! my eyes! )
fanf: (Default)
A few days ago, Ben Goldacre linked to an article by and an interview with a guy called Jonathan Gottschall who likes to apply the scientific method to literary theory. I was gobsmacked! I thought that no-one who takes literary theory seriously has any intellectual engagement with the real world.

I had a brief discussion about Librarything with [livejournal.com profile] addedentry at his and [livejournal.com profile] j4's birthday party on Saturday, in which he talked about its new approaches to cataloguing. Its use of tagging, as opposed to ontological classification, is its most obvious feature, but its concept of a "work" as an umbrella for the multiple editions of a book is also useful.

In the pub this evening we decided that both of these things are post-postmodern. Traditional cataloguing is very modernist in its approach: top-down, paternalistic, relying on the academic expert to benevolently provide for everyone's needs. Literary criticism is the ultimate expression of postmodernism: observing that experts are not always right and that new theories come from outside of the consensus, they deny the existence of objective truth and assert that all opinion is equally valid. Scientists and engineers instinctively reject postmodernism, but often fail to do so without relying on discredited modernist thinking.

How can we get beyond postmodernism? I think it has to be the acknowledgment that a fuzzy consensus is a valid approximation to the truth, and that we have experimental and statistical tools which can refine that approximation. But the crucial thing is to realize that these tools don't just work for physics or chemistry or biology or medicine, but they also work for cataloguing books, or establishing that the author does have a degree of control over the reader's thoughts, or showing that beauty does have a degree of commonality across cultures, or that we can automatically translate between natural languages.

The preceding ill-informed rant was brought to you by Summer Lightning.
fanf: (Default)
AJ's idea of scanning email for passwords has provoked a lot of strange suggestions, online and in person. Elaborate password cracking clusters, phishing my own users, hardware acceleration, etc... (When AJ suggested the Idea, my first thought was to use a Bloom filter to find passwords, since you can't recover the contents of a Bloom filter without brute force - but it's very hard to delete items from a Bloom filter, which in this scenario must be done whenever any user changes their password. No, that too is not going to work.)

The whole idea is very borderline: is it worth spending significant effort when the phishing success rate is much less than 1% per incident, and the current fashion for phishing universities is probably short-lived? (This week we got about 2000 messages from the phishers and 5 users replied.) On the other hand it would be very interesting to find out what the detection rate would be. Would there be any false positives? i.e. unintentional password appearances? What is the legitimate positive rate? e.g. senior staff sending their passwords to their PAs? (The latter is against our AUP but it is common practice.) How much password sharing is there outside the anticipated scenarios?

It seems that it's worth making the point that it isn't hard for me to get my users' plaintext passwords: I could just instrument our various SASL implementations. But we (me and my colleagues) don't do that because sysadmins are safer not knowing their users' secrets. This is why we don't log message subjects, and why our accidental-deletion recovery tools don't require seeing any message contents. We don't look at the contents of a user's account in any detail without permission from that user - and even then we'd very much prefer not to know that we're recovering email that was deleted by their jilted lover who obtained their password from a shared browser history.

From my point of view, the interesting thing is that it is feasible to detect when a user is sending their own password in a message, using just a standard Unix encrypted password file and some simple code: crypt every word the user sends and compare with that user's crypted password. This is just a few hundred lines of C, including hooks into our authentication database and email content scanner, and choosing the right words. My prototype code can check 2000 MD5 crypted words per second per CPU, and should be able to skip most words in a message since they are outside our password rules.
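
The core of the prototype is small enough to sketch (illustrative names and naive word-splitting of my own, not the real code): crypt() is given the stored hash as its salt, so it reuses the same salt and hash scheme, and the comparison is always hash against hash.

    #include <string.h>
    #include <unistd.h>  /* crypt(); glibc wants <crypt.h> and -lcrypt */

    int message_contains_password(const char *stored_hash, char *message) {
        for (char *word = strtok(message, " \t\r\n");
             word != NULL;
             word = strtok(NULL, " \t\r\n")) {
            size_t len = strlen(word);
            if (len < 6 || len > 32)
                continue;  /* cheap filter: can't be a valid password */
            char *hash = crypt(word, stored_hash);
            if (hash != NULL && strcmp(hash, stored_hash) == 0)
                return 1;  /* the user's password appears in the message */
        }
        return 0;
    }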

There has been a lot of traffic on various mailing lists about these phishing attacks, especially notifications of new reply addresses. But we don't want to be in the business of maintaining blacklists of addresses our users mustn't send to. Password scanning seems like a simple way of avoiding that tar pit, which is I think the main attraction. So why do I think it's absurd?
fanf: (Default)

There's been a raft of phishing attacks against Universities over the last few months. We received a couple of thousand of these last night:

Subject:  CONFIRM YOUR EMAIL ADDRESS
Date:     Tue, 6 May 2008 16:08:53 -0400
From:     CAM SUPPORT TEAM 
Reply-To: 

Dear Cam Subscriber,

To complete your (CAM) account, you must reply to this email
immediately and enter your password here (*********)

Failure to do this will immediately render your email address
deactivated from our database.

You can also confirm your email address by logging into your
CAM account at www.webmail.cam.ac.uk/

Thank you for using CAM.AC.UK!
FROM THE CAM SUPPORT TEAM

We did the usual announcement dance, including a notice on the webmail login page, but this did not prevent some users (including webmail users!) from replying to the phish.

[livejournal.com profile] captain_aj suggests scanning email to reject it if it contains the user's password. I wonder how long it would take to crypt() every word of every message... :-)

fanf: (Default)

So I needed some Lua functions to extract bit fields from an integral value, specifically struct stat.st_mode. Lua only has floating point numbers, and its standard library doesn't extend beyond ANSI C. So I wrote the following. (Note that a % b == a - floor(a/b)*b.)

   function mask(lobit, hibit, num)
      local toolo = num % 2^lobit
      return (num - toolo) % 2^hibit
   end

   function bit(bit, num)
      return num / 2^bit % 2 >= 1
   end
fanf: (Default)

During a discussion on a work mailing list about TCP performance across the Atlantic, one of my colleagues made some comments that were ridiculous even by his standards. (He's notorious for declaring ex cathedra that things are extremely difficult and no-one understands them any more except a few giants (like himself) from the age when dinosaurs ruled the machine room.) I felt compelled to correct his most egregiously wrong statements, and I'm posting my comments here since they seem to be of wider interest than just University IT staff. In the quoted sections below, when he talks about "products" he means software that games or breaks TCP's fairness and congestion control algorithms.

I went to a very interesting talk by the only TCP/IP experts I have ever met who explained that the only reason the Internet works is that such products were NOT in widespread use.

The importance of congestion control has been common knowledge since the Internet's congestion collapse in the 1980s and Van Jacobson's subsequent TCP algorithm fixes. (It's taught to all CS undergrads, e.g. look at the Digital Communications 1 syllabus in CST part 1B.) However software that games TCP's fairness model *is* in widespread use: practically all P2P software uses multiple TCP connections to get better throughput.

In the mid 1990s the Internet was going to melt down because of all the web browsers making lots of connections that were too short for TCP's congestion control algorithms to be effective. So we got HTTP/1.1 with persistent connections and a limit on the concurrency that browsers use. However these TCP and HTTP measures only work because practically all software on end-hosts co-operates in congestion control: there's nothing to enforce this co-operation.

The rise of P2P means that ISPs are increasingly interested in enforcing limits on bandwidth hogs, hence the arguments about "network neutrality" as the ISPs point fingers at YouTube and the BBC and the file sharers. They are also deploying traffic shapers in the network which manipulate TCP streams with varying degrees of cleverness. (The most sophisticated manipulate the stream of ACKs going back to the sender which causes it to send at the network's desired rate; the stupid ones like Comcast's just kill connections they don't like.)

However, a better approach is to use a more realistic idea of fairness as the basis for congestion control instead of equal bandwidth per TCP connection. e.g. for ISPs, balance per subscriber. Bob Briscoe's recent position paper makes this argument in readable fashion. RFC 2140 groped in the same direction 10 years ago, but was never adopted. Either way, enforcement at the network's edge (instead of trusting the hosts) is likely to be the future whatever model of fairness is used.

Also interesting is the research on adapting TCP to high-bandwidth networks. There are many examples of what Briscoe argues against: great efforts of design and analysis to preserve TCP-style fairness in the new protocols. The GÉANT2 high-speed TCP page has lots of relevant links.

Luckily, there are very few people who know enough about TCP/IP and related protocols to write an effective product because, as I implied, there are probably no more than half-a-dozen TCP/IP experts in the world still working (and may even be none).

Tosh. There's a substantial market out there, both for managing congestion and for exploiting weaknesses in congestion control.

fanf: (Default)

It seems selog tickles a live-code analysis bug in gcc. Here's a fairly minimal test case:

$ cat gccbug.c

struct sel {
        const char  *name;
        unsigned int enabled;
};

int selog_on(struct sel *);

#define selog_on(sel) \
        (sel.enabled == 0 \
                ? selog_on(&sel) \
                : sel.enabled != 42)

void dostuff(int *);

void gccbug1(void) {
        struct sel sel = { "test", 0 };
        if(selog_on(sel)) {
                dostuff(0);
        }
}

void gccbug2(void) {
        struct sel sel = { "test", 0 };
        if(selog_on(sel)) {
                int i;
                dostuff(&i);
        }
}


$ gcc -O -Wunreachable-code -c gccbug.c
gccbug.c: In function `gccbug2':
gccbug.c:25: warning: will never be executed
$ gcc --version
gcc (GCC) 3.4.4 [FreeBSD] 20050518

$ gcc -O -Wunreachable-code -c gccbug.c
gccbug.c: In function ‘gccbug2’:
gccbug.c:24: warning: will never be executed
gccbug.c: In function ‘gccbug1’:
gccbug.c:17: warning: will never be executed
$ gcc -O -Werror -Wunreachable-code -c gccbug.c
cc1: warnings being treated as errors
gccbug.c: In function ‘gccbug2’:
gccbug.c:24: warning: will never be executed
$ gcc --version
gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)

$ gcc -O -Wunreachable-code -c gccbug.c
gccbug.c: In function 'gccbug2':
gccbug.c:24: warning: will never be executed
gccbug.c: In function 'gccbug1':
gccbug.c:17: warning: will never be executed
$ gcc -O -Werror -Wunreachable-code -c gccbug.c
cc1: warnings being treated as errors
gccbug.c: In function 'gccbug2':
gccbug.c:24: warning: will never be executed
$ gcc --version
gcc (GCC) 4.2.1 20070719  [FreeBSD]

fanf: (Default)

This has to be the strangest (or stupidest) way that I have seen of treating ports < 1024 specially.

It seems that the nameservers for spacely.net drop any queries that use a source port < 1024. (They also serve domains such as ChristopherReeve.org, which is a brain research funding charity and the reason I found out about the problem.) It was a complete bugger to diagnose, because dig of course uses a high source port, so it was able to resolve names without a problem. My nameservers (and chiark's) use the traditional source port 53, so they were in trouble. By default bind now uses high source ports so most sites would not trip over the stupidity.

I guess it's time to remove that bit of ancient paranoia from my usual named.conf options section.

; <<>> DiG 9.3.4 <<>> -b 131.111.8.132#1023 mx ChristopherReeve.org @ns.spacely.net
; (1 server found)
;; global options:  printcmd
;; connection timed out; no servers could be reached

; <<>> DiG 9.3.4 <<>> -b 131.111.8.132#1024 mx ChristopherReeve.org @ns.spacely.net
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5873
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2

;; QUESTION SECTION:
;ChristopherReeve.org.          IN      MX

;; ANSWER SECTION:
ChristopherReeve.org.   14400   IN      MX      10 mail.ChristopherReeve.org.
ChristopherReeve.org.   14400   IN      MX      5 mail1.ChristopherReeve.org.

;; ADDITIONAL SECTION:
mail.ChristopherReeve.org. 14400 IN     A       66.215.16.66
mail1.ChristopherReeve.org. 14400 IN    A       67.151.7.145

;; Query time: 185 msec
;; SERVER: 64.209.141.253#53(64.209.141.253)
;; WHEN: Thu Apr 17 12:53:08 2008
;; MSG SIZE  rcvd: 113
fanf: (Default)

In my post about Hashlife I said that calc1() "is not memoized, but instead relies on calc2() for efficiency". This is wrong.

[calc1() is the function that calculates an arbitrary number of generations into the future; calc2() is the function that calculates 2^(n-2) generations into the future for a square 2^n cells on a side, and its result is cached in the main quadtree.]

Calculating Life efficiently requires skipping over the easy bits as fast as possible, which in most implementations is blank space, but in Hashlife it ought to include any other kind of previously-computed space too. However calc1() can only skip large amounts of space if it is also skipping large amounts of time - a multiple of a large power of two, to be exact. This means that if you are single-stepping, my Hashlife code would separately examine every 4x4 square in the universe. This is particularly dumb when there is inevitably lots of blank space around the edges.

The solution is to memoize the calculation for arbitrary numbers of generations (or at least for 1 <= g <= 2^(n-2) for squares 2^n on a side). If you do that, it no longer makes sense to store the result pointers in the squares themselves: instead, bung them all in another hash table that is indexed by the pair of a square and a generation count. Having done that, it also makes sense to evict the population counts from the squares into a third hash table, since they are rarely needed so just waste memory.

The other Hashlife implementations that I was cribbing off do not have this extra cacheing. I wonder if that is why Hashlife has a poor reputation for efficiency? I'll have to do some more implementation to find out.

While I'm here, I should say that David Bell's equivalent of calc1() is interesting because it only examines the past light cone of the region that is being displayed, which saves working on irrelevant far-away areas - though he spoils it by re-doing the recursion for every pixel to be displayed...

fanf: (Default)
  • An active PS/2 to USB keyboard adaptor that doesn't work with my old :CueCat, so I can't use the :cc with my Mac.
  • An old laptop PC with a flaky CDROM drive that refuses to read most OS install media, so I'm stuck with ancient FreeBSD and no wireless support until I can upgrade.
  • The :cc's stupid habit of typing Alt+F10 before the encoded barcode data, which occasionally gives Mozilla the heebie-jeebies or even accidentally kills it.
  • LibraryThing in overload mode refusing to let me log in again.

On the other hand, one of my old bug reports reminded me how to get a SHARP PC-AR10 working with FreeBSD-4.X without much trouble, and when LT deigns to let me log in, its import features are really easy to use.

fanf: (Default)

I got two variants that slipped past my filters overnight:

Yahoo! groups
tmatt087dten@yahoo.com has invited you to join Build_Instant_Money_Now on Yahoo! Groups, the best way to discover and share information and advice with others.
Google Calendar
Description: Dear Good Friend,
I am very happy to inform you about my success in getting those funds transferred under the cooperation of a new partner in London,Presently,I'm in saudi arabia for the purpose of investing my own share of the money.

An interesting way to deliver spam. It reminds me a bit of a key weakness of DJB's IM2000, where the message availability notifications can be used to carry spam payloads regardless of their intended purpose.

fanf: (Default)

I'm pleased to announce the first release of selog, which you can download from <http://dotat.at/prog/selog/>.

Selog is a library of routines that unifies error reporting, activity logging, and debug tracing. It allows programmers to give their users flexible control over which messages are written and where they are written to.

Selog is designed to be as simple as possible while still being extremely efficient, powerful, and flexible. The essence of selog is:

  • Obtain a configuration string from the user, which looks like:

    +log_pid+example@syslog;-example@stderr

  • When the program starts, initialize selog with the user's configuration.

    selog_open(config, spelling)

  • Define a selector for each different kind of message.

    static selog_selector sel = SELINIT("example", SELOG_INFO);

  • Use the selog() function to emit messages instead of printf() or syslog().

    selog(sel, "message number %d", n);

Selectors determine which messages are written and where they are written to, under the control of the user's configuration. You can direct messages to any combination of stderr, syslog, files, pipes, etc. You can omit or include optional parts of messages under the control of selectors. You don't have to signal the program when you rotate its log files.

The C interface consists of just 13 functions, 5 macros, 2 types, and an enum. There are a few variations of the basic selog() one-shot logging function, or you can quickly and easily compose messages in stages. The check to skip disabled messages is extremely small and fast.

Selog comes with shell command and Lua interfaces, plus interposition libraries which you can use to fool old code that calls err() or syslog() into using selog instead.

Selog's origins

I started work on selog when I found myself writing yet another half-arsed logging/debugging module that could only be used by the program it was written for. The problem needed to be solved properly, but I couldn't find a decent existing solution.

Exim's logging and debugging facilities provided the basis for selog's configuration syntax and the idea for the fast check to disable debugging code. Like many other programs, Exim's logging and debugging code is non-orthogonal and non-reusable; selog exists to avoid problems like Exim's fixed set of log files and the lack of debugging redirection.

The architecture of log4j and its many imitators provided many nice ideas to copy, in particular selog's selectors and channels are similar to log4j's loggers and appenders. However log4j is unnecessarily complicated, so selog discards plenty of its ideas, such as hierarchical category names and arbitrarily configurable message layouts. Bloaty log4c provided a salutary anti-pattern.

I've tried to make this first release fairly polished - e.g. it has comprehensive documentation - however it has not had its portability catches polished off, and in particular I'm going to be interested to see how much trouble my use of C99 features causes. The next step is to go back to the program that I wrote selog for, and put the code to use...

fanf: (Default)
Earlier this week I needed to know how to write a thread-aware library so that it's as easy as possible to link it with both threaded and non-threaded programs.

By "thread-aware" I mean that the library needs to lock its shared data structures when it is linked with a threaded program, as opposed to being thread-safe by partitioning its data so that it is naturally used by only one thread at a time. Thread-aware code needs to call functions like pthread_mutex_lock() but does not itself create threads. When it is linked to a non-threaded program the locking calls should be no-ops.

On IRC we discussed various more or less ugly compiler and linker tricks to redirect pthread calls to no-op stubs in non-threaded programs. [livejournal.com profile] ewx said that the OS ought to come with a stub library, and [livejournal.com profile] cjwatson and [livejournal.com profile] pm215 said that Linux and HP-UX have the stubs built-in to libc. I found that BSD does the same, and that this fact is relied on by such prominent code as X11, so it's sufficiently portable for my purposes.

So in the end it turns out that you don't need to do anything special: just make whatever pthread calls you need. If the program is threaded (i.e. it links to libpthread) the calls will perform the necessary locking. If the program is not threaded there will be no libpthread to override the stub definitions in libc, so the locking calls will do nothing. Sweet.
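
For example, a thread-aware library routine can be as simple as this (next_id() is a made-up function, just to illustrate the point): link the program with libpthread and the lock is real; link it without and the calls hit the no-op stubs in libc.

    #include <pthread.h>

    static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
    static unsigned long counter;

    /* hand out unique IDs; safe whether or not the program has threads */
    unsigned long next_id(void) {
        pthread_mutex_lock(&counter_mutex);
        unsigned long id = ++counter;
        pthread_mutex_unlock(&counter_mutex);
        return id;
    }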
fanf: (Default)
Around the end of October I read David Prerau's book "Saving the Daylight". It changed my opinion of DST somewhat. I used to think that it was unmitigated stupidity, and that everyone should make seasonal adjustments to their timetables instead, or something. (For example, my secondary school had afternoon lessons just after lunch in the summer, but later after tea in the winter.) What I hadn't considered were the human-factors effects that this anarchy would have. In the USA for much of the 20th century DST was a popular idea but it was not consistently enforced by governments at any level. The result was prolonged chaos, documented at length in Prerau's book. DST only works well when applied consistently over large areas, and enforcing it by law is the best way of achieving it.

I'm not entirely convinced that the goals of DST are foolish. I'm by nature the kind of person that Benjamin Franklin and (with less humour) William Willett wanted to persuade to get up earlier. But I like long summer evenings, and it's hard to make the most of them without awkward winter daylight times if your clock is fixed relative to mean solar time.

However I think mean solar time is a fetish that should now be discarded, especially in the guise of UTC. Leap seconds are an abomination: civil time does not need to be synchronized that closely to the Earth's rotational angle. Ideally we'd just define civil time as a timezone offset from atomic time, without the intermediate layer that keeps the astronomers happy. (The UK abolished the RGO because responsibility for timekeeping had moved from the astronomers to the physicists. The rest of the world hasn't yet made the same break with history.)

Having abolished unpredictable leap second disruptions in our timekeeping, we would still be dependent for stability on politicians not arsing around with the timezones. They have proved themselves incapable of keeping their grubby hands off. Perhaps the solution is to make a further break from the noon-oriented mean solar time fetish and use a more practical benchmark as the basis of civil time: sunrise. If the sun always rises at 07:30, then we get nice long summer evenings, sensible daylight in winter, and a time system based on natural philosophy not political tinkering. Actually, rather than using local sunrise as your benchmark, you want northern, southern, and perhaps tropical sunrise time zones. For example, civil times in the northern hemisphere might be defined by one-hour offsets from local sunrise time at the intersection of the Tropic of Cancer and the Greenwich Meridian. Perfect!

So about five months ago I wrote a brief description of how this could work. But I quickly realised (with Prerau's book in mind) that the practicalities of persuading people to put this into practice made the idea ridiculous. So it became an April 1st RISKS Digest item, which I kept under my hat for all that time. I was greatly amused by [livejournal.com profile] simont's recent comment advocating something very similar, though I couldn't follow up until today :-)
fanf: (Default)

The past light cones of the cells on the unbounded edges of the pattern have a nasty effect. As you go further into the future, even though you only display a small quadrant near the origin, more and more empty space in the past must be examined to work out the state of the cells in the present. This space leak causes the calculation of each generation to get more expensive as the square of the generation number, even though it would ideally be constant.

To fix this, the size of the original universe must be restricted, instead of restricting its size just for display. Divide the display function into two:

  restrict (x,y) = map (take x) . take y
  display = unlines . map (concatMap show)

We also need to change the grow function to deal with the other two edges of the universe.

  bracket a xs = [a] ++ xs ++ [a]
  grow = (bracket deadline) . map (bracket deadcell)

Then in the loop we remove the dimension argument to display and change the main program to:

  main = loop $ restrict (10,10) glider

This is enough to make it possible to compute the 1103 interesting generations of the R pentomino at a few generations per second (text output being rather slow).

fanf: (Default)

The Game of Life is fairly pointless, but in this case it is also written in "point-free" style which minimizes the use of variables. I think it's quite cute.

It's easy to represent an unbounded quarter-plane in Haskell as [[Cell]], i.e. a list of rows of cells - the lists are lazy, so they can make themselves as long as necessary. You can then define the universe without form and void, such that darkness is upon the face of the screen, as:

  data Cell = Dead | Alive
  deadcell = Dead
  deadline = repeat Dead
  deadspace = repeat deadline

To create a pattern we add live cells to this substrate. First we need a helper function that changes just the Nth element of a list:

  mapNth n f xs = as ++ f b : cs
    where (as,b:cs) = splitAt n xs

Then to set an individual cell we use mapNth to modify the Xth element of the Yth row. Pointfree style tends to use the $ operator a lot, which makes Perl poetry-style programmers feel at home.

  setcell (x,y) = mapNth y $ mapNth x $ const Alive

We can apply setcell successively starting from dead space to create a pattern. For example, here's a glider set to run off into the unbounded corner of the universe.

  pattern = foldr setcell deadspace
  glider = pattern [(2,1),(3,2),(3,3),(2,3),(1,3)]

When working out the next generation we need to work out the neighbourhood of each cell, i.e. the 3x3 square consisting of the cell and its eight neighbours. We'll end up with the neighbourhood represented as a triple of triples, on which it's straightforward to specify the Game of Life rule:

  fromCell Dead = 0
  fromCell Alive = 1

  rule ((a,b,c),(d,e,f),(g,h,i)) =
      if count == 2 then e else
      if count == 3 then Alive else Dead
    where
      count = sum $ map fromCell [a,b,c,d,f,g,h,i]

The one-dimensional version of the neighbourhood is quite easy, turning a list into a list of overlapping triples.

  by3 (a:b:c:ds) = (a,b,c) : by3 (b:c:ds)
  by3     _      = []

We're going to apply by3 to each row to make horizontal triples, and to the list of rows itself to make triple rows of triples. However what we want is rows of triple triples. Turning a triple of lists into a list of triples is almost what zip3 does, except that zip3 is inconveniently curried.

  uncurry3 f (a,b,c) = f a b c
  zip3' = uncurry3 zip3

Then the code to turn a list of rows of cells into a list of rows of neighbourhoods is just as I outlined in the previous paragraph.

  by3by3 = map zip3' . by3 . map by3

When calculating the next generation, we want to add a strip of empty space to each edge of the universe, so that all our cells have neighbours. If we did not do this the universe would unravel at one cell per generation, a bit like in Greg Egan's book "Schild's Ladder".

  grow = (deadline:) . map (deadcell:)

Now we can actually calculate the next generation! Grow the universe, work out the neighbourhoods, then apply the rule to each of them.

  generate = map (map rule) . by3by3 . grow

To display the universe, we need to convert cells to strings

  instance Show Cell where
    show Dead = "  "
    show Alive = "[]"

To display the whole universe, we need to truncate it to the size of the screen by taking only the necessary elements from the lists, concatenate the string representation of the cells in each row, and join the rows with newlines.

  display (x,y) = unlines . map (concatMap show . take x) . take y

Then the main program simply loops displaying the universe and calculating its next generation.

  loop u = do
    putStrLn $ display (10,10) u
    _ <- getLine
    loop $ generate u

  main :: IO ()
  main = loop glider
fanf: (Default)
I'm baffled. gzip does its thing by calling the zlib deflate() function, though it wraps the compressed stream differently than deflate() normally does. I have an application where I don't need gzip's extra metadata, so I wrote a small program called deflate. They should perform the same apart from a small constant overhead for gzip, but they don't. As far as I can tell the only differences are minor details of buffer handling.
$ jot 40000 | gzip -9c | wc -c
   87733
$ jot 40000 | ./deflate | wc -c
   86224
$ jot 20000 | sed 's/.*/print "hello, world &"/' | ./lua-5.1.2/src/luac -s -o - - | gzip -9c | wc -c
   81470
$ jot 20000 | sed 's/.*/print "hello, world &"/' | ./lua-5.1.2/src/luac -s -o - - | ./deflate | wc -c
   82687
fanf: (Default)
My version of unifdef is in all the BSDs, including Mac OS X (yay!). I was disappointed that it isn't in Tiger, but it is in Leopard, though they still ship the old man page (FAIL). It also lurks in the Linux-2.6 build system, though without the man page, and patched because glibc is too lame to include strlcpy(). Solaris needs to get a grip since it still ships with the old pre-ANSI version by Dave Yost. Based on its man page, AIX is in the same situation. The HP-UX man page has been rewritten but it too looks like the Yost version (which was BSD licensed, not public domain!). Irix is unusual, since it doesn't use the old Berkeley code but has a version written by Brendan Eich (also known as the creator of JavaScript). It looks like I could steal some features from his version.
fanf: (Default)
I've just committed some fixes to unifdef, mostly arising from a bug report made to me by Joe Karthauser when a few Cambridge FreeBSD people met recently for dinner in Teri Aki. He found the bug at work using unifdef on a file that lacked its final newline. It was nice to see Joe again after a gap of several years, and helpful of him to give me an easy way of raising my FreeBSD activity above zero :-) Get the code from http://dotat.at/prog/misc/ or http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.bin/unifdef/
fanf: (Default)
INTRODUCTION

Hashlife is a radically powerful way of computing Conway's Game of Life. It was invented by Bill Gosper who originally described it in his 1984 paper "Exploiting Regularities in Large Cellular Spaces", Physica D (Nonlinear Phenomena) volume 10 pp. 75-80.

Three weeks ago I wrote about Life In A Register which was a tangent from work I was doing on hashlife. I have not made much progress on it since then, so I thought I would post what I have done so far here, because it is the most interesting part. It's a literate-programming description of the algorithm, optimised for clarity of exposition, not speed. This post is (mostly) mechanically generated from the code I have written so far.

Hashlife uses several cunning techniques:

The universe is represented as a quadtree of nested squares that are powers of two cells on a side. There is an intricate and beautiful recursive function for computing life using this data structure.

It avoids constructing multiple copies of equivalent portions of the quadtree that are duplicated at different places or times by looking up previously-constructed squares in the eponymous hash table. (This technique is known as memoizing or hash-consing.) It also remembers the results of Life computations so that it does not need to repeat them.

Because of its aggressive memoization it is extremely effective at computing Life patterns that have some kind of repetition in space or time. Hashlife can take the same length of time to compute from generation N to generation 2N for any large enough N - it has logarithmic complexity. Its disadvantage is that it is not very fast at handling random Life soup.

There are a few hashlife resources on the web. There is David Bell's implementation, and Tomas Rokicki's implementation is part of the "Golly" Game of Life simulator. He also wrote about it for Dr Dobb's: "An Algorithm for Compressing Space and Time". For an example of hashlife's capabilities, it can compute 2^130 generations of the Metacatacryst producing a pattern of 2^232 cells. I first read about hashlife when working on my previous (more conventional) Life implementation.

CORE DATA STRUCTURE AND ALGORITHM

First, a little bit of notation. Because hashlife can handle exceptionally large computations, we represent generation counts, population counts, display co-ordinates, etc. as multiprecision integers. Powers of two are particularly important, so because of the cost of creating mpz_t objects we pre-allocate a few of them. With any luck this upper bound will be overkill.

I'll also use this array for notation in the comments, so pow2[0] == 1, pow2[1] == 2, pow2[8] == 256, etc.

	#include <stdbool.h>
	#include <gmp.h>

	typedef unsigned char byte;

	#define LEVEL_MAX 1024

	static mpz_t pow2[LEVEL_MAX];

	static void
	init_pow2(void) {
		mpz_init_set_ui(pow2[0], 1);
		for(unsigned i = 1; i < LEVEL_MAX; i++) {
			mpz_init(pow2[i]);
			mpz_mul_2exp(pow2[i], pow2[i-1], 1); // pow2[i] = 2 * pow2[i-1]
		}
	}

Hashlife divides the universe up into a quadtree. A square at level N is pow2[N] cells on a side, and divided into four level N-1 quarters with half as many cells on a side, i.e. pow2[N-1]. We do not need to store the level explicitly since we can keep track of it by counting depth of recursion.

The code is written consistently to handle sub-squares in reading order named by compass points, as shown in this declaration. The quarter pointers are never NULL except in level 0 squares. The calc field can be NULL or a pointer to a level N-1 square; it is filled in lazily. We'll have much more to say about it below.

	typedef struct square {
		struct square *nw, *ne,
		              *sw, *se,
		              *calc;
		mpz_t *pop;
	} *square;

There are two level 0 squares, each being one cell that is live or dead. We don't represent this state explicitly, but instead just use the identities of the squares, so they don't need initialization. Read "sq_0" as "square level 0".

  sq_0[0]  sq_0[1]
    +  +     +  +
              ()
    +  +     +  +
	#define NUM_SQ_0 (1 << 1*1)

	static struct square sq_0[NUM_SQ_0];

This function returns 0 for a dead cell and 1 for a live cell. The pointer arithmetic is equivalent to (sq == &sq_0[1]).

	static inline bool
	alive(square sq) {
		return(sq - sq_0);
	}

The level 1 squares have a level 0 square (a cell) in each quarter, so they are 2*2 cells in size and have pow2[2*2] = 16 possible states. The state of each cell is represented by a pointer to sq_0[0] or sq_0[1].

	#define NUM_SQ_1 (1 << 2*2)

	static struct square sq_1[NUM_SQ_1];

	static void
	init_sq_1(void) {
		for(unsigned i = 0; i < NUM_SQ_1; i++) {
			unsigned nw, ne, sw, se;
			nw = (i & 1) >> 0;
			ne = (i & 2) >> 1;
			sw = (i & 4) >> 2;
			se = (i & 8) >> 3;
			square sq = &sq_1[i];
			sq->nw = &sq_0[nw];
			sq->ne = &sq_0[ne];
			sq->sw = &sq_0[sw];
			sq->se = &sq_0[se];
		}
	}

A level 2 square similarly comprises four level 1 squares, so it has 16 cells. Level 2 squares are interesting because they are the first squares that are big enough to calculate the next generation. We can calculate the next state of one of the four inner cells based on its eight surrounding cells as shown below:

   +--+--+--+--+    +--+--+--+--+    +--+--+--+--+    +--+--+--+--+
   |        |  |    |  |        |    |           |    |           |
   +  +--+  +  +    +  +  +--+  +    +--+--+--+  +    +  +--+--+--+
   |  |()|()|  |    |  |()|()|  |    |   () ()|  |    |  |() ()   |
   +  +--+  +  +    +  +  +--+  +    +  +--+  +  +    +  +  +--+  +
   |   () ()|  |    |  |() ()   |    |  |()|()|  |    |  |()|()|  |
   +--+--+--+  +    +  +--+--+--+    +  +--+  +  +    +  +  +--+  +
   |           |    |           |    |        |  |    |  |        |
   +--+--+--+--+    +--+--+--+--+    +--+--+--+--+    +--+--+--+--+

The state of a cell in the next generation is determined by its current state and the number of live neighbours.

	static bool
	life_rule(square nw, square nn, square ne,
	          square ww, square cc, square ee,
	          square sw, square ss, square se) {
		unsigned count;
		count = alive(nw) + alive(nn) + alive(ne)
		      + alive(ww)             + alive(ee)
		      + alive(sw) + alive(ss) + alive(se);
		return(count == 2 ? alive(cc) : count == 3);
	}

However we do not have enough information to calculate the next states of the outer cells because our focus is currently too narrow to see how the surrounding pattern might affect them.

   +--+--+--+--+--+
   |     |     |??|
   +  +  +  +--+  +
   |   ()|()|  |??|
   +  +  +  +--+  +
   |   ()|()   |??|
   +  +  +--+--+--+
   |           |
   +--+--+--+--+

All we can say, given a level 2 square, is that we can compute the state after one generation of a level 1 square aligned to the same centre. We will see in a moment how to use this as the basis for calculating the futures of bigger squares.

We pre-compute the result of the calculation on level 2 squares and store it in the square's calc field for re-use later. In larger squares this field is only filled in when necessary, but in the level 2 case it is always filled in so that it acts as a sentinel to save us from explicitly testing for the base case of the recursion.

For reference when reading the double dereferencing, here's a diagram of how the constituent level 1 sub-squares and level 0 sub-sub-squares within a level 2 square are laid out.

     +--+--+--+--+
     |nw ne|nw ne|
  nw +  +  +  +  + ne
     |sw se|sw se|
     +--+--+--+--+
     |nw ne|nw ne|
  sw +  +  +  +  + se
     |sw se|sw se|
     +--+--+--+--+
	#define NUM_SQ_2 (1 << 4*4)

	static struct square sq_2[NUM_SQ_2];

	static void
	init_sq_2(void) {
		for(unsigned i = 0; i < NUM_SQ_2; i++) {
			unsigned nw, ne, sw, se;
			nw = (i & 0x000F) >> 0;
			ne = (i & 0x00F0) >> 4;
			sw = (i & 0x0F00) >> 8;
			se = (i & 0xF000) >> 12;
			square sq = &sq_2[i];
			sq->nw = &sq_1[nw];
			sq->ne = &sq_1[ne];
			sq->sw = &sq_1[sw];
			sq->se = &sq_1[se];
			// Calculate Life on the four possible 3x3 groups:
			unsigned calc = 0;
			calc |= life_rule(sq->nw->nw, sq->nw->ne, sq->ne->nw,
			                  sq->nw->sw, sq->nw->se, sq->ne->sw,
			                  sq->sw->nw, sq->sw->ne, sq->se->nw) << 0;
			calc |= life_rule(sq->nw->ne, sq->ne->nw, sq->ne->ne,
			                  sq->nw->se, sq->ne->sw, sq->ne->se,
			                  sq->sw->ne, sq->se->nw, sq->se->ne) << 1;
			calc |= life_rule(sq->nw->sw, sq->nw->se, sq->ne->sw,
			                  sq->sw->nw, sq->sw->ne, sq->se->nw,
			                  sq->sw->sw, sq->sw->se, sq->se->sw) << 2;
			calc |= life_rule(sq->nw->se, sq->ne->sw, sq->ne->se,
			                  sq->sw->ne, sq->se->nw, sq->se->ne,
			                  sq->sw->se, sq->se->sw, sq->se->se) << 3;
			sq->calc = &sq_1[calc];
		}
	}

Given the pre-calculated results for level 2 squares, how can we use them to calculate the next generation of a level 3 square? We can't just recurse on its four quarters because that leaves gaps between the results:

   +--+--+--+--+--+--+--+--+
   |                       |
   +  +--+--+  +  +--+--+  +
   |  |     |     |     |  |
   +  +  +  +  +  +  +  +  +
   |  |     |     |     |  |
   +  +--+--+  +  +--+--+  +
   |                       |
   +  +  +  +  +  +  +  +  +
   |                       |
   +  +--+--+  +  +--+--+  +
   |  |     |     |     |  |
   +  +  +  +  +  +  +  +  +
   |  |     |     |     |  |
   +  +--+--+  +  +--+--+  +
   |                       |
   +--+--+--+--+--+--+--+--+

To solve this problem the level 2 sub-squares need to overlap so that their smaller level 1 results abut. We achieve this by re-arranging the level 1 sub-sub-squares to make five extra level 2 sub-squares. We then have a total of nine overlapping level 2 squares. (These diagrams are half-size.)

   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+
   |     |     |   |  |     |  |   |     |     |
   +  +  +  +  +   +  +  +  +  +   +  +  +  +  +
   |   nw|     |   |  |  n  |  |   |     |ne   |
   +--+--+  +  +   +  +--+--+  +   +  +  +--+--+
   |           |   |           |   |           |
   +  +  +  +  +   +  +  +  +  +   +  +  +  +  +
   |           |   |           |   |           |
   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+

   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+
   |           |   |           |   |           |
   +--+--+  +  +   +  +--+--+  +   +  +  +--+--+
   |     |     |   |  |     |  |   |     |     |
   +  + w+  +  +   +  +  c  +  +   +  +  +e +  +
   |     |     |   |  |     |  |   |     |     |
   +--+--+  +  +   +  +--+--+  +   +  +  +--+--+
   |           |   |           |   |           |
   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+

   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+
   |           |   |           |   |           |
   +  +  +  +  +   +  +  +  +  +   +  +  +  +  +
   |           |   |           |   |           |
   +--+--+  +  +   +  +--+--+  +   +  +  +--+--+
   |   sw|     |   |  |  s  |  |   |     |se   |
   +  +  +  +  +   +  +  +  +  +   +  +  +  +  +
   |     |     |   |  |     |  |   |     |     |
   +--+--+--+--+   +--+--+--+--+   +--+--+--+--+

We're going to need these nine sub-squares in a couple of places so let us define helper functions for finding them. The main find() function (which I have not written yet) looks up a square in the hash table based on its four constituent sub-squares, or constructs a new square if it was not found. For reference, here's the sub+subsub diagram again.

     +--+--+--+--+
     |nw ne|nw ne|
  nw +  +  +  +  + ne
     |sw se|sw se|
     +--+--+--+--+
     |nw ne|nw ne|
  sw +  +  +  +  + se
     |sw se|sw se|
     +--+--+--+--+
	static square find(square nw, square ne, square sw, square se);

	static inline square find_nw(square sq) {
		return(sq->nw);
	}
	static inline square find_nn(square sq) {
		return(find(sq->nw->ne, sq->ne->nw,
		            sq->nw->se, sq->ne->sw));
	}
	static inline square find_ne(square sq) {
		return(sq->ne);
	}
	static inline square find_ww(square sq) {
		return(find(sq->nw->sw, sq->nw->se,
		            sq->sw->nw, sq->sw->ne));
	}
	static inline square find_cc(square sq) {
		return(find(sq->nw->se, sq->ne->sw,
		            sq->sw->ne, sq->se->nw));
	}
	static inline square find_ee(square sq) {
		return(find(sq->ne->sw, sq->ne->se,
		            sq->se->nw, sq->se->ne));
	}
	static inline square find_sw(square sq) {
		return(sq->sw);
	}
	static inline square find_ss(square sq) {
		return(find(sq->sw->ne, sq->se->nw,
		            sq->sw->se, sq->se->sw));
	}
	static inline square find_se(square sq) {
		return(sq->se);
	}

When we calculate the next generation of each of these nine overlapping level 2 sub-squares, we get nine abutting level 1 squares like this:

   +--+--+--+--+--+--+--+--+
   |                       |
   +  +--+--+--+--+--+--+  +
   |  |     |     |     |  |
   +  + n w + n n + n e +  +
   |  |     |     |     |  |
   +  +--+--+--+--+--+--+  +
   |  |     |     |     |  |
   +  + w w + c c + e e +  +
   |  |     |     |     |  |
   +  +--+--+--+--+--+--+  +
   |  |     |     |     |  |
   +  + s w + s s + s e +  +
   |  |     |     |     |  |
   +  +--+--+--+--+--+--+  +
   |                       |
   +--+--+--+--+--+--+--+--+

We can calculate a second generation by combining these into four overlapping level 2 squares and then working out their results. This gives us four abutting level 1 squares as follows. I have labelled each of these new level 1 squares with the four previous level 1 squares that were used to make their parent level 2 squares. The repeated labels indicate where the overlap occurs.

   +--+--+--+--+--+--+--+--+
   |                       |
   +  +  +  +  +  +  +  +  +
   |                       |
   +  +  +--+--+--+--+  +  +
   |     |nw nn|nn ne|     |
   +  +  +  +  +  +  +  +  +
   |     |ww cc|cc ee|     |
   +  +  +--+--+--+--+  +  +
   |     |ww cc|cc ee|     |
   +  +  +  +  +  +  +  +  +
   |     |sw ss|ss se|     |
   +  +  +--+--+--+--+  +  +
   |                       |
   +  +  +  +  +  +  +  +  +
   |                       |
   +--+--+--+--+--+--+--+--+

Finally we combine these level 1 squares to make another level 2 square. We could use this to calculate a third generation, but we do not because after two generations the work we have done starting from a level 3 square is twice as big in every dimension - both space and time - as the level 2 calculation. This gives us a perfect recursive structure, which is the key to the algorithm.

                level 2  level 3  level N
   input size    4 * 4    8 * 8   pow2[N] * pow2[N]
   output size   2 * 2    4 * 4   pow2[N-1]*pow2[N-1]
   generations     1        2     pow2[N-2]

In Life, the "speed of light", the maximum speed that a signal can propagate, is one cell per generation. This is the rate at which the behaviour of surrounding squares can affect more and more of the cells around the edge of our current square as time passes. Eventually after pow2[N-2] generations our focus is too narrow to say anything about a border pow2[N-2] cells wide, and we only know about the middle square pow2[N-1] cells across. (This is why a square's calc field points to a level N-1 square, just like its quarter fields.) Hence, the ziggurat (truncated pyramid) shape formed by hashlife's recursion is the "past light cone" of the result square, that is, it includes all the cells in past generations that can possibly affect the state of the cells in the result.

The calculation function itself is memoized, i.e. it avoids repeating work by storing its result in the square. This makes the overlapping recursive structure feasible, because without the memoization it would lead to enormous amounts of recomputation. Another way of looking at this is lazy evaluation of the calc field, i.e. call-by-need, where we explicitly force evaluation of the calc field by calling calc2(). The combination of maximal sharing of equivalent squares (enabled by find()'s hash table), memoization of results, and divide-and-conquer recursion is what gives hashlife its amazingly fast logarithmic complexity for patterns that have repeating behaviour.

The memoization is where the pre-computed level 2 calc results act as sentinels, since the test for a memoized result always succeeds at level 2 and terminates the recursion.

	static square
	calc2(square sq) {
		// Check for a memo or a sentinel.
		if(sq->calc)
			return(sq->calc);
		// Find level N-1 sub-squares and calculate level N-2 results.
		square
		    nw = find_nw(sq), nn = find_nn(sq), ne = find_ne(sq),
		    ww = find_ww(sq), cc = find_cc(sq), ee = find_ee(sq),
		    sw = find_sw(sq), ss = find_ss(sq), se = find_se(sq);
		nw = calc2(nw); nn = calc2(nn); ne = calc2(ne);
		ww = calc2(ww); cc = calc2(cc); ee = calc2(ee);
		sw = calc2(sw); ss = calc2(ss); se = calc2(se);
		// Construct intermediate overlapping level N-1 squares
		// and calculate their level N-2 results.
		nw = calc2(find(nw, nn, ww, cc));
		ne = calc2(find(nn, ne, cc, ee));
		sw = calc2(find(ww, cc, sw, ss));
		se = calc2(find(cc, ee, ss, se));
		// Memoize the result.
		sq->calc = find(nw, ne, sw, se);
		return(sq->calc);
	}

What if we want to calculate a number of generations that isn't a power of two? We keep exactly the same spatial geometry (relative sizes and positions of input and output squares) as for the full calculation, so that the recursion also has the same shape. The simplest case of calculating fewer generations is zero, in which case we just construct a sub-square centred on the original square. In fact we already have code to do this in find_cc().
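
(If we gave this zero-generation case a name to match calc2() above and calc1() below, it would be just the following one-liner. The name calc0 is my own shorthand here and does not appear again.)

	static square
	calc0(square sq) {
		return(find_cc(sq));
	}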

In the more general case the desired generation count is between 0 as worked out by find_cc(), and pow2[N-2] as worked out by calc2(). There are three kinds of recursion depending on whether the target falls before, on, or after the midway generation pow2[N-3]. This point in time corresponds to the age of the nine intermediate results in calc2(), which is also a dividing point in its code and in calc1() below.

                                recursion in   recursion in
                                  first part    second part
           0 < gen < pow2[N-3]      calc1         find_cc
               gen = pow2[N-3]      calc2         find_cc
   pow2[N-3] < gen < pow2[N-2]      calc2          calc1

In calc1() we test the type of recursion for each part separately, and the three cases arise from the way the two tests overlap when cmp == 0. In the second part we have to adjust the generation count so that it is relative to the midway point, and we must restore its value before returning so that it appears unchanged to the caller. (We manipulate its value in-place because that is more efficient than allocating and freeing a local mpz_t with the required value.)

The following code assumes that 0 < gen < pow2[N-2]. Again the base case of the recursion is not explicit. The level argument must not be less than 3; when level is exactly 3 the assumption implies that gen must be 1, which is the halfway point, so no further recursion via calc1() occurs and the calls to calc2() return immediately because of the level 2 sentinels.

This function is not memoized, but instead relies on calc2() for efficiency.

	static square
	calc1(square sq, unsigned level, mpz_t gen) {
		// Number of generations to the midway point.
		mpz_t *half = &pow2[level - 3];
		int cmp = mpz_cmp(gen, *half);
		level -= 1;
		square
		    nw = find_nw(sq), nn = find_nn(sq), ne = find_ne(sq),
		    ww = find_ww(sq), cc = find_cc(sq), ee = find_ee(sq),
		    sw = find_sw(sq), ss = find_ss(sq), se = find_se(sq);
		if(cmp < 0) {
			nw = calc1(nw, level, gen);
			nn = calc1(nn, level, gen);
			ne = calc1(ne, level, gen);
			ww = calc1(ww, level, gen);
			cc = calc1(cc, level, gen);
			ee = calc1(ee, level, gen);
			sw = calc1(sw, level, gen);
			ss = calc1(ss, level, gen);
			se = calc1(se, level, gen);
		} else {
			nw = calc2(nw); nn = calc2(nn); ne = calc2(ne);
			ww = calc2(ww); cc = calc2(cc); ee = calc2(ee);
			sw = calc2(sw); ss = calc2(ss); se = calc2(se);
		}
		if(cmp > 0) {
			mpz_sub(gen, gen, *half);
			nw = calc1(find(nw, nn, ww, cc), level, gen);
			ne = calc1(find(nn, ne, cc, ee), level, gen);
			sw = calc1(find(ww, cc, sw, ss), level, gen);
			se = calc1(find(cc, ee, ss, se), level, gen);
			mpz_add(gen, gen, *half);
		} else {

In the following code, if we followed the pattern and called find_cc() where the code above calls calc1(), it would require up to eight allocations. We can reduce this maximum to just four by avoiding the creation of the intermediate results, and instead directly creating the results that find_cc() would return according to the following diagram.

   +--+--+--+--+--+--+
   |nw   |  n  |   ne|
   +  +--+--+--+--+  +
   |  |se sw|se sw|  |
   +--+  +  +  +  +--+
   |  |ne nw|ne nw|  |
   +ww+--+- c -+--+ee+
   |  |se sw|se sw|  |
   +--+  +  +  +  +--+
   |  |ne nw|ne nw|  |
   +  +--+--+--+--+  +
   |sw   |  s  |   se|
   +--+--+--+--+--+--+
			nw = find(nw->se, nn->sw, ww->ne, cc->nw);
			ne = find(nn->se, ne->sw, cc->ne, ee->nw);
			sw = find(ww->se, cc->sw, sw->ne, ss->nw);
			se = find(cc->se, ee->sw, ss->ne, se->nw);
		}
		return(find(nw, ne, sw, se));
	}

UNWRITTEN SUPPORT CODE

The code above is the essence of hashlife. The most important thing it omits is the definition of the find() function, which serves as the square constructor. It has to implement hash-consing, i.e. it only ever constructs one copy of a square with a given set of arguments, to ensure maximal sharing. For serious use it must also implement garbage collection. I don't know if I can leave that job to the Boehm-Demers-Weiser conservative collector, but if I do then I'll have to do something cunning with weak pointers in find()'s hash table.
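
For concreteness, here is the sort of thing find() might look like: a fixed-size chained hash table keyed on the four quarter pointers, with no resizing, no garbage collection, and no error recovery beyond abort(). This is a sketch of the hash-consing idea only, not the function I intend to end up with.

	#include <stdlib.h>

	#define HASH_BITS 20
	#define HASH_SIZE (1 << HASH_BITS)

	struct bucket {
		struct square sq;
		struct bucket *next;
	};

	static struct bucket *hashtab[HASH_SIZE];

	static square
	find(square nw, square ne, square sw, square se) {
		// Hash the four pointers into a bucket index.
		size_t h = (size_t)nw;
		h = h * 131 + (size_t)ne;
		h = h * 131 + (size_t)sw;
		h = h * 131 + (size_t)se;
		h = (h ^ h >> HASH_BITS) % HASH_SIZE;
		// Return the existing square if we have made it before.
		for(struct bucket *b = hashtab[h]; b != NULL; b = b->next)
			if(b->sq.nw == nw && b->sq.ne == ne &&
			   b->sq.sw == sw && b->sq.se == se)
				return(&b->sq);
		// Otherwise construct it and remember it.
		struct bucket *b = malloc(sizeof(*b));
		if(b == NULL)
			abort();
		b->sq.nw = nw; b->sq.ne = ne;
		b->sq.sw = sw; b->sq.se = se;
		b->sq.calc = NULL;
		b->sq.pop = NULL;
		b->next = hashtab[h];
		hashtab[h] = b;
		return(&b->sq);
	}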

Maybe I'll write that eventually...

fanf: (Default)

It's nice to see Microsoft's own incompetence coming back to bite them. Here are a couple of slightly edited posts to the exim-users list, from an innocent victim and from me.

From: mwi78 -at- gmx.de
Date: 2008-02-12 09:06 -000
To: exim-users -at- exim.org
Subject: Exim and POP3 Connector at Win2003 SBS Server

Hello,

My SBS server connects via the POP3 connector to my exim4 MTA and logs in via the postmaster account to the POP3 mailboxes. In the server settings I set it up to connect to only one mailbox, to prevent parallel logins with the same login data.

I have the problem that some of these mail accounts fail to log in.

Error log at SBS Server

------------------------------
Event type:         Error
Event source:       POP3 Connector
Event category:     Download
Event ID:           1036
Date:               12.02.2008
Time:               07:15:13
User:               Not applicable
Computer:           WINSBS2K3
Description:
An error occurred during a POP3 transaction on server <mail.piw-windfuhr.de
[User@???]>. The error is 1232 (The network drive is not available. You can
find more information about fixing network problems in Windows Help.).

Further information about the Help and Support services is available at
http://go.microsoft.com/fwlink/events.asp.
------------------------------

The failed logins from the SBS Server are not logged in the syslog logfile.

Does anyone have an idea?

Thanks in advance

Martin

From: Tony Finch <dot@dotat.at>
Date: 2008-02-12 11:55 -000
To: exim-users -at- exim.org
Subject: Re: Exim and POP3 Connector at Win2003 SBS Server

This is an Exchange problem, not an Exim problem, but the explanation is too amusing not to post. Firstly, the error message and its pseudo explanation "the network drive is not available" are complete crap. The actual explanation can be found here: http://support.microsoft.com/kb/280331

This behavior occurs because one or more e-mail messages in the mailbox of the POP3 server may contain Null characters and the POP3 Connector for Small Business Server 4.5 is not able to download e-mail messages if they contain Null characters.

We run Cyrus which also chokes on messages containing null characters. We can't just reject these messages because legitimate but malformed messages sometimes contain null characters. The main source of these messages is of course Microsoft MTAs. This fills me with a mixture of glee and loathing.

(It's also amusing that my emotion is schadenfreude, and I'm replying to a German and I have an Austrian email address, but I don't speak German...)

fanf: (Default)

This is in response to mathew's recent post. I'm posting it here (and emailing it) because I can't get his frustrating Wordpress installation to log me in.

He asks, "It’s possible to write some C code to work out whether a machine’s architecture is little-endian or big-endian with respect to bytes. Is it possible, using only ANSI C, to work out whether the machine’s architecture is big-endian or little-endian with respect to bits?"

Machines do not (in general) have any bitwise endianness that's visible to programmers. In order to access bits you have to use shift and mask operations in registers, and these operations do not imply an absolute numbering of bits in the same way that the machine's addressing architecture does for bytes. (Instruction architecture documentation will often number the bits in a word, but a program running on the machine can't tell whether this documentation uses big-endian or little-endian conventions, or whether it numbers the bits from zero or one.)

Machines do have bitwise endianness at a lower level that is invisible to the programmer, when buses or IO serialise and deserialise bytes, e.g. to and from RAM or disk or network. The serialised form is not accessible, so its endianness doesn't matter. There is an exception to this, which is output devices where the serialised form is visible to the user, e.g. displays with sub-byte pixels. But this is the device's endianness, not the whole machine's, and different devices attached to the same machine can legitimately disagree.

The same arguments about machines' instruction architectures apply to most of the C programming model, i.e. bits are mostly only accessible by shifts and masks, so endianness isn't visible. The exception is bit fields in structs, which allow words to be broken down into units of arbitrary size. Bit fields must have an endianness convention for how they are packed into words, but this is chosen by the compiler and need not agree with the hardware's byte-wise endianness. However the compiler usually does agree with the hardware, so that the following two structures have the same layout, though this is not required by the standard:

        struct a {
                unsigned char first;
                unsigned char second;
        };
        struct b {
                unsigned first : 8;
                unsigned second : 8;
        };
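
As an illustration, here is a quick experiment of my own (not a portable test: it assumes unsigned is 32 bits, that all three bit fields are allocated within a single unsigned, and that the struct has no padding) which asks the compiler which way it packed the bit fields:

        #include <stdio.h>
        #include <string.h>

        struct bitorder {
                unsigned first : 8;
                unsigned second : 8;
                unsigned rest : 16;
        };

        int main(void) {
                unsigned char bytes[4] = { 1, 2, 3, 4 };
                struct bitorder b;
                memcpy(&b, bytes, sizeof(b));
                /* "first=1 second=2" means the bit fields are packed in
                   the same order as the bytes; anything else means the
                   compiler's bit-field order disagrees with the
                   machine's byte order. */
                printf("first=%u second=%u\n", b.first, b.second);
                return 0;
        }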
fanf: (Default)

All that stuff about Bloom filters was because I wanted to make sure I understood them before using them to improve Exim's ratelimit feature.

One of the things that our users often do which triggers the rate limiter is forward all their email offsite. This is a huge rate of messages, but they are all from and to the same person - not many different addresses. Secondly, the vast majority of the computers on our network are used by one person, so should only originate email from a limited number of email addresses. Unexpectedly using lots of addresses may be a sign of a compromised machine.

What I wanted was a way of identifying how often different events occurred, with reasonable efficiency. The existing ratelimit code stores (for each sender) a 64 bit timestamp and a 64 bit floating point rate measurement, plus a key to identify the sender. This is small and easy to update. I wanted something that didn't bloat it too much and which is similarly easy to update.

The essential idea is that the new per_addr option also stores a Bloom filter per sender. Each recipient specified by the sender is checked for membership of its Bloom filter; if it is present then no update happens; if it is absent, the sender's rate is incremented and the recipient is added to the filter.

The rate limit configuration includes a time period which is currently used to determine the exponential smoothing constant, i.e. how bursty senders can be. With per_addr it is also used to determine the Bloom filter's lifetime: when it gets too old, the filter is thrown away. This solves the problem of the filter getting over-full and producing loads of false positives; this means that Exim is measuring the number of different recipient addresses used by the sender in each period. It's very pleasing that this is so simple for both the code and the user :-)

The remaining question is how big the filter should be, and how many hash functions to use. For most configurations, the maximum number of entries in the filter will be the same as the maximum rate, so I decided to simply allocate 2 bytes i.e. 16 bits times the limit, since the false positive rate was too high with just one byte per element. I chose to use 8 hash functions, since that provides some headroom above the maximum rate before the false positive rate gets too high. (You can tell Exim to measure the client's success rate, which is always less than the limit, or the client's attempt rate, which can be very much higher.)

With this choice of parameters, the false positive rates for clients sending to different addresses at rates relative to the limit are:

   sending rate / limit     false positive rate
          0.5               0.0006%   or   1 in 170000
          1                 0.06%     or   1 in 1700
          2                 2.5%      or   1 in 40
          2.7 (≈ e)         9.3%      or   1 in 10.7
          4                 31%       or   1 in 3.2
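
(For the record, the table can be reproduced with a few lines of C. This is my own back-of-the-envelope check; it assumes the filter accumulates roughly rate ∗ period distinct addresses before it is cleared, so with size = 16 ∗ limit bits and nh = 8 hashes, nh ∗ pop / size is half the rate-to-limit ratio.)

	#include <stdio.h>
	#include <math.h>

	int main(void) {
		double ratio[] = { 0.5, 1, 2, 2.718281828, 4 };
		for(int i = 0; i < 5; i++) {
			double pz = exp(-ratio[i] / 2);	// nh*pop/size = ratio/2
			double fpr = pow(1 - pz, 8);	// nh = 8
			printf("%4.2f  %8.4f%%  or  1 in %.0f\n",
			       ratio[i], 100 * fpr, 1 / fpr);
		}
		return 0;
	}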

It looks like the false positive rate for fast senders might easily become embarrassing, but again the periodic filter clearance comes to the rescue. When a fast sender stops, its smoothed rate decays exponentially, falling to 1/e of its value each period. Therefore if its rate was more than e times the limit, then the over-full Bloom filter will be cleared before its rate falls below the limit. Thus the smoothing period is the maximum time for which Exim can be fooled into under-estimating the sender's rate, but when this is a serious problem (false positive rate over 9%) the client will be subject to countermeasures continuously. The only bad effect should be incorrect rate measurements in the logs.

The code is actually more general than I have described: you aren't limited to measuring unique recipient addresses. Under the covers the per_addr option is equivalent to per_rcpt/unique=$local_part@$domain so by changing the unique value you can, for example, count unique sender addresses: per_mail/unique=$sender_address.

I have posted a patch to the exim-users list so that people can try out the new code before it gets baked into a release.

fanf: (Default)
These job ads seem to arrive like buses. Anyway, as has been widely reported, we're replacing our old PABX with VoIP, and we're looking for a "Communications Systems Manager". I guess this is to fill the gaping technical void in the VoIP project, so it would be a sysadmin rather than a management role. We also need someone who can clearly communicate technical requirements and options to IT staff around the University who are responsible for the edge networks to which the phones and computers connect. I expect the post will be listed on the usual jobs page RSN.
fanf: (Default)
We still have a vacancy in our web development group, and it has just been re-advertized. See http://www.cam.ac.uk/cs/jobs/ for details...
fanf: (Default)

While writing up my notes on Bloom filters, I wanted a better idea of the relationship between the false positive rate and the size multiplier.

sm  = size / pop
pz  = exp(−nh ∗ pop / size)
    = exp(−nh / sm)
fpr = (1 − pz)^nh

Given a fixed number of hash functions, what size multiplier do we need to make enough space in the Bloom filter to produce the desired false positive rate? How does the size multiplier change as nh and fpr change? I wrote a program to solve this numerically, and the output is below. The entries for higher fprs show quite nicely that there's an optimal nh for a given fpr.

( big table )
fanf: (Default)

Bloom filters are a simple and clever data structure that every programmer should know about. A Bloom filter represents a set where the universe of possible members is very large and where it doesn't matter if the filter occasionally falsely claims that a non-member is in the set. It is useful for filtering lookups into more expensive (larger) data structures, so that most of the time you can avoid performing lookups that will fail. For example, a hyphenation program on a small computer might have some rules that cover most words, and a dictionary of exceptions; it can represent the entries in the exceptions dictionary as a Bloom filter that fits in RAM, so that it doesn't have to go to disk to find out that its standard rules apply.

The way it works is to represent the set as an array of size bits, which is typically a small multiple of the expected population of the set. To add an item to the set, you hash it with nh independent hash functions, and set the nh bits in the array indexed by the hashes. To check if an item is in the set, you hash it in the same way, and check that all nh bits are set. You get a false positive when hash collisions cause all the bits to be set by accident.
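
To make that concrete, here is a toy implementation of my own. The seeded FNV-style hash is an arbitrary stand-in for the nh independent hash functions, and the sizes are made up; it is a sketch of the technique, not production code.

	#include <stdbool.h>
	#include <stdio.h>

	#define SIZE 1024	// bits in the filter
	#define NH   8		// number of hash functions

	static unsigned char filter[SIZE / 8];

	// The i-th hash of a string: FNV-1a with a per-function seed.
	static unsigned hash(unsigned i, const char *s) {
		unsigned h = 2166136261u ^ (i * 0x9E3779B9u);
		while(*s)
			h = (h ^ (unsigned char)*s++) * 16777619u;
		return(h % SIZE);
	}

	static void add(const char *s) {
		for(unsigned i = 0; i < NH; i++)
			filter[hash(i, s) / 8] |= 1 << hash(i, s) % 8;
	}

	static bool member(const char *s) {
		for(unsigned i = 0; i < NH; i++)
			if(!(filter[hash(i, s) / 8] & 1 << hash(i, s) % 8))
				return(false);	// definitely not in the set
		return(true);			// in the set, or a false positive
	}

	int main(void) {
		add("hyphenation");
		add("exception");
		// prints "1 1 0" unless we are unlucky
		printf("%d %d %d\n", member("hyphenation"),
		       member("exception"), member("ordinary"));
		return(0);
	}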

There are a few neat tricks you can perform with Bloom filters. You can take the union of two sets by ORing the two Bloom filters together. You can halve the size of a Bloom filter by ORing the top half with the bottom half. You can't delete elements from a Bloom filter, though if false negatives are OK as well as false positives, you can represent the set of deletions as a second Bloom filter. Alternatively, you can use a counting Bloom filter, in which each bit is replaced with a small counter that is incremented for insertions and decremented for deletions. Four-bit counters overflow with a probability of 1.37 ∗ 10^−15.

The following analysis is mostly based on Andrei Broder and Michael Mitzenmacher's survey of network applications of Bloom filters.

After pop elements have been added to the set, the probability that a particular bit is zero is

pz′ = (1 − 1/size)^(nh ∗ pop)
which is approximately
pz = exp(−nh ∗ pop / size)
The probability of a false positive is
fpr = (1 − pz)^nh
    = exp(nh ∗ ln(1 − pz))
Given a fixed array size and population, we can work out the number of hash functions that minimizes the false positive rate. Increasing nh makes the exponent more negative, which reduces fpr, but beyond a certain point the smaller pz starts making fpr larger again. First note that
ln(pz) = −nh ∗ pop / size
nh = −ln(pz) ∗ size / pop
which gives us
ln(fpr) = nh ∗ ln(1 − pz)
        = −(size / pop) ∗ ln(pz) ∗ ln(1 − pz)
Symmetry reveals that the minimum occurs when pz = 1/2, i.e. half the bits are set, so when the array is full (i.e. at maximum information density),
nh = ln(2) ∗ size / pop
size = pop ∗ nh / ln(2)
     = pop ∗ nh ∗ log2(e)
fpr = exp(−nh ∗ ln(2))
    = 2^(−nh)
The size is a factor of only log2(e) ≈ 1.44 larger than the information-theoretic lower bound (which I'm not going to explain here).

Let's compare a Bloom filter with a simple one-bit-per-element hashed set, with a target false positive rate of 2^(−k). If you set only one bit per element, then (assuming no hash collisions) fpr = pop / size, so size = pop ∗ 2^k, which is exponentially worse than a k-hash Bloom filter.

It seems reasonable (for the application I have in mind) to set size = 16 ∗ limit and nh = 8, which gives us headroom for the population to overflow limit by 2 ∗ ln 2 ≈ 1.39 before fpr rises to 1/256.

Adam Kirsch and Michael Mitzenmacher show that, rather than requiring nh independent hash functions, we can use a set of functions of the form h1(x) + i ∗ h2(x) where 0 ≤ i < nh, without increasing the false positive rate. A hash function might be implemented by taking log2(size) bits from the result of MD5(x), so without this trick, nh = 8 limits size to 2^16, whereas with the trick we only need two functions so the limit is large enough that we don't need to worry.

fanf: (Default)

Try learning about SpamAssassin's notfirsthop DNS blacklist option, and why it might be a sensible idea, you utter cretins.

 This message was created automatically by mail delivery software.

 A message that you sent could not be delivered to one or more of its
 recipients. This is a permanent error. The following address(es) failed:

 zzzzzzzz@pts.edu
   SMTP error from remote mail server after end of data:
   host smtp.pts.edu [72.22.0.101]:
   554 Service unavailable; Client host [ppsw-5.csi.cam.ac.uk]
       blocked using Barracuda Reputation;
       http://bbl.barracudacentral.com/q.cgi?ip=86.165.170.59

 ------ This is a copy of the message, including all the headers. ------

 Return-path: <zzz99@cam.ac.uk>
 X-Cam-SpamDetails: Not scanned
 X-Cam-AntiVirus: No virus found
 X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/
 Received: from [86.165.170.59] (port=51166 helo=[192.168.1.2])
           by ppsw-5.csi.cam.ac.uk (smtp.hermes.cam.ac.uk [131.111.8.155]:587)
           with esmtpsa (PLAIN:zzz99) (TLSv1:AES128-SHA:128)
           id 1JHf30-0003CD-Il (Exim 4.67)
           (return-path <zzz99@cam.ac.uk>); Wed, 23 Jan 2008 12:49:42 +0000
 Mime-Version: 1.0 (Apple Message framework v752.3)
 Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
 ...
fanf: (Default)

There's a category of bit-twiddling hacks called "SWAR": SIMD within a register. The essence is to treat a register not as a single number but as a packed array of bit fields. The x86 vector processing extensions (MMX / 3Dnow / SSE) have a SWARish flavour, but you can use the idea with just basic bitwise operations. The ultimate reference for this is the book Hacker's Delight, and there are several useful sites on the web (grep for "hack" in my URL log).

A few years ago I wrote a fairly neat implementation of Conway's Game of Life that did a lot of SWAR. (I described the evolution of the code soon after I finished it.) The key SWAR-related idea is to use four 64-bit registers to represent 64 four-bit numbers: one register holds bit 0 of all 64 numbers, one holds bit 1, one holds bit 2, and so on. Then you can use bitwise operations to implement parallel addition in exactly the way that is taught in basic digital electronics. The following macros take two or three bit vectors whose bits all have the same place value, and produce sum and carry result vectors.

The half adder:
	#define ADD2(s0,s1,a0,a1) do {		\
			s1 = (a0) & (a1);	\
			s0 = (a0) ^ (a1);	\
		} while(0)
The full adder:
	#define ADD3(s0,s1,a0,a1,a2) do {	\
			uint64_t c0, c1;	\
			ADD2(s0,c0,a0,a1);	\
			ADD2(s0,c1,s0,a2);	\
			s1 = c0 | c1;		\
		} while(0)
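
As a quick sanity check (my own throwaway test, not part of the Life code), given the two macros above we can confirm that for every bit position s0 + 2*s1 really is the number of inputs with that bit set:

	#include <stdint.h>
	#include <stdio.h>

	int main(void) {
		uint64_t a = 0xF0F0, b = 0xFF00, c = 0xCCCC;
		uint64_t s0, s1;
		ADD3(s0, s1, a, b, c);
		for(unsigned i = 0; i < 16; i++) {
			unsigned slow = ((a >> i) & 1) + ((b >> i) & 1) + ((c >> i) & 1);
			unsigned swar = ((s0 >> i) & 1) + 2 * ((s1 >> i) & 1);
			printf("bit %2u: %u %s %u\n", i, slow,
			       slow == swar ? "==" : "!=", swar);
		}
		return(0);
	}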

More recently I have been thinking about Bill Gosper's Hashlife algorithm. I'm not going to describe its mind-bending brilliance here, because I want to concentrate on the SWAR tangent that spun off from it. Hashlife's core algorithm is strategically very powerful, but it is not very efficient at low-level tactical bit banging. For example, Golly uses table lookup for calculating small parts of the universe instead of relying on Hashlife all the way down.

My previous SWAR Life code worked on 66x3 cell bitmaps (i.e. a 64 bit register plus its surrounding cells) which arose from the way it raster-scanned its way across the universe. However Hashlife works on square subdivisions of the universe that are a power of two cells on a side. An 8x8 square fits perfectly into a 64 bit register, which is ideal for SWAR. The real beauty (and what makes this Life in a register and not around a register) is that Hashlife deals with the splicing of adjoining squares at its strategic level, so we don't have to look beyond the register file to produce a useful result.

LIAR represents each row of the 8x8 square as an octet in the register. We don't have to worry if the big byte is the top row or the bottom row, or if the big bit of a byte is at the left or right end of the row, because that's handled at the strategic level. By the time the square gets to us symmetry has eliminated worries of endianness. I'll use the big endian convention in the code since it produces more consistent nomenclature.

In order to SWARishly add the eight 1-bit numbers corresponding to a cell's neighbours, they must be shifted to equivalent places in their bit vectors. (In fact we also add the cell itself into the count, since this gives us a neat short-cut.) Start off by getting the left and right cells into the proper place. We need to mask so that rows don't bleed into each other.

	uint64_t liar(uint64_t bmp) {
		uint64_t left  = bmp << 1 & 0xFEFEFEFEFEFEFEFE;
		uint64_t right = bmp >> 1 & 0x7F7F7F7F7F7F7F7F;

Now we can add each cell to its left and right neighbours. (The numeric variable suffix is the value of the bits in the vector.) This does a 3-to-2 horizontal compression of the whole bitmap.

		uint64_t mid1, mid2;
		ADD3(mid1, mid2, left, bmp, right);

By shifting again we can line up the counts of the upper and lower neighbours with each cell. This is where it is helpful to count the cell as well as its neighbours, so all rows are treated the same.

		uint64_t up1   = mid1 << 8;
		uint64_t up2   = mid2 << 8;
		uint64_t down1 = mid1 >> 8;
		uint64_t down2 = mid2 >> 8;

Now we can add these together to compress vertically. We're lazy about propagating the carries in the value-2 bits since it doesn't buy us anything. We end up with bits valued 1, 2, 2, and 4, which makes a maximum total value of 9.

		uint64_t sum1, sum2a, sum2b, sum4;
		ADD3(sum1, sum2a, up1, mid1, down1);
		ADD3(sum2b, sum4, up2, mid2, down2);

The usual way of expressing the Life rule is: if the cell is dead and it has 3 neighbours, it is born, otherwise it stays dead; if a cell is live and it has 2 or 3 neighbours, it survives, otherwise it dies. We need to adjust this because we count the cell too, so if a cell is live and its count is 3 or 4, it survives, otherwise it dies. Notice that when the count is 4 the cell's state is the same in the previous and next generations. Translated into boolean arithmetic, this logic becomes:

		return(bmp & ~sum1 & (~sum2a & ~sum2b  &  sum4 |
				       sum2a &  sum2b  & ~sum4)
			    | sum1 & ( sum2a ^  sum2b) & ~sum4);
	}

This code uses just 39 ALU operations, most of which can run concurrently on a superscalar chip. It needs (I think) a maximum of 8 live values (which is slightly obscured by the coding style) so it shouldn't even need to hit the stack. By contrast, Golly's equivalent code (leafres() near the start of its lifealgo.cpp file) uses 52 ALU ops and 9 random reads of a 64KB table (plus overhead owing to a sub-optimal 8x8 bitmap layout). Hashlife likes to calculate two generations of an 8x8 square at a time, and the second one is easier because you don't need to worry about the edges of the square, so it takes Golly 30 ALU ops and 4 random reads. The two generation total is more than 82 ALU ops and 13 reads for Golly, and just 78 ALU ops for LIAR.

A success, I think :-) With a bit more cunning it might be possible to squeeze LIAR even more...
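
Here is a small driver to watch it work (my own throwaway test, which assumes the liar() function above): a vertical blinker in the middle of the 8x8 square should come back rotated through 90 degrees after one generation.

	#include <stdint.h>
	#include <stdio.h>

	static void show(uint64_t bmp) {
		for(int row = 7; row >= 0; row--) {
			for(int col = 7; col >= 0; col--)
				putchar((bmp >> (row * 8 + col)) & 1 ? '#' : '.');
			putchar('\n');
		}
		putchar('\n');
	}

	int main(void) {
		// three cells in a vertical line in the middle of the square
		uint64_t blinker = 0x0000080808000000;
		show(blinker);
		show(liar(blinker));	// should be a horizontal line
		return(0);
	}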

(After thinking some more...) Actually, propagating carries does simplify it. We don't need to propagate them all the way, because we can treat counts of 8 and 9 identically to 0 and 1. This means we can replace the return statement above with the following. This saves 4 ALU ops, reducing the cost from 39 to 35. I've also worked out that the maximum number of registers needed is six, not eight, and it fits into an in-order triple-issue instruction pipeline so it can run in 15 clock cycles.

		uint64_t sum2, sum4b;
		ADD2(sum2, sum4b, sum2a, sum2b);
		sum4 = sum4 ^ sum4b;
		return(bmp & ~sum1 & ~sum2 &  sum4 |
			      sum1 &  sum2 & ~sum4);

The final code is available on my web site.

fanf: (Default)

The following C type definition can be used for declaring local and global structure objects. You can initialize them as if they were bare structures, because C doesn't mind if you omit curly brackets in initializers (though gcc -Wall will complain). You can also use the typedef to declare function arguments, in which case the function will expect a pointer to the structure instead of a copy of it. Furthermore, when you use a variable declared with this typedef, it will be quietly converted into a pointer to the structure just as is expected by the function. This avoids a load of & operators and gives you a sort of poor-man's C++ pass-by-reference.

        typedef struct mytype {
                /* member declarations */
        } mytype[1];

        mytype var;

        int func(mytype arg);

        func(var);
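
Here is a complete toy illustration of my own (not from any real code): it prints 2, because both calls to bump() modify the caller's object through the hidden pointer.

        #include <stdio.h>

        typedef struct counter {
                int n;
        } counter[1];

        static void bump(counter c) {
                /* c is really a struct counter *, so this change is
                   visible to the caller */
                c->n += 1;
        }

        int main(void) {
                counter c = { 0 };      /* gcc -Wall wants { { 0 } } */
                bump(c);
                bump(c);
                printf("%d\n", c->n);
                return 0;
        }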

ETA: it seems this trick is used by GMP (see the last paragraph of that page)

fanf: (Default)
I was catching up with Dave Farber's Interesting People list this evening when I read a brief post about Ray Kurzweil's graph of a 100-year generalization of Moore's law. This caught my interest, so I googled for "Kurzweil" to see if I could find what it was referring to. I found the graph as an illustration of the Wikipedia article about "accelerating change" which in turn led me to Kurzweil's long essay on the subject. He says that Moore's Law is just a particularly quantitative instance of a general rule of exponential progress, encompassing not just technology in general, but also biological evolution, and leading to an AI singularity that he expects before 2050. The Kurzweil AI site has lots of interesting articles on the subject, one of which is a log of an on-line chat between Kurzweil and Vernor Vinge, in which VV plugs a collection of his short stories that was published a few years ago, of which I did not remember having a copy. In fact, it has been languishing on my Amazon wish list because it isn't available new in the UK. However, it is available in the US, and the exchange rate is currently very favourable. So I ordered the book along with a few others, including a copy of the Oxford Companion to the Year, which at $80 is two thirds of its price over here, and a history of the atomic clock which is similarly cut-price. They should go well on my shelves next to The Measurement of Time and Calendrical Calculations (aargh, now there's a third edition!). I was lucky enough to be able to buy the latter two books from the Cambridge University Press bookshop which gives a 20% discount to University Card holders.
fanf: (Default)
Network Working Group                                         C. Hutzler
Request for Comments: 5068
BCP: 134                                                      D. Crocker
Category: Best Current Practice              Brandenburg InternetWorking
                                                              P. Resnick
                                                   QUALCOMM Incorporated
                                                               E. Allman
                                                          Sendmail, Inc.
                                                                T. Finch
                               University of Cambridge Computing Service
                                                           November 2007

  Email Submission Operations: Access and Accountability Requirements

Status of This Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Abstract

   Email has become a popular distribution service for a variety of
   socially unacceptable, mass-effect purposes.  The most obvious ones
   include spam and worms.  This note recommends conventions for the
   operation of email submission and transport services between
   independent operators, such as enterprises and Internet Service
   Providers.  Its goal is to improve lines of accountability for
   controlling abusive uses of the Internet mail service.  To this end,
   this document offers recommendations for constructive operational
   policies between independent operators of email submission and
   transmission services.

   Email authentication technologies are aimed at providing assurances
   and traceability between internetworked networks.  In many email
   services, the weakest link in the chain of assurances is initial
   submission of a message.  This document offers recommendations for
   constructive operational policies for this first step of email
   sending, the submission (or posting) of email into the transmission
   network.  Relaying and delivery entail policies that occur subsequent
   to submission and are outside the scope of this document.
fanf: (Default)
The previous occasion on which I should have called 999 was when I was living in Finchley in north London. I was walking home from work at Demon late one Thursday evening (I think in November 1998) along Station Road when I heard a strange intermittent crashing noise coming closer. Soon a chav came into sight, holding an aluminium baseball bat. He waved it at me in a vaguely threatening manner as I passed him, but I ignored him and we continued on our ways. It soon became apparent what the noise was: several of the cars had had their windows or mirrors smashed, and I could hear a few more getting similar treatment behind me.

At this point I should have called 999 but instead I went home and called it in on the non-emergency number. The operator did not sound particularly interested, but took down the details. (I suppose it was around chucking-out time.)

On the following Saturday morning I got a call from the police who wanted to take a formal statement from me. It turned out that on Friday night bat-brain decided to try his toy out on someone's head. I was able to give a vague description of him, though I had only seen him for a few seconds by sodium lighting and hadn't looked too closely to avoid provoking him.

The case rumbled slowly on over the next several months. There were two attempts to organize an identification parade; the first one didn't happen because the necessary people weren't available. The second one was held in Brixton (IIRC), and I was driven there from Demon Towers and back by a police officer - it would have been faster, cheaper, and easier to take the tube. By this stage it was several months later and so I was unable to pick him out of the line-up, though some time later, when the pressure was off (and too late to be useful), I recognized him as the one with the chavvy dye job and earring. Also present were the victim and her man. I think her head was still not completely healed.

Some months after that I was contacted again and warned to be ready to be called as a witness in court. In the end I wasn't needed, I think because he pleaded guilty. I never found out what his sentence was.

The whole thing was very unsatisfactory: the slack response to my initial call; the lack of admonishment to call 999 in that kind of situation; the huge delays between arrest, id parade, and trial; and the lack of follow-up to tell me what the conclusion was. Cambridgeshire Constabulary seems much more on the ball than the Met.