fanf | Recent Entries

My current project is to replace Cambridge University's DNS servers. The first stage of this project is to transfer the code from SCCS to Git so that it is easier to work with.

Ironically, to do this I have ended up spending lots of time working with SCCS and RCS, rather than Git. This was mainly developing analysis and conversion tools to get things into a fit state for Git.

If you find yourself in a similar situation, you might find these tools helpful.

Background

Cambridge was allocated three Class B networks in the 1980s: first the Computer Lab got 128.232.0.0/16 in 1987; then the Department of Engineering got 129.169.0.0/16 in 1988; and eventually the Computing Service got 131.111.0.0/16 in 1989 for the University (and related institutions) as a whole.

The oldest records I have found date from September 1990, which list about 300 registrations. The next two departments to get connected were the Statistical Laboratory and Molecular Biology (I can't say in which order). The Statslab was allocated 131.111.20.0/24, which it has kept for 24 years!. Things pick up in 1991, when the JANET IP Service was started and rapidly took over to replace X.25. (Last month I blogged about connectivity for Astronomy in Cambridge in 1991.)

I have found these historical nuggets in our ip-register directory tree. This contains the infrastructure and history of IP address and DNS registration in Cambridge going back a quarter century. But it isn't just an archive: it is a working system which has been in production that long. Because of this, converting the directory tree to Git presents certain challenges.

Developmestuction

The ip-register directory tree contains a mixture of:

Source code, mostly with SCCS history
Production scripts, mostly with SCCS history
Configuration files, mostly with SCCS history
The occasional executable
A few upstream perl libraries
Output files and other working files used by the production scripts
Secrets, such as private keys and passwords
Mail archives
Historical artifacts, such as old preserved copies of parts of the directory tree
Miscellaneous files without SCCS history
Editor backup files with ~ suffixes

My aim was to preserve this all as faithfully as I could, while converting it to Git in a way that represents the history in a useful manner.

PLN

The rough strategy was:

Take a copy of the ip-register directory tree, preserving modification times. (There is no need to preserve owners because any useful ownership information was lost when the directory tree moved off the Central Unix Service before that shut down in 2008.)
Convert from SCCS to RCS file-by-file. Converting between these formats is a simple one-to-one mapping.
Files without SCCS history will have very short artificial RCS histories created from their modification times and editor backup files.
Convert the RCS tree to CVS. This is basically just moving files around, because a CVS repository is little more than a directory tree of RCS files.
Convert the CVS repository to Git using git cvsimport. This is the only phase that needs to do cross-file history analysis, and other people have already produced a satisfactory solution.

Simples! ... Not.

sccs2rcs proves inadequate

I first tried ESR's sccs2rcs Python script. Unfortunately I rapidly ran into a number of showstoppers.

It didn't work with Solaris SCCS, which is what was available on the ip-register server.
It destructively updates the SCCS tree, losing information about the relationship between the working files and the SCCS files.
It works on a whole directory tree, so it doesn't give you file-by-file control.

I fixed a bug or two but very soon concluded the program was entirely the wrong shape.

(In the end, the Solaris incompatibility became moot when I installed GNU CSSC on my FreeBSD workstation to do the conversion. But the other problems with sccs2rcs remained.)

sccs2rcs1

So I wrote a small script called sccs2rcs1 which just converts one SCCS file to one RCS file, and gives you control over where the RCS and temporary files are placed. This meant that I would not have to shuffle RCS files around: I could just create them directly in the target CVS repository. Also, sccs2rcs1 uses RCS options to avoid the need to fiddle with checkout locks, which is a significant simplification.

The main regression compared to sccs2rcs is that sccs2rcs1 does not support branches, because I didn't have any files with branches.

sccscheck

At this point I needed to work out how I was going to co-ordinate the invocations of sccs2rcs1 to convert the whole tree. What was in there?!

I wrote a fairly quick-and-dirty script called sccscheck which analyses a directory tree and prints out notes on various features and anomalies. A significant proportion of the code exists to work out the relationship between working files, backup files, and SCCS files.

I could then start work on determining what fix-ups were necessary before the SCCS-to-CVS conversion.

sccsprefix

One notable part of the ip-register directory tree was the archive subdirectory, which contained lots of gzipped SCCS files with date stamps. What relationship did they have to each other? My first guess was that they might be successive snapshots of a growing history, and that the corresponding SCCS files in the working part of the tree would contain the whole history.

I wrote sccsprefix to verify if one SCCS file is a prefix of another, i.e. that it records the same history up to a certain point.

This proved that the files were NOT snapshots! In fact, the working SCCS files had been periodically moved to the archive, and new working SCCS files started from scratch. I guess this was to cope with the files getting uncomfortably large and slow for 1990s hardware.

rcsappend

So to represent the history properly in Git, I needed to combine a series of SCCS files into a linear history. It turns out to be easier to construct commits with artificial metadata (usernames, dates) with RCS than with SCCS, so I wrote rcsappend to add the commits from a newer RCS file as successors of commits in an older file.

Converting the archived SCCS files was then a combination of sccs2rcs1 and rcsappend. Unfortunately this was VERY slow, because RCS takes a long time to check out old revisions. This is because an RCS file contains a verbatim copy of the latest revision and a series of diffs going back one revision at a time. The SCCS format is more clever and so takes about the same time to check out any revision.

So I changed sccs2rcs1 to incorporate an append mode, and used that to convert and combine the archived SCCS files, as you can see in the ipreg-archive-uplift script. This still takes ages to convert and linearize nearly 20,000 revisions in the history of the hosts.131.111 file - an RCS checkin rewrites the entire RCS file so they get slower as the number of revisions grows. Fortunately I don't need to run it many times.

files2rcs

There are a lot of files in the ip-register tree without SCCS histories, which I wanted to preserve. Many of them have old editor backup ~ files, which could be used to construct a wee bit of history (in the absence of anything better). So I wrote files2rcs to build an RCS file from this kind of miscellanea.

An aside on file name restrictions

At this point I need to moan a bit.

Why does RCS object to file names that start with a comma. Why.

I tried running these scripts on my Mac at home. It mostly worked, except for the directories which contained files like DB.cam (source file) and db.cam (generated file). I added a bit of support in the scripts to cope with case-insensitive filesystems, so I can use my Macs for testing. But the bulk conversion runs very slowly, I think because it generates too much churn in the Spotlight indexes.

rcsdeadify

One significant problem is dealing with SCCS files whose working files have been deleted. In some SCCS workflows this is a normal state of affairs - see for instance the SCCS support in the POSIX Make XSI extensions. However, in the ip-register directory tree this corresponds to files that are no longer needed. Unfortunately the SCCS history generally does not record when the file was deleted. It might be possible to make a plausible guess from manual analysis, but perhaps it is more truthful to record an artificial revision saying the file was not present at the time of conversion.

Like SCCS, RCS does not have a way to represent a deleted file. CVS uses a convention on top of RCS: when a file is deleted it puts the RCS file in an "Attic" subdirectory and adds a revision with a "dead" status. The rcsdeadify applies this convention to an RCS file.

tar2usermap

There are situations where it is possible to identify a meaningful committer and deletion time. Where a .tar.gz archive exists, it records the original file owners. The tar2usermap script records the file owners from the tar files. The contents can then be unpacked and converted as if they were part of the main directory, using the usermap file to provide the correct committer IDs. After that the files can be marked as deleted at the time the tarfile was created.

sccs2cvs

The main conversion script is sccs2cvs, which evacuates an SCCS working tree into a CVS repository, leaving behind a tree of (mostly) empty directories. It is based on a simplified version of the analysis done by sccscheck, with more careful error checking of the commands it invokes. It uses sccs2rcs1, files2rcs, and rcsappend to handle each file.

The rcsappend case occurs when there is an editor backup ~ file which is older than the oldest SCCS revision, in which case sccs2cvs uses rcsappend to combine the output of sccs2rcs1 and files2rcs. This could be done more efficiently with sccs2rcs1's append mode, but for the ip-register tree it doesn't cause a big slowdown.

To cope with the varying semantics of missing working files, sccs2rcs leaves behind a tombstone where it expected to find a working file. This takes the form of a symlink pointing to 'Attic'. Another script can then deal with these tombstones as appropriate.

pre-uplift, mid-uplift, post-uplift

Before sccs2cvs can run, the SCCS working tree should be reasonably clean. So the overall uplift process goes through several phases:

Fetch and unpack copy of SCCS working tree;
pre-uplift fixups;
(These should be the minimum changes that are required before conversion to CVS, such as moving secrets out of the working tree.)
sccs2cvs;
mid-uplift fixups;
(This should include any adjustments to the earlier history such as marking when files were deleted in the past.)
git cvsimport or cvs-fast-export | git fast-import;
post-uplift fixups;
(This is when to delete cruft which is now preserved in the git history.)

For the ip-register directory tree, the pre-uplift phase also includes ipreg-archive-uplift which I described earlier. Then in the mid-uplift phase the combined histories are moved into the proper place in the CVS repository so that their history is recorded in the right place.

Similarly, for the tarballs, the pre-uplift phase unpacks them in place, and moves the tar files aside. Then the mid-uplift phase rcsdeadifies the tree that was inside the tarball.

I have not stuck to my guidelines very strictly: my scripts delete quite a lot of cruft in the pre-uplift phase. In particular, they delete duplicated SCCS history files from the archives, and working files which are generated by scripts.

sccscommitters

SCCS/RCS/CVS all record committers by simple user IDs, whereas git uses names and email addresses. So git-cvsimport and cvs-fast-export can be given an authors file containing the translation. The sccscommitters script produces a list of user IDs as a starting point for an authors file.

Uplifting cvs to git

At first I tried git cvsimport, since I have successfully used it before. In this case it turned out not to be the path to swift enlightenment - it was taking about 3s per commit. This is mainly because it checks out files from oldest to newest, so it falls foul of the same performance problem that my rcsappend program did, as I described above.

So I compiled cvs-fast-export and fairly soon I had a populated repository: nearly 30,000 commits at 35 commits per second, so about 100 times faster. The fast-import/export format allows you to provide file contents in any order, independent of the order they appear in commits. The fastest way to get the contents of each revision out of an RCS file is from newest to oldest, so that is what cvs-fast-export does.

There are a couple of niggles with cvs-fast-export, so I have a patch which fixes them in a fairly dumb manner (without adding command-line switches to control the behaviour):

In RCS and CVS style, cvs-fast-export replaces empty commit messages with "*** empty log message ***", whereas I want it to leave them empty.
cvs-fast-export makes a special effort to translate CVS's ignored file behaviour into git by synthesizing a .gitignore file into every commit. This is wrong for the ip-register tree.
Exporting the hosts.131.111 file takes a long time, during which cvs-fast-export appears to stall. I added a really bad progress meter to indicate that work was being performed.

Wrapping up

Overall this has taken more programming than I expected, and more time, very much following the pattern that the last 10% takes the same time as the first 90%. And I think the initial investigations - before I got stuck in to the conversion work - probably took the same time again.

There is one area where the conversion could perhaps be improved: the archived dumps of various subdirectories have been converted in the location that the tar files were stored. I have not tried to incorporate them as part of the history of the directories from which the tar files were made. On the whole I think combining them, coping with renames and so on, would take too much time for too little benefit. The multiple copies of various ancient scripts are a bit weird, but it is fairly clear from the git history what was going on.

So, let us declare the job DONE, and move on to building new DNS servers!

When I was running git-cvsimport yesterday (about which more another time), I wondered what were the nine-digit numbers starting with 7 that it was printing in its progress output. After a moment I realised they were the time_t values corresponding to the commit dates from the early 1990s.

Bert commented that he started using Unix when time_t values started with an 8, which made me wonder if there was perhaps a 26 year ambiguity - early 1970s or mid 1990s?. (For the sake of pedantry - I don't really think Bert is that old!)

So I checked and time_t = 80,000,000 corresponds to 1972-07-14 22:13:20 and 90,000,000 corresponds to 1972-11-07 16:00:00. But I thought this was before modern time_t started.

Page 183 of this very large PDF of the 3rd Edition Unix manual says:

TIME (II)                  3/15/72                  TIME (II)

NAME         time -- get time of year

SYNOPSIS     sys time / time = 13
             (time r0-r1)

DESCRIPTION  time returns the time since 00:00:00, Jan. 1,
             1972, measured in sixtieths of a second.  The
             high order word is in the rO register and the
             low order is in the r1.

SEE ALSO     date(I), mdate(II)

DIAGNOSTICS  --

BUGS         The time is stored in 32 bits.  This guarantees
             a crisis every 2.26 years.

So back then the 800,000,000 - 900,000,000 period was about three weeks in June 1972.

The 4th Edition Unix manual (link to tar file of nroff source) says:

TIME (II)                   8/5/73                   TIME (II)

NAME         time -- get date and time

SYNOPSIS     (time = 13)
             sys  time
             time(tvec)
             int tvec[2];

DESCRIPTION  Time returns the time since 00:00:00 GMT, Jan. 1,
             1970, measured in seconds. From asm, the high
             order word is in the r0 register and the low
             order is in r1. From C, the user-supplied vector
             is filled in.

SEE ALSO     date(I), stime(II), ctime(III)

DIAGNOSTICS  none

I think the date on that page is a reasonably accurate indicator of when the time_t format changed. In the Unix manual, each page has its own date, separate from the date on the published editions of the manual. So, for example, the 3rd Edition is dated February 1973, but its TIME(II) page is dated March 1972. However all the 4th Edition system call man pages have the same date, which suggests that part of the documentation was all revised together, and the actual changes to the code happened some time earlier.

Now, time_t = 100,000,000 corresponds to 1973-03-03 09:46:40, so it is pretty safe to say that the count of seconds since the epoch has always had nine or ten digits.

I recently saw FixedFixer on Hacker News. This is a bookmarklet which turns off CSS position:fixed, which makes a lot of websites less annoying. Particular offenders include Wired, Huffington Post, Medium, et cetera ad nauseam. A lot of them are unreadable on my old small phone because of all the crap they clutter up the screen with, but even on my new bigger phone the clutter is annoying. Medium's bottom bar is particularly vexing because it looks just like mobile Safari's bottom bar. Bah, thrice bah, and humbug! But, just run FixedFixer and the crap usually disappears.

The code I am using is very slightly adapted from the HN post. In readable form:

  (function(elements, elem, style, i) {
    elements = document.getElementsByTagName('*');
    for (i = 0; elem = elements[i]; i++) {
      style = getComputedStyle(elem);
      if (style && style.position == 'fixed')
        elem.style.position = 'static';
    }
  })();

Or in bookmarklet style:
javascript:(function(ee,e,s,i){ee=document.getElementsByTagName('*');for(i=0;e=ee[i];i++){s=getComputedStyle(e);if(s&&s.position=='fixed')e.style.position='static';}})();

Adding bookmarklets in iOS is a bit annoying because you can't edit the URL when adding a bookmark. You have to add a rubbish bookmark then as a separate step, edit it to replace the URL with the bookmarklet. Sigh.

I have since added a second bookmarklet, because when I was trying to read about AWS a large part of the text was off the right edge of the screen and they had disabled scrolling and zooming so it could not be read. How on earth can they publish a responsive website which does not actually work on a large number of phones?!

Anyway, OverflowFixer is the same as FixedFixer, but instead of changing position='fixed' to position='static', it changes overflow='hidden' to overflow='visible'.

When I mentioned these on Twitter, Tom said "Please share!" so this is a slightly belated reply. Do you have any bookmarklets that you particularly like? Write a comment!

I'm currently in the process of uplifting our DNS development / operations repository from SCCS (really!) to git. This is not entirely trivial because I want to ensure that all the archival material is retained in a sensible way.

I found an interesting document from one of the oldest parts of the archive, which provides a good snapshot of academic computer networking in the UK in 1991. It was written by Tony Stonely, aka <ajms@cam.ac.uk>. AJMS is mentioned in RFC 1117 as the contact for Cambridge's IP address allocation. He was my manager when I started work at Cambridge in 2002, though he retired later that year.

The document is an email discussing IP connectivity for Cambridge's Institute of Astronomy. There are a number of abbreviations which might not be familiar...

Coloured Book: the JANET protocol suite
CS: the University Computing Service
CUDN: the Cambridge University Data Network
GBN: the Granta Backbone Network, Cambridge's duct and fibre infrastructure
grey: short for Grey Book, the JANET email protocol
IoA: the Institute of Astronomy
JANET: the UK national academic network
JIPS: the JANET IP service, which started as a pilot service early in 1991; IP traffic rapidly overtook JANET's native X.25 traffic, and JIPS became an official service in November 1991, about when this message was written
PSH: a member of IoA staff
RA: ~~the Rutherford Appleton Laboratory, a national research institute in Oxfordshire~~ the Mullard Radio Astronomy Observatory, an outpost at Lords Bridge near Barton, where some of the dishes sit on the old Cambridge-Oxford railway line. (I originally misunderstood the reference.)
RGO: The Royal Greenwich Observatory, which moved from Herstmonceux to the IoA site in Cambridge in 1990
Starlink: a UK national DECnet network linking astronomical research institutions

Edited to correct the expansion of RA and to add Starlink

    Connection of IoA/RGO to IP world
    ---------------------------------

This note is a statement of where I believe we have got to and an initial
review of the options now open.

What we have achieved so far
----------------------------

All the Suns are properly connected at the lower levels to the
Cambridge IP network, to the national IP network (JIPS) and to the
international IP network (the Internet). This includes all the basic
infrastructure such as routing and name service, and allows the Suns
to use all the usual native Unix communications facilities (telnet,
ftp, rlogin etc) except mail, which is discussed below. Possibly the
most valuable end-user function thus delivered is the ability to fetch
files directly from the USA.

This also provides the basic infrastructure for other machines such as
the VMS hosts when they need it.

VMS nodes
---------

Nothing has yet been done about the VMS nodes. CAMV0 needs its address
changing, and both IOA0 and CAMV0 need routing set for extra-site
communication. The immediate intention is to route through cast0. This
will be transparent to all parties and impose negligible load on
cast0, but requires the "doit" bit to be set in cast0's kernel. We
understand that PSH is going to do all this [check], but we remain
available to assist as required.

Further action on the VMS front is stalled pending the arrival of the
new release (6.6) of the CMU TCP/IP package. This is so imminent that
it seems foolish not to await it, and we believe IoA/RGO agree [check].

Access from Suns to Coloured Book world
---------------------------------------

There are basically two options for connecting the Suns to the JANET
Coloured Book world. We can either set up one or more of the Suns as
full-blown independent JANET hosts or we can set them up to use CS
gateway facilities. The former provides the full range of facilities
expected of any JANET host, but is cumbersome, takes significant local
resources, is complicated and long-winded to arrange, incurs a small
licence fee, is platform-specific, and adds significant complexity to
the system managers' maintenance and planning load. The latter in
contrast is light-weight, free, easy to install, and can be provided
for any reasonable Unix host, but limits functionality to outbound pad
and file transfer either way initiated from the local (IoA/RGO) end.
The two options are not exclusive.

We suspect that the latter option ("spad/cpf") will provide adequate
functionality and is preferable, but would welcome IoA/RGO opinion.

Direct login to the Suns from a (possibly) remote JANET/CUDN terminal
would currently require the full Coloured Book package, but the CS
will shortly be providing X.29-telnet gateway facilities as part of
the general infrastructure, and can in any case provide this
functionality indirectly through login accounts on Central Unix
facilities. For that matter, AST-STAR or WEST.AST could be used in
this fashion.

Mail
----

Mail is a complicated and difficult subject, and I believe that a
small group of experts from IoA/RGO and the CS should meet to discuss
the requirements and options. The rest of this section is merely a
fleeting summary of some of the issues.
Firstly, a political point must be clarified. At the time of writing
it is absolutely forbidden to emit smtp (ie Unix/Internet style) mail
into JIPS. This prohibition is national, and none of Cambridge's
doing. We expect that the embargo will shortly be lifted somewhat, but
there are certain to remain very strict rules about how smtp is to be
used. Within Cambridge we are making best guesses as to the likely
future rules and adopting those as current working practice. It must
be understood however that the situation is highly volatile and that
today's decisions may turn out to be wrong.

The current rulings are (inter alia)

        Mail to/from outside Cambridge may only be grey (Ie. JANET
        style).

        Mail within Cambridge may be grey or smtp BUT the reply
        address MUST be valid in BOTH the Internet AND Janet (modulo
        reversal). Thus a workstation emitting smtp mail must ensure
        that the reply address contained is that of a current JANET
        mail host. Except that -

        Consenting machines in a closed workgroup in Cambridge are
        permitted to use smtp between themselves, though there is no
        support from the CS and the practice is discouraged. They
        must remember not to contravene the previous two rulings, on
        pain of disconnection.

The good news is that a central mail hub/distributer will become
available as a network service for the whole University within a few
months, and will provide sufficient gateway function that ordinary
smtp Unix workstations, with some careful configuration, can have full
mail connectivity. In essence the workstation and the distributer will
form one of those "closed workgroups", the workstation will send all
its outbound mail to the distributer and receive all its inbound mail
from the distributer, and the distributer will handle the forwarding
to and from the rest of Cambridge, UK and the world.

There is no prospect of DECnet mail being supported generally either
nationally or within Cambridge, but I imagine Starlink/IoA/RGO will
continue to use it for the time being, and whatever gateway function
there is now will need preserving. This will have to be largely
IoA/RGO's own responsibility, but the planning exercise may have to
take account of any further constraints thus imposed. Input from
IoA/RGO as to the requirements is needed.

In the longer term there will probably be a general UK and worldwide
shift to X.400 mail, but that horizon is probably too hazy to rate more
than a nod at present. The central mail switch should in any case hide
the initial impact from most users.

The times are therefore a'changing rather rapidly, and some pragmatism
is needed in deciding what to do. If mail to/from the IP machines is
not an urgent requirement, and since they will be able to log in to
the VMS nodes it may not be, then the best thing may well be to await
the mail distributer service. If more direct mail is needed more
urgently then we probably need to set up a private mail distributer
service within IoA/RGO. This would entail setting up (probably) a Sun
as a full JANET host and using it as the one and only (mail) route in
or out of IoA/RGO. Something rather similar has been done in Molecular
Biology and is thus known to work, but setting it up is no mean task.
A further fall-back option might be to arrange to use Central Unix
facilities as a mail gateway in similar vein. The less effort spent on
interim facilities the better, however.

Broken mail
-----------

We discovered late in the day that smtp mail was in fact being used
between IoA and RA, and the name changing broke this. We regret having
thus trodden on existing facilities, and are willing to help try to
recover any required functionality, but we believe that IoA/RGO/RA in
fact have this in hand. We consider the activity to fall under the
third rule above. If help is needed, please let us know.

We should also report sideline problem we encountered and which will
probably be a continuing cause of grief. CAVAD, and indeed any similar
VMS system, emits mail with reply addresses of the form
"CAVAD::user"@....  This is quite legal, but the quotes are
syntactically significant, and must be returned in any reply.
Unfortunately the great majority of Unix systems strip such quotes
during emission of mail, so the reply address fails. Such stripping
can occur at several levels, notably the sendmail (ie system)
processing and the one of the most popular user-level mailers. The CS
is fixing its own systems, but the problem is replicated in something
like half a million independent Internet hosts, and little can be done
about it.

Other requirements
------------------

There may well be other requirements that have not been noticed or,
perish the thought, we have inadvertently broken. Please let us know
of these.

Bandwidth improvements
----------------------

At present all IP communications between IoA/RGO and the rest of the
world go down a rather slow (64Kb/sec) link. This should improve
substantially when it is replaced with a GBN link, and to most of
Cambridge the bandwidth will probably become 1-2Mb/sec. For comparison,
the basic ethernet bandwidth is 10Mb/sec. The timescale is unclear, but
sometime in 1992 is expected. The bandwidth of the national backbone
facilities is of the order of 1Mb/sec, but of course this is shared with
many institutions in a manner hard to predict or assess.

For Computing Service,
Tony Stoneley, ajms@cam.cus
29/11/91

A lot of my day has been spent on the POODLE vulnerability. For details see the original paper, commentary by Daniel Franke, Adam Langley, Robert Graham, and the POODLE.io web page of stats and recommendations.

One thing I have been investigating is to what extent mail software uses SSLv3. The best stats we have come from our message submission server, smtp.hermes, which logs TLS versions and cipher suites and (when possible) User-Agent and X-Mailer headers. (The logging from our POP and IMAP servers is not so good, partly because we don't request or log user agent declarations, and even if we did most clients wouldn't provide them.)

Nearly 100 of our users are using SSLv3, which is about 0.5% of them. The main culprits seem to be Airmail, Evolution, and most of all Android. Airmail is a modern Mac MUA, so in that case I guess it is a bug or misuse of the TLS API. For Evolution my guess is that it has a terrible setup user interface (all MUAs have terrible setup user interfaces) and users are choosing "SSL 3.0" rather than "TLS 1.0" because the number is bigger. In the case of Android I don't have details of version numbers because Android mail software doesn't include user-agent headers (unlike practically everything else), but I suspect old unsupported smart-ish phones running bad Java are to blame.

I haven't decided exactly what we will do to these users yet. However we have the advantage that POODLE seems to be a lot less bad for non-web TLS clients.

The POODLE padding oracle attack requires a certain amount of control over the plaintext which the attacker is trying to decrypt. Specifically:

The plaintext plus MAC has to be an exact multiple of the cipher block size;
It must be possible to move the secret (cookie or password) embedded in the plaintext by a byte at a time to scan it past a block boundary.

In the web situation, the attacker can use JavaScript served from anywhere to make repeated POST requests to an arbitrary target host. The JS can manipulate the body of the POST to control the overall length of the request, and can manipulate the request path to control the position of the cookie in the headers.

In the mail situation (POP, IMAP, SMTP), the attacker can make the client retry requests repeatedly by breaking the connection, but they cannot control the size or framing of the client's authentication command.

So I think we have the option of not worrying too much if forced upgrades turn out to be too painful, though I would prefer not to go that route - it makes me feel uncomfortably complacent.

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." - Linus Torvalds

"If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." - Rob Pike

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious." - Fred Brooks

I just noticed that Dilbert had its 25th anniversary last month. I have created an atom feed of old strips to mark this event. To avoid leap year pain the feed contains strips from the same date 24 years ago, rather than 25 years ago. See

dilbert_zoom for current strips and

dilbert_24 for the old ones.

I have updated yesterday's article on how to get SSHFP records, DNSSEC, and VerifyHostKeyDNS=yes to work.

I have re-ordered the sections to avoid interrupting the flow of the instructions with chunks of background discussion.

I have also added a section discussing the improved usability vs weakened security of the RRSET_FORCE_EDNS0 patch in Debian and Ubuntu.

One of the great promises of DNSSEC is to provide a new public key infrastructure for authenticating Internet services. If you are a relatively technical person you can try out this brave new future now with ssh.

There are a couple of advantages of getting SSHFP host authentication to work. Firstly you get easier-to-use security, since you longer need to rely on manual host authentication, or better security for the brave or foolhardy who trust leap-of-faith authentication. Secondly, it becomes feasible to do host key rollovers, since you only need to update the DNS - the host's key is no longer wired into thousands of known_hosts files. (You can probably also get the latter benefit using ssh certificate authentication, but why set up another PKI if you already have one?)

In principle it should be easy to get this working but there are a surprising number of traps and pitfalls. So in this article I am going to try to explain all the whys and wherefores, which unfortunately means it is going to be long, but I will try to make it easy to navigate. In the initial version of this article I am just going to describe what the software does by default, but I am happy to add specific details and changes made by particular operating systems.

The outline of what you need to do on the server is:

Sign your DNS zone. I will not cover that in this article.
Publish SSHFP records in the DNS

The client side is more involved. There are two versions, depending on whether ssh has been compiled to use ldns or not. Run ldd $(which ssh) to see if it is linked with libldns.

Without ldns:
- Install a validating resolver (BIND or Unbound)
- Configure the stub resolver /etc/resolv.conf
- Configure ssh
With ldns:
- Install unbound-anchor
- Configure the stub resolver /etc/resolv.conf
- Configure ssh

Publish SSHFP records in the DNS

Generating SSHFP records is quite straightforward:

    demo:~# cd /etc/ssh
    demo:/etc/ssh# ssh-keygen -r $(hostname)
    demo IN SSHFP 1 1 21da0404294d07b940a1df0e2d7c07116f1494f9
    demo IN SSHFP 1 2 3293d4c839bfbea1f2d79ab1b22f0c9e0adbdaeec80fa1c0879dcf084b72e206
    demo IN SSHFP 2 1 af673b7beddd724d68ce6b2bb8be733a4d073cc0
    demo IN SSHFP 2 2 953f24d775f64ff21f52f9cbcbad9e981303c7987a1474df59cbbc4a9af83f6b
    demo IN SSHFP 3 1 f8539cfa09247eb6821c645970b2aee2c5506a61
    demo IN SSHFP 3 2 9cf9ace240c8f8052f0a6a5df1dea4ed003c0f5ecb441fa2c863034fddd37dc9

Put these records in your zone file, or you can convert them into an nsupdate script with a bit of seddery:

    ssh-keygen -r $(hostname -f) |
        sed 's/^/update add /;s/ IN / 3600 IN /;/ SSHFP . 1 /d;'

The output of ssh-keygen -r includes hashes in both SHA1 and SHA256 format (the shorter and longer hashes). You can discard the SHA1 hashes.

It includes hashes for the different host key authentication algorithms:

1: ssh-rsa
2: ssh-dss
3: ecdsa

I believe ecdsa covers all three key sizes which OpenSSH gives separate algorithm names: ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, ecdsa-sha2-nistp521

OpenSSH supports other host key authentication algorithms, but unfortunately they cannot be authenticated using SSHFP records because they do not have algorithm numbers allocated.

The problem is actually worse than that, because most of the extra algorithms are the same as the three listed above, but with added support for certificate authentication. The ssh client is able to convert certificate to plain key authentication, but a bug in this fallback logic breaks SSHFP authentication.

So I recommend that if you want to use SSHFP authentication your server should only have host keys of the three basic algorithms listed above.

NOTE (added 26-Nov-2014): if you are running OpenSSH-5, it does not support ECDSA SSHFP records. So if your servers need to support older clients, you might want to stick to just RSA and DSA host keys.

You are likely to have an SSHFP algorithm compatibility problem if you get the message: "Error calculating host key fingerprint."

NOTE (added 12-Mar-2015): if you are running OpenSSH-6.7 or later, it has support for Ed25519 SSHFP records which use algorithm number 4.

Install a validating resolver

To be safe against active network interception attacks you need to do DNSSEC validation on the same machine as your ssh client. If you don't do this, you can still use SSHFP records to provide a marginal safety improvement for leap-of-faith users. In this case I recommend using VerifyHostKeyDNS=ask to reinforce to the user that they ought to be doing proper manual host authentication.

If ssh is not compiled to use ldns then you need to run a local validating resolver, either BIND or unbound.

If ssh is compiled to use ldns, it can do its own validation, and you do not need to install BIND or Unbound.

Run ldd $(which ssh) to see if it is linked with libldns.

Install a validating resolver - BIND

The following configuration will make named run as a local validating recursive server. It just takes the defaults for everything, apart from turning on validation. It automatically uses BIND's built-in copy of the root trust anchor.

/etc/named/named.conf

    options {
        dnssec-validation auto;
        dnssec-lookaside auto;
    };

Install a validating resolver - Unbound

Unbound comes with a utility unbound-anchor which sets up the root trust anchor for use by the unbound daemon. You can then configure unbound as follows, which takes the defaults for everything apart from turning on validation using the trust anchor managed by unbound-anchor.

/etc/unbound/unbound.conf

    server:
        auto-trust-anchor-file: "/var/lib/unbound/root.key"

Install a validating resolver - dnssec-trigger

If your machine moves around a lot to dodgy WiFi hot spots and hotel Internet connections, you may find that the nasty middleboxes break your ability to validate DNSSEC. In that case you can use dnssec-trigger, which is a wrapper around Unbound which knows how to update its configuration when you connect to different networks, and which can work around braindamaged DNS proxies.

Configure the stub resolver - without ldns

If ssh is compiled without ldns, you need to add the following line to /etc/resolv.conf; beware your system's automatic resolver configuration software, which might be difficult to persuade to leave resolv.conf alone.

    options edns0

For testing purposes you can add RES_OPTIONS=edns0 to ssh's environment.

On some systems (including Debian and Ubuntu), ssh is patched to force EDNS0 on, so that you do not need to set this option. See the section on RRSET_FORCE_EDNS0 below for further discussion.

Configure the stub resolver - with ldns

If ssh is compiled with ldns, you need to run unbound-anchor to maintain a root trust anchor, and add something like the following line to /etc/resolv.conf

    anchor /var/lib/unbound/root.key

Run ldd $(which ssh) to see if it is linked with libldns.

Configure ssh

After you have done all of the above, you can add the following to your ssh configuration, either /etc/ssh/ssh_config or ~/.ssh/config

    VerifyHostKeyDNS yes

Then when you connect to a host for the first time, it should go straight to the Password: prompt, without asking for manual host authtentication.

If you are not using certificate authentication, you might also want to disable that. This is because ssh prefers the certificate authentication algorithms, and if you connect to a host that offers a more preferred algorithm, ssh will try that and ignore the DNS. This is not very satisfactory; hopefully it will improve when the bug is fixed.

    HostKeyAlgorithms ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-rsa,ssh-dss

Troubleshooting

Check that your resolver is validating and getting a secure result for your host. Run the following command and check for "ad" in the flags. If it is not there then either your resolver is not validating, or /etc/resolv.conf is not pointing at the validating resolver.

    $ dig +dnssec <hostname> sshfp | grep flags
    ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
    ; EDNS: version: 0, flags: do; udp: 4096

See if ssh is seeing the AD bit. Use ssh -v and look for messages about secure or insecure fingerprints in the DNS. If you are getting secure answers via dig but ssh is not, perhaps you are missing "options edns0" from /etc/resolv.conf.

    debug1: found 6 secure fingerprints in DNS
    debug1: matching host key fingerprint found in DNS

Try using specific host key algorithms, to see if ssh is trying to authenticate a key which does not have an SSHFP record of the corresponding algorithm.

    $ ssh -o HostKeyAlgorithms=ssh-rsa <hostname>

Summary

What I use is essentially:

/etc/named/named.conf

    options {
        dnssec-validation auto;
        dnssec-lookaside auto;
    };

/etc/resolv.conf

    nameserver 127.0.0.1
    options edns0

/etc/ssh/ssh_config

    VerifyHostKeyDNS yes

Background on DNSSEC and non-validating stub resolvers

When ssh is not compiled to use ldns, it has to trust the recursive DNS server to validate SSHFP records, and it trusts that the connection to the recursive server is secure. To find out if an SSHFP record is securely validated, ssh looks at the AD bit in the DNS response header - AD stands for "authenticated data".

A resolver will not set the AD bit based on the security status of the answer unless the client asks for it. There are two ways to do that. The simple way (from the perspective of the DNS protocol) is to set the AD bit in the query, which gets you the AD bit in the reply without other side-effects. Unfortunately the standard resolver API makes it very hard to do this, so it is only simple in theory.

The other way is to add an EDNS0 OPT record to the query with the DO bit set - DO stands for "DNSSEC OK". This has a number of side-effects: EDNS0 allows large UDP packets which provide the extra space needed by DNSSEC, and DO makes the server send back the extra records required by DNSSEC such as RRSIGs.

Adding "options edns0" to /etc/resolv.conf only tells it to add the EDNS0 OPT record - it does not enable DNSSEC. However ssh itself observes whether EDNS0 is turned on, and if so also turns on the DO bit.

Regarding "options edns0" vs RRSET_FORCE_EDNS0

At first it might seem annoying that ssh makes you add "options edns0" to /etc/resolv.conf before it will ask for DNSSEC results. In fact on some systems, ssh is patched to add a DNS API flag called RRSET_FORCE_EDNS0 which forces EDNS0 and DO on, so that you do not need to explicitly configure the stub resolver. However although this seems more convenient, it is less safe.

If you are using the standard portable OpenSSH then you can safely set VerifyHostKeyDNS=yes, provided your stub resolver is configured correctly. The rule you must follow is to only add "options edns0" if /etc/resolv.conf is pointing at a local validating resolver. SSH is effectively treating "options edns0" as a signal that it can trust the resolver. If you keep this rule you can change your resolver configuration without having to reconfigure ssh too; it will automatically fall back to VerifyHostKeyDNS=ask when appropriate.

If you are using a version of ssh with the RRSET_FORCE_EDNS0 patch (such as Debian and Ubuntu) then it is sometimes NOT SAFE to set VerifyHostKeyDNS=yes. With this patch ssh has no way to tell if the resolver is trustworthy or if it should fall back to VerifyHostKeyDNS=ask; it will blindly trust a remote validating resolver, which leaves you vulnerable to MitM attacks. On these systems, if you reconfigure your resolver, you may also have to reconfigure ssh in order to remain safe.

Towards the end of February there was a discussion on an IETF list about stub resolvers and DNSSEC which revolved around exactly this question of how an app can tell if it is safe to trust the AD bit from the recursive DNS server.

One proposal was for the stub resolver to strip the AD bit in replies from untrusted servers, which (if it were implemented) would allow ssh to use the RRSET_FORCE_EDNS0 patch safely. However this proposal means you have to tell the resolver if the server is trusted, which might undo the patch's improved convenience. There are ways to avoid that, such as automatically trusting resolvers running on the local host, and perhaps having a separate configuration file listing trusted resolvers, e.g. those reachable over IPSEC.

Compare Wikipedia's table of the relative frequencies of initial letters in English.

$ dig axfr . @f.root-servers.net |
  perl -ne '
	next unless /^(([a-z])[a-z0-9-]+)[.][  ].*/;
	$label{$1} = 1; $letter{$2}++; $total++;
	END {
		for my $x (sort keys %letter) {
			my $p = 100.0*$letter{$x}/$total;
			printf "<tr><td>$x</td>
				<td align=right>%5.2f</td>
				<td><span style=\"
					display: inline-block;
					background-color: gray;
					width: %d;\">
				&nbsp;</span></td></tr>\n",
			    $p, 32*$p;
		}
	}'

a	5.24
b	5.96
c	10.02
d	2.45
e	3.28
f	2.43
g	5.03
h	1.78
i	3.38
j	1.14
k	2.71
l	3.74
m	6.69
n	4.26
o	0.72
p	5.68
q	0.65
r	2.81
s	6.15
t	6.02
u	1.91
v	2.66
w	1.94
x	12.19
y	0.41
z	0.75

On my toy nameserver my master zones are configured with a directory for each zone. In this directory is a "conf" file which is included by the nameserver's main configuration file; a "master" file containing the zone data in the raw binary format; a "journal" file recording changes to the zone from dynamic UPDATEs and re-signing; and DNSSEC and TSIG keys. The "conf" file looks something like this:

    zone dotat.at {
        type master;
        file "/zd/dotat.at/master";
        journal "/zd/dotat.at/journal";
        key-directory "/zd/dotat.at";
        masterfile-format raw;
        auto-dnssec maintain;
        update-policy local;
    };

I must have been having a fit of excessive tidyness when I was setting this up, because although it looks quite neat, the unusual file names cause some irritation - particularly for the journal. Some of the other BIND tools assume that journal filenames are the same as the master file with a .jnl extension.

I keep the name server configuration in git. This is a bit awkward because the configuration contains precious secrets (DNSSEC private keys), and the zone files are constantly-changing binary data. But it is useful for recording manual changes, since the zone files don't have comments explaining their contents. I don't make any effort to record the re-signing churn, though I commit it when making other changes.

To reduce the awkwardness I configured git to convert zone files to plain text when diffing them, so I had a more useful view of the repository. There are three parts to setting this up.

Tell git that the zone files require a special diff driver, which I gave the name "bind-raw".
All the zone files in the repository are called "master", so in the .gitattributes file at the top of the repository I have the line
```
    master diff=bind-raw
   
```
The diff driver is part of the repository configuration. (Because the implementation of the driver is a command it isn't safe to set it up automatically in a repository clone, as is the case for .gitattributes settings, so it has to be configured separately.) So add the following lines to .git/config
```
    [diff "bind-raw"]
	textconv = etc/raw-to-text
   
```
The final part is the raw-to-text script, which lives in the repository.

This is where journal file names get irritating. You can convert a raw master file into the standard text format with named-compilezone, which has a -j option to read the zone's journal, but this assumes that the journal has the default file name with the .jnl extension. So it doesn't quite work in my setup.

(It also doesn't quite work on the University's name servers which have a directory for master files and a directory for journal files.)

So in September 2012 I patched BIND to add a -J option to named-compilezone for specifying the journal file name. I have a number of other small patches to my installation of BIND, and this one was very simple, so in it went. (It would have been much more sensible to change my nameserver configuration to go with the flow...)

The patch allowed me to write the raw-to-text script as follows. The script runs named-compilezone twice: the first time with a bogus zone name, which causes named-compilezone to choke with an error message. This helpfully contains the real zone name, which the script extracts then uses to invoke named-compilezone correctly.

    #!/bin/sh
    file="$*"
    command="named-compilezone -f raw -F text -J journal -o /dev/stdout"
    zone="$($command . "$file" 2>&1)"
    zone="${zone#*: }"
    zone="${zone%%:*}"
    $command "$zone" "$file" 2>/dev/null

I submitted the -J patch to the ISC and got a favourable response from Evan Hunt. At that time (16 months ago) BIND was at version 9.9.2; since this option was a new feature (and unimportant) it was added to the 9.10 branch. Wind forward 14 months to November 2013 and the first alpha release of 9.10 came out, with the -J option, so I was able to retire my patch. Woo! There will be a beta release of 9.10 in a few weeks.

In truth, you don't need BIND 9.10 to use this git trick, if you use the default journal file names. The key thing to make it simple is to give all your master files the same name, so that you don't have to list them all in your .gitattributes.

The central recursive DNS servers in Cambridge act as stealth slaves for most of our local zones, and we recommend this configuration for other local DNS resolvers. This has the slightly odd effect that the status bits in answers have AD (authenticated data) set for most DNSSEC signed zones, except for our local ones which have AA (authoritative answer) set. This is not a very big deal since client hosts should do their own DNSSEC validation and ignore any AD bits they get over the wire.

It is a bit more of a problem for the toy nameserver I run on my workstation. As well as being my validating resolver, it is also the master for my personal zones, and it slaves some of the Cambridge zones. This mixed recursive / authoritative setup is not really following modern best practices, but it's OK when I am the only user, and it makes experimental playing around easier. Still, I wanted it to validate answers from its authoritative zones, especially because there's no security on the slave zone transfers.

I had been procrastinating this change because I thought the result would be complicated and ugly. But last week one of the BIND developers, Mark Andrews, posted a description of how to validate slaved zones to the dns-operations list, and it turned out to be reasonably OK - no need to mess around with special TSIG keys to get queries from one view to another.

The basic idea is to have one view that handles recursive queries and which validates all its answers, and another view that holds the authoritative zones and which only answers non-recursive queries. The recursive view has "static-stub" zone configurations mirroring all of the zones in the authoritative view, to redirect queries to the local copies.

Here's a simplified version of the configuration I tried out. To make it less annoying to maintain, I wrote a script to automatically generate the static-stub configurations from the authoritative zones.

  view rec {
    match-recursive-only yes;
    zone cam.ac.uk         { type static-stub; server-addresses { ::1; }; };
    zone private.cam.ac.uk { type static-stub; server-addresses { ::1; }; };
  };

  view auth {
    recursion no;
    allow-recursion { none; };
    zone cam.ac.uk         { type slave; file "cam";  masters { ucam; }; };
    zone private.cam.ac.uk { type slave; file "priv"; masters { ucam; }; };
  };

This seemed to work fine, until I tried to resolve names in private.cam.ac.uk - then I got a server failure. In my logs was the following (which I have slightly abbreviated):

  client ::1#55687 view rec: query: private.cam.ac.uk IN A +E (::1)
  client ::1#60344 view auth: query: private.cam.ac.uk IN A -ED (::1)
  client ::1#54319 view auth: query: private.cam.ac.uk IN DS -ED (::1)
  resolver: DNS format error from ::1#53 resolving private.cam.ac.uk/DS:
    Name cam.ac.uk (SOA) not subdomain of zone private.cam.ac.uk -- invalid response
  lame-servers: error (FORMERR) resolving 'private.cam.ac.uk/DS/IN': ::1#53
  lame-servers: error (no valid DS) resolving 'private.cam.ac.uk/A/IN': ::1#53
  query-errors: client ::1#55687 view rec:
    query failed (SERVFAIL) for private.cam.ac.uk/IN/A at query.c:7435

You can see the original recursive query that I made, then the resolver querying the authoritative view to get the answer and validate it. The situation here is that private.cam.ac.uk is an unsigned zone, so a DNSSEC validator has to check its delegation in the parent zone cam.ac.uk and get a proof that there is no DS record, to confirm that it is OK for private.cam.ac.uk to be unsigned. Something is going wrong with BIND's attempt to get this proof of nonexistence.

When BIND gets a non-answer it has to classify it as a referral to another zone or an authoritative negative answer, as described in RFC 2308 section 2.2. It is quite strict in its sanity checks, in particular it checks that the SOA record refers to the expected zone. This check often discovers problems with misconfigured DNS load balancers which are given a delegation for www.example.com but which think their zone is example.com, leading them to hand out malformed negative responses to AAAA queries.

This negative answer SOA sanity check is what failed in the above log extract. Very strange - the resolver seems to be looking for the private.cam.ac.uk DS record in the private.cam.ac.uk zone, not the cam.ac.uk zone, so when it gets an answer from the cam.ac.uk zone it all goes wrong. Why is it looking in the wrong place?

In fact the same problem occurs for the cam.ac.uk zone itself, but in this case the bug turns out to be benign:

  client ::1#16276 view rec: query: cam.ac.uk IN A +E (::1)
  client ::1#65502 view auth: query: cam.ac.uk IN A -ED (::1)
  client ::1#61409 view auth: query: cam.ac.uk IN DNSKEY -ED (::1)
  client ::1#51380 view auth: query: cam.ac.uk IN DS -ED (::1)
  security: client ::1#51380 view auth: query (cache) 'cam.ac.uk/DS/IN' denied
  lame-servers: error (chase DS servers) resolving 'cam.ac.uk/DS/IN': ::1#53

You can see my original recursive query, and the resolver querying the authoritative view to get the answer and validate it. But it sends the DS query to itself, not to the name servers for the ac.uk zone. When this query fails, BIND re-tries by working down the delegation chain from the root, and this succeeds so the overall query and validation works despite tripping up.

This bug is not specific to the weird two-view setup. If I revert to my old configuration, without views, and just slaving cam.ac.uk and private.cam.ac.uk, I can trigger the benign version of the bug by directly querying for the cam.ac.uk DS record:

  client ::1#30447 (cam.ac.uk): query: cam.ac.uk IN DS +E (::1)
  lame-servers: error (chase DS servers) resolving 'cam.ac.uk/DS/IN': 128.232.0.18#53

In this case the resolver sent the upstream DS query to one of the authoritative servers for cam.ac.uk, and got a negative response from the cam.ac.uk zone apex per RFC 4035 section 3.1.4.1. This did not fail the SOA sanity check but it did trigger the fall-back walk down the delegation chain.

In the simple slave setup, queries for private.cam.ac.uk do not fail because they are answered from authoritative data without going through the resolver. If you change the zone configurations from slave to stub or static-stub then the resolver is used to answer queries for names in those zones, and so queries for private.cam.ac.uk explode messily as BIND tries really hard (128 times!) to get a DS record from all the available name servers but keeps checking the wrong zone.

I spent some time debugging this on Friday evening, which mainly involved adding lots of logging statements to BIND's resolver to work out what it thought it was doing. Much confusion and headscratching and eventually understanding.

BIND has some functions called findzonecut() which take an option to determine whether it wants the child zone or the parent zone. This works OK for dns_db_findzonecut() which looks in the cache, but dns_view_findzonecut() gets it wrong. This function works out whether to look for the name in a locally-configured zone, and if so which one, or otherwise in the cache, or otherwise work down from the root hints. In the case of a locally-configured zone it ignores the option and always returns the child side of the zone cut. This causes the resolver to look for DS records in the wrong place, hence all the breakage described above.

I worked out a patch to fix this DS record resolution problem, and I have sent details of the bug and my fix to bind9-bugs@isc.org. And I now have a name server that correctly validates its authoritative zones :-)

Imagine that...

Secure NTP is an easy-to-use and universally deployed protocol extension...

The NTP pool is dedicated to providing accurate time from anyone to everyone, securely...

NIST, creators of some of the world's best clocks and keepers of official time for the USA, decide that the NTP pool is an excellent project which they would like to help. They donatate machines and install them around the world and dedicate them to providing time as part of the NTP pool. Their generous funding allows them to become a large and particularly well-connected proportion of the pool.

In fact NIST is a sock-puppet of the NSA. Their time servers are modified so that they are as truthful and as accurate as possible to everyone, except those who the US government decides they do not like.

The NSA has set up a system dedicated to replay attacks. They cause occasional minor outages and screwups in various cryptographic systems - certificate authorities, DNS registries - which seem to be brief and benign when they happen, but no-one notices that the bogus invalid certificates and DNS records all have validity periods covering a particular point in time.

Now the NSA can perform a targeted attack, in which they persuade the victim to reboot, perhaps out of desperation because nothing works and they don't understand denial-of-service attacks. The victim's machine reboots, and it tries to get the time from the NTP pool. The NIST sock-puppet servers all lie to it. The victim's machine believes the time is in the NSA replay attack window. It trustingly fetches some crucial "software update" from a specially-provisioned malware server, which both its DNSSEC and X.509 PKIs say is absolutely kosher. It becomes comprehensively pwned by the NSA.

How can we provide the time in a way that is secure against this attack?

(Previously, previously)

The security of temporum is based on the idea that you can convince yourself that several different sources agree on what the time is, with the emphasis on different. Where are the weaknesses in the way it determines if sources are different?

The starting point for temporum is a list of host names to try. It is OK if lots of them fail (e.g. because your device has been switched off on a shelf for years) provided you have a good chance of eventually getting a quorum.

The list of host names is very large, and temporum selects candidates from the list at random. This makes it hard for an attacker to target the particular infrastructure that temporum might use. I hope your device is able to produce decent random numbers immediately after booting!

The list of host names is statically configured. This is important to thwart Sybil attacks: you don't want an attacker to convince you to try a list of apparently-different host names which are all under the attacker's control. Question: can the host list be made dynamic without making it vulnerable?

Hostnames are turned into IP addresses using the DNS. Temporum uses the TLS X.509 PKI to give some assurance that the DNS returned the correct result, about which more below. The DNS isn't security-critical, but if it worries you perhaps temporum could be configured with a list of IP addresses instead - but maybe that will make the device-on-shelf less likely to boot successfully.

Temporum does not compare the IP addresses of "different" host names. This might become a problem once TLS SNI makes large-scale virtual hosting easier. More subtly, there is a risk that temporum happens to query lots of servers that are hosted on the same infrastructure. This can be mitigated by being careful about selecting which host names to include in the list - no more than a few each of Blogspot, Tumblr, Livejournal, GoDaddy vhosts, etc. More than one of each is OK since it helps with on-shelf robustness.

The TLS security model hopes that X.509 certification authorities will only hand out certificates for host names to the organizations that run the hosts. This is a forlorn hope: CAs have had their infrastructure completely compromised; they have handed out intermediate signing certificates to uncontrolled third parties; they are controlled by nation states that treat our information security with contempt.

In the context of temporum, we can reduce this problem by checking that the quorum hosts are authenticated by diverse CAs. Then an attacker would have to compromise multiple CAs to convince us of an incorrect time. Question: are there enough different CAs used by popular sites that temporum can quickly find a usable set?

I have done a little bit of work on nsdiff recently.

You can now explicitly manage your DNSKEY RRset, instead of leaving it to named. This is helpful when you are transferring a zone from one operator to another: you need to include the other operator's zone signing key in your DNSKEY RRset to ensure that validation works across the transfer.

There is now support for bump-in-the-wire signing, where nsdiff transfers the new version of the zone from a back-end hidden master server and pushes the updates to a signing server which feeds the public authoritative servers.

Get nsdiff from http://www-uxsup.csx.cam.ac.uk/~fanf2/hermes/conf/bind/bin/nsdiff

(Edit: I decided to simplify the -u option so updated from version 1.46 to 1.47.)

(Previously, previously, previously, previously, previously.)

There are essentially two ways to find out what the time is: ask an authoritative source and trust the answer, or ask several more or less unreliable sources and see what they agree on. NTP is based on the latter principle, but since the protocol isn't secured, a client also has to trust the network not to turn NTP responses into lies.

NTP's lack of security causes a bootstrapping problem. Many security protocols rely on accurate time to avoid replay attacks. So nearly the first thing a networked device needs to do on startup is get the time, so that it can then properly verify what it gets from the network - DNSSEC signatures, TLS certificates, software updates, etc. This is particularly challenging for cost-constrained devices that do not have a battery-backed real time clock and so start up in 1970.

When I say NTP isn't secured, I mean that the protocol has security features but they have not been deployed. I have tried to understand NTP security, but I have not found a description of how to configure it for the bootstrap case. What I want is for a minimally configured client to be able to communicate with some time servers and get responses with reasonable authenticity and integrity. Extra bonus points for a clear description of which of NTP's half dozen identity verification schemes is useful for what, and which ones are incompatible with NATs and rely on the client knowing its external IP address.

In the absence of usable security from NTP, Jacob Appelbaum of the Tor project has written a program called tlsdate. In TLS, the ClientHello and ServerHello messages include a random nonce which includes a Unix time_t value as a prefix. So you can use any TLS server as a secure replacement for the old port 37 time service.

Unlike NTP, tlsdate gets time from a single trusted source. It would be much better if it were able to consult multiple servers for their opinions of the time: it would be more robust if a server is down or has the wrong time, and it would be more secure in case a server is compromised. There is also the possibility of using multiple samples spread over a second or two to obtain a more accurate time than the one second resolution of TLS's gmt_unix_time field.

The essential idea is to find a quorum of servers that agree on the time. An adversary or a technical failure would have to break at least that many servers for you to get the wrong time.

In statistical terms, you take a number of samples and find the mode, the most common time value, and keep taking samples until the frequency at the mode is greater than the quorum.

But even though time values are discrete, the high school approach to finding the mode isn't going to work because in many casees we won't be able to take all the necessary samples close enough together in time. So it is better to measure the time offset between a server and the client at each sample, and treat these as a continuous distribution.

The key technique is kernel density estimation. The mode is the point of peak density in the distribution estimated from the samples. The kernel is a function that is used to spread out each sample; the estimated distribution comes from summing the spread-out samples.

NTP's clock select algorithm is basically kernel density estimation with a uniform kernel.

NTP's other algorithms are based on lengthy observations of the network conditions between the client and its servers, whereas we are more concerned with getting a quick result from many servers. So perhaps we can use a simpler, more well-known algorithm to find the mode. It looks like the mean shift algorithm is a good candidate.

For the mean shift algorithm to work well, I think it makes sense to use a smooth kernel such as the Gaussian. (I like exponentials.) The bandwidth of the kernel should probably be one second (the precision of the timestamp) plus the round trip time.

Now it's time to break out the editor and write some code... I think I'll call it "temporum" because that rhymes with "quorum" and it means "of times" (plural). Get temporum from my git server.

We moved to new offices a month ago and I have settled in to the new route to work. Nico's nursery was a bit out of the way when I was working in the city centre, but now it is about a mile in the wrong direction.

But this is not so bad, since I have decided on a -Ofun route [1] which is almost entirely on cycle paths and very quiet roads, and the main on-road section has good cycle lanes (at least by UK standards). There is lots of park land, a bit of open country, and it goes through the world heritage site in the city centre :-) And it's fairly easy to stop off on the way if I need to get supplies.

[1] optimize for fun!

My route to work on gmap-pedometer

I don't have to wrangle children on the way home, so I take the direct route past the Institute of Astronomy and Greenwich House (where Rachel previously worked and where the Royal Greenwich Observatory was wound down).

My route home on gmap-pedometer

So far it has been pleasantly free of the adrenaline spikes I get from seeing murderous and/or suicidally stupid behaviour. Much better than going along Chesterton Lane and Madingley Road!

We often need to patch the software that we run in order to fix bugs quickly rather than wait for an official release, or to add functionality that we need. In many cases we have to maintain a locally-developed patch for a significant length of time, across multiple upstream releases, either because it is not yet ready for incorporation into a stable upstream version, or because it is too specific to our setup so will not be suitable for passing upstream without significant extra work.

I have been experimenting with a git workflow in which I have a feature branch per patch. (Usually there is only one patch for each change we make.) To move them on to a new feature release, I tag the feature branch heads (to preserve history), rebase them onto the new release version, and octopus merge them to create a new deployment version. This is rather unsatisfactory, because there is a lot of tedious per-branch work, and I would prefer to have branches recording the development of our patches rather than a series of tags.

Here is a git workflow suggested by Ian Jackson which I am trying out instead. I don't yet have much experience with it; I am writing it down now as a form of documentation.

There are three branches:

upstream, which is where public releases live
working, which is where development happens
deployment, which is what we run

Which branch corresponds to upstream may change over time, for instance when we move from one stable version to the next one.

The working branch exists on the developer's workstation and is not normally published. There might be multiple working branches for work-in-progress. They get rebased a lot.

Starting from an upstream version, a working branch will have a number of mature patches. The developer works on top of these in commit-early-commit-often mode, without worrying about order of changes or cleanliness. Every so often we use git rebase --interactive to tidy up the patch set. Often we'll use the "squash" command to combine new commits with the mature patches that they amend. Sometimes it will be rebased onto a new upstream version.

When the working branch is ready, we use the commands below to update the deployment branch. The aim is to make it look like updates from the working branch are repeatedly merged into the deployment branch. This is so that we can push updated versions of the patch set to a server without having to use --force, and pulling updates into a checked out version is just a fast-forward. However this isn't a normal merge since the tree at the head of deployment always matches the most recent good version of working. (This is similar to what stg publish does.) Diagramatically,

     |
    1.1
     | \
     |  `A---B-- 1.1-patched
     |    \       |
     |     \      |
     |      `C-- 1.1-revised
     |            |
    2.0           |
     | \          |
     |  `-C--D-- 2.0-patched
     |            |
    3.1           |
     | \          |
     |  `-C--E-- 3.1-patched
     |            |
  upstream        |
              deployment

The horizontal-ish lines are different rebased versions of the patch set. Letters represent patches and numbers represent version tags. The tags on the deployment branch are for the install scripts so I probably won't need one on every update.

Ideally we would be able to do this with the following commands:

    $ git checkout deployment
    $ git merge -s theirs working

However there is an "ours" merge strategy but not a "theirs" merge strategy. Johannes Sixt described how to simulate git merge -s theirs in a post to the git mailing list in 2010. So the commands are:

    $ git checkout deployment
    $ git merge --no-commit -s ours working
    $ git read-tree -m -u working
    $ git commit -m "Update to $(git describe working)"

Mark Wooding suggested the following more plumbing-based version, which unlike the above does not involve switching to the deployment branch.

    $ d="$(git rev-parse deployment)"
    $ w="$(git rev-parse working)"
    $ m="Update deployment to $(git describe working)"
    $ c="$(echo "$m" | git commit-tree -p $d -p $w working^{tree})
    $ git update-ref -m "$m" deployment $c $d
    $ unset c d w

Now to go and turn this into a script...

Bacon and brassicas are best friends. Here's a very simple recipe which is popular in the Finch household.

Ingredients

A Savoy cabbage
Two large or three small onions
A few cloves of garlic
A 200g pack of bacon
Oil or butter for frying
Soured cream

Prep

Chop the onion

Slice the cabbage to make strips about 1cm wide

Cut up the bacon to make lardons

Cook

Get a large pan that is big enough to hold all the cabbage. Heat the fat, press in the garlic, then bung in the onion and bacon. Fry over a moderate heat until the bacon is cooked.

Add the cabbage. Stir to mix everything together and keep stirring so it cooks evenly. As the cabbage cooks down and becomes more manageable you can put the heat right up to brown it slightly. Keep stir-frying until the thick ribby parts of the cabbage are soft as you like, usually several minutes. (I haven't timed it since I taste to decide when it is done...)

I serve this as a main dish with just a few dollops of sour cream on top and plenty of black pepper.

I've spent most of this afternoon staring at tcpdump traces trying to work out what is causing connections from an Exchange server to my Exim servers to be lost.

The original symptom was that our sending rate measurements for the server were going crazy - increasing massively seemingly without a corresponding increase in traffic. This happens because (in our configuration) if a connection is going OK (valid sender and recipient addresses, etc.) Exim doesn't log anything until a message is accepted. Our rate measurement calculations are performed when processing RCPT commands, before the connections were lost, and also before they got logged. (This is, obviously, less than ideal.) So the Exchange server's sending rate seemed to go up for no reason.

I temporarily increased our logging which verified that this was the correct explanation. But why were the connections being lost? I ran tcpdump for a bit and looked at the contents of the connections from the problem host. This showed that a couple of messages with lots of recipients were being repeatedly retried, only to fail because the connection was lost mid-transaction. Most messages were getting through OK.

I suspected a broken firewall, so I sent a message to the relevant staff giving them some details of the problem and asking if they have a firewall that might be causing it. I turned off tcpdump to stop it filling the disk :-) and kept half an eye on the logs. Soon a new and useful bit of information turned up: (I've edited this log extract a bit.)

2009-09-01 18:00:20 +0100
    SMTP protocol synchronization error
    (next input sent too soon: pipelining was advertised):
    rejected "XXXX TO:<abc123@cam.ac.uk>"
    H=mailserver.dept.cam.ac.uk [131.111.999.999]:50555 I=[131.111.8.131]:25
    next input="RCPT TO:<abc123@cam.ac.uk>\r\nRCPT TO:<abc123@cam.ac.uk>\r\n"

Exim is complaining about the XXXX command because only certain specific commands may be pipelined and XXXX isn't one of them. So why is the Exchange server sending gibberish XXXX (sic!) instead of RCPT? In fact it isn't. I took another tcpdump and found that the problem command fell across a packet boundary: one packet ended with @cam.ac.uk>\r\nXX and the next one started with XX TO:<.

This is an absolutely classic signature of the utterly braindamaged Cisco PIX firewall's SMTP fuxup mode. Oh it aggravates me so much. It replaces SMTP commands that it doesn't know with XXXXes, and it's too stupid to handle packet boundaries correctly.

I sent another message to the department's IT staff telling them to fix their firewall ASAP. I shall probably send similar instructions to another department with a misconfigured PIX, since they brushed off my previous request to fix it. Both departments are also having problems with email from us to them, about which our servers can log more details, so I hope I have enough evidence to persuade them to heed my advice.

(a previous story of packet boundary screwups)

I'm annoyed by how slow our office printer is at printing on both sides of the paper. It occurs to me that it ought to be possible to make it much faster (a similar speed to single-sided printing) without too much extra complexity.

At the moment it handles one sheet at a time, in the sequence feed / print / reverse / print / eject. The feed and eject phases can be overlapped but there's otherwise little opportunity for pipelining.

If the duplex mechanism includes a paper path that goes from the print mechanism's exit to its entrance while turning over the sheet, then the path through the printer can be one-way (except inside the duplexer) and the duplexer can operate at the same time as the printer. The whole thing can then handle two sheets at once. The sequence goes like this:

feed sheet 1
print page 2 on sheet 1
sheet 1 into duplexer / feed sheet 2
sheet 1 through duplexer / print page 4 on sheet 2
sheet 1 into printer / sheet 2 into duplexer
print page 1 on sheet 1 / sheet 2 through duplexer
sheet 1 exits / sheet 2 into printer
print page 3 on sheet 2
sheet 2 exits / feed sheet 3
...

I have assumed a C-shaped or S-shaped paper path, where the printer prints on the top of the sheet which turns over before dropping into the hopper, so that you end up with the stack of paper face down in the correct order. Our printer's duplex mode presumably prints pages in even/odd order (i.e. page 2 then page 1 on sheet 1, page 4 then page 3 on sheet 2, etc.) so the output stack still ends up in the right order. My duplexer's 2/4/1/3 page ordering puts greater demands on the RIPper's memory but that shouldn't be a big deal these days. (On the other hand, why does our printer appear to wait while RIPping? Shouldn't that be instant nowadays?)

So I wonder if anyone makes a printer with a duplexer that works this way. It should be fairly obvious from the duplex / duplex / exit / exit rhythm.

The University of Cambridge Computing Service is currently looking for a second VOIP sysadmin. We've just finished replacing our old analogue phone switch with a Cisco-based setup serving over 16,000 lines. There are more details (including salary level and waffle) in the job ad.

I hate plastic carrier bags, so I take a canvas bag when I go shopping (unless I forget). The city-centre Sainsbury's in Cambridge recently replaced their express tills with self-checkout tills, which use weighing scales under the bags to check that scanned things have been bagged properly.

If you put an unexpected object in the bagging area it flashes up an annoying message. This screen has a button on it saying "Using your own bag?" which I am pretty sure used to cancel the message and allow you to scan your shopping. It is now non-functional: if you try pressing it a member of staff has to come over and explain the correct procedure. If you want to use your own bag, before you put it on the bagging area you have to go into the "select an item" menu (for bananas and fresh bread etc.) and choose the "bag re-use" icon. Obviously.

Thanks to everyone for the informative comments on my previous post.

I found a bug in my simulation code which meant that many of the tests were performed on arrays that were half-full of zeroes (oops). This distorted the stats a little, and more for larger N because I tested fewer different arrays for large N. This bug caused the increase in s.d. for large N: the corrected code has the same s.d. for all N. Another effect was to increase the maximum number of iterations required to find an entry. After the fix the maximum number of iterations goes down to log(log(N))+12 for the pure secant method, and (log(N)+14)/2 for Junio's mixed secant/bisection method. Altogether much more well-behaved.

The usual way to find an element in a sorted array is using a binary search, which takes log(n) time, where logs are understood to be in base 2 and N is the size of the array.

Linus Torvalds made the clever observation that you can do better than log(N) if you know that the contents of the array are uniformly distributed. For instance, git's pack files store multiple objects identified by the SHA-1 hashes of their contents. Each pack file has an index containing a list of the pack's SHA-1 IDs and their objects' locations in the pack. The index is sorted by SHA-1 ID. Since SHA-1 is a cryptographic hash, we can assume its output is uniformly distributed.

Linus described his technique as "Newton-Raphson" which is a bit of a misnomer, since N-R works on smooth differentiable curves whereas what we have is a straight line with some stochastic variations. What we're actually doing is an iterated linear interpolation. If the SHA-1 IDs were perfectly evenly distributed then a single linear interpolation would land us right on the target item, but the random variation means we will be off by some amount, so we need to continue searching.

How far off will we be? It turns out (based on Monte Carlo simulation) that the expected error is about 0.31 * sqrt(N) with a standard deviation of about 0.26 * sqrt(N). This is a really promising result since it implies that each iteration reduces the search space to N^1/2 whereas an iteration of binary search reduces it to N/2. So we should expect a complete search to take O(log(log(N))) iterations.

I wrote a simulation to try this out, and it matches this prediction: in fact the number of iterations was about 1 + log(log(N)). However what is the variation around this expected result? In my tests it turned out that the maximum number of probes was log(N) though for small N it bottomed out at about 16. When testing lots of different randomly filled arrays, the standard deviation was about 1.2 for all values of N, but when I tested fewer arrays this number ramped up.

Junio Hamano's implementation of Linus's idea is included in git but disabled by default. He added a tweak that biases the linear interpolation towards the centre of the search range, so it's kind of a balance between binary search and linear interpolation search. In my simulator this tweaked version required (log(N)+3)/2 iterations on average with a standard deviation of 0.8. The maximum number of iterations was again log(N) but it bottomed out at about 12. Overall it's a bit slower but better behaved.

In git, where a large repository might contain two million objects, and where pack index lookups are not particularly performance-critical, this improved lookup code doesn't provide a noticeable advantage. Still, I think it's interesting and the idea might be useful in other situations. Note that unlike a binary search, which can just use comparisons returning greater / equal / less, the linear interpolation search needs to know the absolute values of the elements. Git's code actually uses a lexicographic variant that ignores any common prefix shared by the elements in the search range, and uses only the next two bytes for the interpolation.

To finish, here's a bit of code. In this example, 0.0 <= array[k] < 1.0, and I use k for keys and v for values of array elements. We are searching for vtarg.

	/* all bounds are exclusive */
	double vlo = -DBL_MIN, vhi = +1.0;
	int klo = -1, khi = N;
	while(klo - khi > 1) {
		int kmid = klo + (khi-klo) * (vtarg-vlo) / (vhi-vlo);
		/* ensure rounding does not put us out of bounds */
		if(guess_k <= min_k) guess_k = min_k + 1;
		if(guess_k >= max_k) guess_k = max_k - 1;
		double vmid = array[kmid];
		if(vmid == vtarg) return(kmid);
		if(vmid < vtarg) klo = kmid, vlo = vmid;
		if(vmid > vtarg) khi = kmid, vhi = vmid;
	}
	return(-1);

Addendum: there are a few corrections in a follow-up post.

Yesterday I dealt with a fairly routine question from a computer officer about sending email from a web server. It's always nice when someone asks for advice, even when they expect the answer won't contain any surprises, just in case it does. I gave our usual advice about the safest ways to set up web forms that send email. The bit that tempted fate was saying that web forms have caused spam problems for us "in the past" - and I had been reflecting recently that we haven't had a big web spam problem for quite a long time.

Our recent problems have all been account compromises of one kind or another. One that will be familiar to other postmasters is caused by users sending their login credentials in reply to a phishing message. We seem to have a relatively clued-up userbase, based on tales from other universities; very few reply to most phishing spams and most of those replies do not contain passwords - they are more or less skeptical and sometimes taunt the spammer. However there's still the occasional idiot, and even a particularly special one who had to be re-educated twice! If any passwords do escape and accounts get used to spam, our strict rate limits for remote email users shut the unwanted flows down pretty quick.

More recently we have had problems with compromised accounts on email systems other than our own. These seem to have been due to infected PCs on networks with a Microsoft Windows domain and Exchange server - I don't think remote access (e.g. via Outlook Web Access) was to blame in these cases. We have been reasonably successful applying rate limits to mail servers that relay outgoing mail through us. We set the limit for each server to be about 10% above their normal peak usage, so it should not trigger unless something suspicious is happening, and we have a monitoring script to warn us if they stray into (or past!) the 10% headroom. The first time this mechanism caught a spam run it stopped an 80,000 message flood about 2% of the way through.

So Fate observed my complacency about web form compromises and handed us a big plate full of crap today. Shortly before I got into work, one of our departments started spewing email and fairly swiftly whacked into their rate limits. I was foolishly trusting of their competence and aware of their occasional need to send large mailshots, so I tried to let the mail through - but it rapidly became clear that tens of thousands of messages was way beyond normal and their arrival rate of hundreds per second was going to cause us problems even if it was legit. Silly me for not looking at a sample message. I got in touch with the department's techies and it rapidly became apparent there was a compromise, and we went into lock-down and clean-up mode. Between us we had to delete about 100,000 queued messages, with the added joy that we had to avoid deleting the couple of dozen legitimate messages that had the same sender address as the spam. Sadly my error meant that 20,000 messages escaped (plus 6000 with invalid recipient addresses) whereas it should have been only two or three thousand :-(

As I said yesterday, there are two ways to ensure that a web form sends email safely. The first is to fix the recipient address, for example in "send feedback to the webmaster" forms, or mail to registered users of your site. The second is to fix the contents of the message, for example in signup or purchase confirmation messages. Either of these is enough to make the form useless to spammers.

Today's excitement was caused by a "mail a copy of this story" form, which should have been implemented safely in the second style. However the form allowed remote users to add their own comments above the story, and gave them plenty of space to do so - enough for a typically lengthy 419 scam.

It's fair to say this has been a learning experience, more painful for some than for others :-) I should remember that technical cockups are one thing, but even if you are technically competent political requirements can still screw the pooch.

I've recently written a three page memo about the advantages of our username scheme compared to "friendly name" email addresses in large domains. I have put a copy of the PDF on the web which you can read if you like.

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.

I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.

bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an LD_PRELOAD hack that intercepts system calls, e.g. open(O_RDONLY) is a dependency and open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.

bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look into git repositories to get SHA-1 hashes and avoid computing them.

It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

Here's an abbreviation to avoid, because its various meanings overlap so heavily.

Source Code Manager: synonym for version control system, e.g. as used by the Linux crowd in the early days of git.
Software Configuration Manager: usually incorporates version control, build system, archival of build products, and CASE methodology baggage, e.g. ClearCASE and Vesta.
System Configuration Manager: for system administrators rather than developers, e.g. cfengine and Puppet.

How long will it be before it becomes normal to archive everything? It's already normal in some situations, and I think that's increasing. It's been the norm in software development for a long time. There's an increase in append-mostly storage systems (i.e. append-only with garbage collection) which become never-delete systems if you replace the GC with an archiver. Maybe the last hold-outs for proper deletion will be high data volume servers...

Anyway, I feel like listing some interesting append-only and append-mostly systems. A tangent that I'm not going to follow is the rise of functional programming and immutability outside the field of storage. Many of these systems rely on cryptographic hashes to identify stuff they have already stored and avoid storing it again, making append-only much more practical.

All version control systems, and software configuration management systems even more so. The former archive source code whereas the latter archive build tools and build products as well. DEC's Vesta SCM is particularly interesting, being based on a purely functional build language designed to maximize memoization - i.e. minimize unnecessary rebuilds. It's sort of ccache on steroids since it caches the results of entire module builds, not just individual source file compiles.
Nix is a purely functional package manager. Unlike most packaging systems like dpkg or rpm, Nix packages do not conflict with each other: you upgrade by installing new packages alongside your existing ones, then you stop running the old ones and start running the new ones.
Archival / backup systems, like Venti which is Plan 9's append-only filesystem. Apple's Time Machine isn't nearly as clever.
Most filesystems don't use hash-based uniquification. Append-mostly filesystems often provide cool undelete features like snapshots, e.g. NetApp's WAFL or Sun's ZFS. Early filesystems of this kind, e.g. BSD LFS tried to avoid wasting space, so didn't make old data available as snapshots, and sacrificed performance to eager garbage collection. More recently, DragonFly BSD's Hammer filesystem doesn't even have an in-kernel garbage collector, and running it is entirely optional.
Email archives: gmail's ever-increasing quotas, cyrus delayed expunge.

I was originally planning to witter about distributed version control vs. centralized version control, especially the oft-neglected problem of breaking up a large cvs / svn / p4 repository. This was partly triggered by Linus's talk about git at Google in which he didn't really address a couple of questions about how to migrate a corporate source repository to distributed version control. But in the end I don't think I have any point other than the fairly well-known one that distributed version control systems work best when your systems are split into reasonably modestly-sized and self-contained modules, one per repository. Most systems are modular, even if all the modules are in one huge central repository, but the build and system integration parts can often get tightly coupled to the repository layout making it much harder to decentralize.

Instead I'm going to wave my hands a bit about the ways in which git has unusual approaches to distributed version control, and how bzr in particular seems to take diametrically opposing attitudes. I'm not saying one is objectively better than the other, because most of these issues are fairly philosophical and for practical purposes they are dominated by things like quality of implementation and documentation and support.

Bottom-up

Git's design is very bottom-up. Linus started by designing a repository structure that he thought would support his goals of performance, semantics, and features, and worked upwards from there. The upper levels, especially the user interface, were thought to be of secondary importance and something that could be worked on and improved further down the line. As a result it has a reputation for being very unfriendly to use, but that problem is pretty much gone now.

Other VCSs take a similar approach, for example hg is based on its revlog data structure, and darcs has its patch algebra. However bzr seems to be designed from the top down, starting with a user interface and a set of supported workflows, and viewing its repository format and performance characteristics as of secondary importance and something that can be improved further down the line. As a result it has a reputation for being very slow.

Amortization

Most VCSs have a fairly intricate repository format, and every operation that writes to the repository eagerly keeps it in the canonical efficient form. Git is unusual because its write operations add data to the repository in an unpacked form which makes writing cheaper but makes reading from the repository gradually less and less efficient - until you repack the repo in a separate heavy-weight operation to make reads faster again. (Git will do this automatically for you every so often.) The advantage of this is that the packed repository format isn't constrained by any need for incremental updates, so it can optimise for read performance at the expense of greater pack write complexity because this won't slow down common write operations. Bzr being the opposite of git seems to do a lot more up-front work when writing to its repository than other VCSs, e.g. to make annotation faster.

Thus git has two parallel repository formats, loose and packed. Other VCSs may have multiple repository formats, but only one at a time, and new formats are introduced to satisfy feature or performance requirements. Repository format changes are a pain and happily git's stabilized very early on - unlike bzr's.

Laziness

As well as being slack about how it writes to its repository, git is also slack about what it writes. There has been an inclination in recent VCSs towards richer kinds of changeset, with support for file copies and renames or even things like token renames in darcs. The bzr developers think this is vital. Git, on the other hand, doesn't bother storing that kind of information at all, and instead lazily calculates it when necessary. There are some good reasons for this, in particular that developers will often not bother to be explicit about rich change information, or the information might be lost when transmitting a patch, or the change might have come from a different VCS that doesn't encode the information. This implies that even VCSs that can represent renames still need to be able to infer them in some situations.

Git's data structure helps to make this efficient: it identifies files and directories by a hash of their contents, so if the hash is the same it doesn't need to look any closer to find differences because there aren't any - and this implies a copy or rename. This means that you should not rename or copy a file and modify it in the same commit, because that makes git's rename inference harder. Similarly if you rename a directory, don't modify any of its contents (including renames and permissions changes) in the same commit.

Mercurial also uses hashes to identify things, but they aren't pure content hashes: they include historical information, so they can't be used to identify files with the same contents but different histories. Thus efficiency forces hg to represent copies explicitly.

Any more?

I should say that I know very little about bzr, and nothing about tla, mtn, or bk, so if any of the above is off the mark or over-states git's weirdness, then please correct me in a comment!

LISTSERV uses a null return path (RFC821 MAIL FROM:<>) on its administrative mail, and some mail hosts reject this. I discard message with a null return path that do not match a few simple heuristics, so I lose things like subscription confirmations from services like JISCfail. This makes me cross.

L-Soft claim that LISTSERV is following the specifications, and they cite a couple of paragraphs from RFC 821 (published in 1982) and RFC 1123 (1989). However they fail to cite text from RFC 2821 (2001) which explicitly forbids what they are doing.

There are several types of notification messages which are required by existing and proposed standards to be sent with a null reverse path, namely non-delivery notifications, other kinds of Delivery Status Notifications (DSNs), and also Message Disposition Notifications (MDNs). [...]

All other types of messages (i.e., any message which is not required by a standards-track RFC to have a null reverse-path) SHOULD be sent with with a valid, non-null reverse-path.

The only other permitted use of null return paths that I know of is vacation notifications, described in RFC 3834 (published in 2004).

L-Soft needs to get a grip, read some RFCS, and fix their software.

We've been discussing configuration management in the office this week. None of us are happy with the way we're doing things...

On Hermes, we do most configuration management using rdist. On our management server we have a number of file trees containing files that differ from the default - a smattering of stuff in /etc, the contents of /home and /opt, and a few other bits scattered around. These trees are cloned and hacked for each flavour of machine (ppswitch, cyrus, lists, webmail, etc.) and each version of the underlying OS. These trees are mostly kept under revision control.

This setup has the advantage of simplicity, and it's easy to push out small changes. One key feature is rdist's testing mode, which makes it easy for us to ensure that the servers are in sync with the configuration tree without changing anything. We often run a test across a cluster of 10 or 16 machines in parallel. It's easy to selectively push a change to an idle machine for testing before rolling it out to the rest of the cluster. For more tricky changes I do a phased rollout so I can check for unwanted changes in behaviour without breaking the entire service at once.

Of course it has some serious disadvantages. We have to be root on the management host to be able to push changes out to the servers. We can't easily keep file ownership and permissions under revision control. This mechanism misses significant parts of the configuration, such as installed base OS packages and which rc scripts are enabled. There's also a lot of scope for improving our initial OS install scripts to reduce the amount that they get out of sync with the rdist configuration.

So we'd like something better. Our colleagues have different system management setups with different problems, and they are also looking for something better.

A couple of my colleagues have looked at Puppet but weren't happy with it. I dislike its basic design. Managed servers pull their configurations from the master, which means you must never screw up a change on the master, and it's harder to test changes - you have to explicitly set up test server profiles. Yes, you can run Puppet in push mode but it's often a bad idea to work against a program's basic architecture. Puppet also has its own security mechanisms, whereas I'd prefer to avoid multiplying channels of trust. Finally, I really hate writing configuration files for programs that write configuration files. It's a waste of brain cells to understand this superfluous abstraction layer: I just want to write the underlying configuration file directly.

Aside: this is why I really really hate autoyast, which uses an XML configuration file to control a program that writes configuration files that control rc scripts that manipulate the underlying configuration files for the programs you actually care about. It takes hours to work out how to make it produce the correct results.

I spent some time working on a program called pstx which was to be a suped-up replacement for rdist. It was going to be an sftp client (so it would require nothing special on the target servers) that had a simple configuration file to specify which directory trees to copy where, including ownerships and permissions (bringing them under revision control and removing the need to be root on the master), and possibly with added bells and whistles like remote diff, and a reverse (pull) mode. I also intended to make it easer to combine collections of files and thereby share common files. Sadly it's only about a third written and likely won't get much further.

One of the conversations this week was about how to reduce the chance of mistakes when rolling out changes, in which we discussed the use of revision control to help with testing and rollback. At the moment Hermes mostly uses CVS and other unixy bits of the CS share a Subversion repository, so the conversation very much assumed a central repository model - which fits in well with a master configuration server. Recently I have also been investigating git more seriously, so I thought it might be part of a solution.

The key things that git provides include a flexible and efficient network protocol for moving files around (flexible in that it can use ssh and http as well as git's native protocol, and efficient in that it's incremental and compressed), it can tell us what differs between a directory tree and the state of the repository, changes can be pushed and pulled, etc. The distributed version control functionality is way better at all this than pstx was ever going to be.

The big missing part is the ability to track ownerships and permissions: git only supports normal development checkouts which should be done using the developer's uid and umask. There are also some low-level problems with the way git performs checkouts: it removes the destination file before writing the updated version in its place. I would like a special deployment command which checks all the changed files out under temporary names, fixes their ownerships and permissions, then renames them into place. The first stage almost exists in the form of git-checkout-index --temp d, though it does weird things with symlinks. It doesn't look like much work to add the other stages.

This could provide some really nice workflows. A configuration change is created on a small topic branch, then pushed to the configuration repositories on all the machines - which changes none of the live configuration. You can then switch a test machine to the new branch with a single git deploy branch, or roll back by re-deploying the master branch. You can find out the current configuration of a machine in summary using git status or git diff to make sure the summary isn't a lie.

One interesting possibility might be to tame the clone-and-hack problem by using a shared repository for all the different classes of machines. This should make it easier to see the differences between each kind of machine and spot spurious ones.

One thing that is very manual in our current setup is hupping daemons to make them load an updated configuration. It might be feasible to use a post-deploy hook to do this, provided the hook script is given enough information that it can avoid unnecessary restarts.

Now I just need to find some time to turn the idea into code...

This is a bit late, but I've been inspired to post after reading about the women other people admire. Here are a few of mine.

Margo Seltzer: developer of the BSD log-structured filesystem and Berkeley DB. An open source entrepreneur - founder and CTO of Sleepycat Software - who also has a successful academic career.
Dina Katabi invented XCP, the eXplicit Congestion control Protocol, which is a brilliant way for routers to make each individual flow adjust to the level of congestion without the need for any per-flow state. I think it was one of the first papers I read about advanced transport protocol research, and I have continued to follow the subject since then.
Lisa Dusseault is a WebDAV expert and IETFer, currently one of the Applications Area Directors and therefore a member of the IESG. She was one of the funner people I met at the Paris IETF meeting a few years ago. I like the new ideas she's brought to the IESG, such as monthly updates on her work as Apps AD.
And of course rmc28, who got the University Card working properly and now slaves away on the student information system.

I posted a couple of articles (one, two) about the Corpus clock soon after it was unveiled, and I optained a copy of the Chronophage book soon after it became available. The book is a bit lacking in technical detail, so I was pleased to find out that John Taylor would be talking about it for the Cambridge Science Festival and I attended his talk yesterday. The lecture theatre was packed.

The talk was in three parts: a bit of autobiography, then a quick overview of the clock and its development, then questions. The talk revealed some interesting facts which do not appear in the book.

Taylor's story of how he came to make a fortune in kettle thermostats is relevant to the clock not just because that's how he made the money: his horological hero, John Harrison, invented the grasshopper escapement that inspired the Chronophage, and also invented the bimetal on which Taylor's thermostats rely. He explained his snap-action bimetallic disc, which has a C cut out to form a tongue in the middle; this tongue moves further than the centre of a disc without the cut-out does, which loosens the tolerances required to manufacture a thermostat. The calligraphic flourish engraved below his name on the clock's pendulum bob forms the shape of his bimetallic disc. (Sorry, I can't find a picture on line though it's visible in the book.)

ETA: I have found a large picture of the pendulum bob in which you can see the shape of Taylor's bimetallic disc under the date. It's overdrawn by some of the flourish so it's a bit obscure unless you know what you are looking for.

There were a couple of hints at the clock's workings. It is usual for clocks with chiming mechanisms to have a second set of drive springs or weights for the chimes. The Corpus clock has only one spring, so its hourly death-rattle is driven in an unusual way. This is why the clock ticks back and forth while rattling the chain in the coffin. There are two main mechanisms inside the Chronophage itself. It contains a mechanical pseudo-random mechanism to control the blinking and biting. The sting, however, is regular: it slowly rises then jabs down on each quarter hour.

He also mentioned that the clock "has fun" four times a year: on 24th March, which is the date of Harrison's birth and also his death; on 25th November, Taylor's birthday; on new year's eve and new year's day; and on Corpus Christi day, the Church's festival that occurs on the Thurdsday after Trinity Sunday, the 8th Sunday after Easter. The clock is away for maintenance this week, so I hope it'll be back in time to muck around on Tuesday week. The fun seems to involve even more erratic ticking than usual, with more colourful lights.

The clock's spring has to be particularly strong in order to overcome the momentum of the large escape wheel that goes around the outside of the clock. When prototyping the clock this caused a serious problem: the force of the spring is transmitted through the escapement to the pendulum, making it swing higher and higher and eventually breaking the clock. Taylor overcame this problem by adding a regulator, which also serves two other functions: it produces the clock's erratic behaviour that plays tricks with observers, and it listens to the MSF time signal to synchronize the clock every five minutes. (Taylor confirmed to me after the talk that these are all functions of the same mechanism. He also said that the hollow pendulum bob is not in fact "massive and weighty" as the book says, which answered my question about conservation of momentum.) I'm not sure how this is consistent with his assertions that the clock is purely mechanical - would it work if the computer regulator broke? News reports about the clock's teething problems suggest that it's pretty vital. I wonder if the maintenance periods this week and back in January are for software patches rather than mechanical adjustment?

I asked Taylor if he would publish more technical details about the clock, but he thinks that the mystery makes it fun. I disagree :-/

My colleagues over in MISD (the University's administrative computing unit, as opposed to the CS which is the University's ISP and IT training department) are looking for another DBA to help look after the Oracle running under exciting services like the financial system and student information system.

http://www.admin.cam.ac.uk/offices/hr/jobs/vacancies.cgi?job=4781

Since April 2002 I have been keeping a link log: a web page where I record links to other pages that are notable in some way, with a brief comment about each one. It's gatewayed to Delicious, LiveJournal, and Twitter. The Twitter feed is new.

I've been on Twitter since March 2007, when it made a big splash at the SXSW festival. I joined to see what the fuss was about but soon stopped using it. None of my social circle were using it much, and there wasn't a sensible way of finding people in the more distant reaches of my social network. For 18 months my last tweet said " Grumbling about the difficulty of finding out people's twitter usernames".

Recently Twitter has taken off in the UK, since a fairly large contingent of journalists and celebrities have started using it and talking about it. (Reminds me in a small way of the rise of the web in 1993/4.) Coincidentally at about the same time I decided to get back into using Facebook and Twitter a bit more, to keep in better touch with my family's activities. (They're on Facebook but I linked my Twitter and Facebook statuses together for convenience.)

The celebrity thing helped me to get more out of Twitter in a significant way. I had thought of it as more like Facebook or LiveJournal: a way of keeping up with the activities of people I already know. But in fact it's more like the wider blogging culture where it's normal to follow someone's writing just because they are interesting. (Yes, LJ blurs into this kind of blogging too.) In fact the even the Twitter guys didn't realise this at first: until June 2007 Twitter sent me email saying "You are foo's newest friend!" but the following month it started sending me email saying "foo is now following you on Twitter!". I now feel free to follow well-known people just because they are interesting, and I found more interesting people by looking through the lists of people they follow.

Part of the Twitter culture is heavy use of short URL services when posting interesting links. I decided that my link log ought to be gatewayed to Twitter to join in the fun, so I embarked on a bit of hacking. Previously it had been just a static HTML page, with a command-line script that I used to insert a new link. There was another script to turn the HTML into an RSS feed for LJ, and some code for uploading the results to chiark. I had a rudimentary click tracker so that I could get some idea of what links other people followed, but it did not shorten URLs. (That meant it was an open redirector until I found out what evil that made it vulnerable to.)

I replaced all the old static HTML and RSS gubbins with a CGI script which can format my link log in HTML or Atom, as well as providing short URLs for all the links. I decidided to host my own short URLs partly to continue tracking what people find interesting, and partly because I have a vanity domain that's just right for this job :-) The command line script for adding new links remained basically the same, apart from new code to create short tags and to post the link to Twitter.

After a couple of weeks of this new regime I decided I needed to be able to save links found when reading Twitter or LiveJournal on my iPod Touch. Its UI makes it easy to email links but not much else, so I set up a gateway from email to my link log. I send links to a specially tagged email address which is filtered to a special mailbox. A script polls this mailbox over IMAP every few minutes, looking for messages that match a number of paranoid checks. From these extracts a URL and a comment which it passes on to the old command-line script. (I still use the latter when I'm on a computer with a keyboard.)

The end result is pretty neat: I can easily save interesting links that I see in places where that was not previously practical. It's probably a good thing I don't have an iPhone because then I'd be able to spod everywhere.

PS. a brief hate: web sites that redirect deep links to the mobile version of their front page. Utter cWAP. They'll get no Google Juice from me. (In fact the iPhone browser is good enough that mobile sites are usually NOT an improvement.)

I've recently started playing with lpeg, a parsing library for Lua. It is based on "Parsing Expression Grammars", which were recently popularized by the prolific Bryan Ford. PEGs have some nice properties: they're suitable for unified parsers that handle both the low-level lexical syntax as well as higher-level hierarchial syntax; they have much simpler operational semantics than traditional extended regexes or context-free grammars; and as well as familiar regex-like and CFG-like operators they have nice features for controlling lookahead and backtracking. PEGs were originally developed alongside a cute algorithm for linear-time parsing which unfortunately also requires space linear in the input size with a fairly large multiplier. Lpeg instead uses a simple parsing machine, implemented somewhat like a bytecode interpreter. Its performance is quite competitive: the long paper says it has similar performance to pcre and glibc's POSIX regex implementation, and the short paper says it has similar performance to lex and yacc.

Lpeg actually consists of two modules. The core lpeg module is written in C and allows you to compose parser objects using operator overloading, building them up from primitives returned from tersely named constructor functions. The resulting syntax is rather eccentric. On top of that is the re module which provides a more normal PEG syntax for parsers, which despite the name of the module are rather different from regular expressions. This module is written in Lua, using an lpeg parser to parse PEGs and construct lpeg parsers from them. The PEG syntax is extended so that you can define "captures". Captures are the really nice thing about lpeg. You can use them like captures in Perl regexes to just extract substrings of the subject, but you can often do better. Lpeg captures are more like the semantic actions that you can attach to rules in parser generators like yacc. So, where in Perl you would do the match then fiddle around with $1, $2, etc, with lpeg the match can incorporate the fiddling in a nice tidy way. (In fact, probably the closest comparison is with Perl 6 rules, but they're not yet practically usable.)

The program I was writing with lpeg was to process some logs. I needed to convert the timestamps from ISO 8601 format into POSIX time_t which implied converting 8 fields from strings to numbers. Rather than having to convert each capture individually, or loop over the captures, I could write a single grammar rule to match a pair of digits and convert it to a number, then refer to that rule elsewhere in the grammar. (In fact Lua will coerce strings to numbers implicitly in most - but not all - circumstances. I happened to be tripped up trying to compare a number with a string, which doesn't coerce.) In the end it's nicest to let the parser drive all the program's activity through its semantic actions.

( code )

The Student Loan Company executive management board minutes from a meeting just over a year ago says the following in section 6, "update on data security processes":

RSJ provided an update on Data Security and advised that information which was being received from external sources confirmed that the transfer of data on removable media devices was now unacceptable. He stated that there was a need to consult with HEI’s as to the method of transferring Attendance Confirmation Reports as SLC now had PGP encryption software available which could replace the previous method of transferring the data via CD’s. He also stated that the PGP software which SLC were using should be checked to ensure that it was on the US Government list of standard encryption as HEI’s are only permitted to use PGP software from this list.

Not shipping media is good. Using end-to-end encryption is good. (Unlike banks which seem to like SMTP over TLS, which provides no additional security for inter-domain communication.) I wonder why the choice of PGP instead of S/MIME - I believe that PGP usually requires an add-on whereas S/MIME is often built in to MUAs. Perhaps they've been nobbled by a vendor.

In my previous post I said I was using the standard formula for the exponentially weighted moving average. This is not entirely true because the standard formula uses a complementary smoothing constant (α = 1 − a) which allows you to write the formula as an incremental adjustment to the previous value.

It's also possible to express the standard unweighted mean in a similar manner.

I was quite pleased when I found out about this analogy :-)

(Thanks to wikipedia for rendering the maths.)

In 2005 I developed a mathematical model for measuring average event rates which became the core of a new rate limiting feature for Exim. The model has a particularly useful property which I did not expect it to have and which I did not (until recently) fully understand.

The model

The central formula is the standard exponentially weighted moving average:

r_new = (1 − a) ∗ r_inst + a ∗ r_old

We use this to calculate the new average rate from the instantaneous rate and the result of the previous average calculation. These rates are all measured in events per some configurable time period. The smoothing factor a is a value between 0 and 1 which determines how slowly the model forgets about past behaviour.

We calculate r_inst and a from the raw inputs to the formula, which are p, a time period configured by the postmaster, and i, the time interval between the previous event and this event. By dividing them we get

r_inst = p / i

In this formula, p determines the per time unit used for measuring rates, e.g. events per hour or events per day.

The exponentially weighted moving average is usually used to smooth samples that are measured at fixed intervals, in which case the smoothing factor, a, is also fixed. In our situation events occur at varying intervals, so the smoothing factor needs to be varied accordingly.

a = e^{−i / p}

In this formula, p determines the smoothing period, i.e. the length of time it takes to forget 63% of past behaviour.

A useful property

When developing the model, I needed to understand how it reacts to changes in the rate of events. It's fairly simple to see that if the rate drops, the average decays exponentially towards the new rate. It's less clear what happens when the rate increases. A particular practical question is what happens if there's a sudden burst of messages? How much of the burst gets through before the average rate passes some configured maximum?

I did some trial computations and I found that when the interval is very small (i.e. the rate is high) the average rate increases by nearly one for each event (the smaller the interval, the closer to one). This means that the maximum rate is also the maximum burst size. How wonderfully simple! The postmaster can configure the model with two numbers, a maximum number of events per a time period, which also directly specifies the units of measurement, the smoothing period, and the maximum burst size.

(The maximum burst size is larger for slower senders, increasing to infinity for those sending below the maximum rate. However you don't have to be much above the maximum for your average to hit the limit within one period.)

This property produces a simple user interface, but I did not understand how it works. It's obvious that when i is small, a ≈ 1, but it is not so clear why (1 − a) ∗ r_inst ≈ 1. It seems I landed directly on a mathematical sweet spot without the fumbling around that might have led me to understand the situation better.

Arbitrary choices

I made two choices when developing the model which at the time seemed arbitrary, but which both must be right to get the max rate = burst size property.

Firstly, I decided to re-use the configured period for two purposes: as the smoothing period and as the per time unit of rates. I could instead have made them independently configurable, but this seemed to give no benefits that compensated for the extra complexity. Alternatively, I could have used a fixed value instead of p in the formula for r_inst, but that seemed liable to be confusing or awkward, and to require more mental arithmetic to configure.

Secondly, I decided to use e as the base of the exponent. I could have used 2, in which case the smoothing period would have been a half life, or 10, so that 90% of past behaviour would be forgotten after one period. There seemed to be no clear way to choose, so I split the difference and went with e on the basis of mathematical superstition and because exp() has fewer arguments than pow().

How it works

Looking back over my old notes this week, I had a revelation that the max rate = burst size property comes from the fact that the gradient of e^x is 1 when x is 0. To show why this is so, I first need to define a bit of notation:

y ≡ f(x) ≡ e^x
δy ≡ f(x + δx) − f(x)
δx ≡ −i / p

This allows us to say, when x is zero and δx is small,

(1 − a) ∗ r_inst
= (1 − e^−i/p) ∗ p/i
= (1 − e^δx) / −δx
= (e^δx − 1) / δx
= (e^{0 + δx} − e⁰) / δx
= ( f(0 + δx) − f(0) ) / δx
= δy / δx
≈ dy / dx
= e^x
= 1

Sweet.

Three months ago, Apple released their new "unibody" aluminium MacBook laptop, and just over a month before that they released the second-generation iPod touch. I was in the market for a new Mac laptop, and at some point I planned to upgrade my iPod when a 64GB Touch or iPhone became available. However when I found out that Apple were offering £95 "back to school" rebates to those who bought an iPod Touch and a MacBook, I couldn't wait any longer - this was the 30th October and the offer ceased at the end of the month.

Unfortunately I made the order with a rather crusty old version of Firefox which handled Apple's customization JavaScript incorrectly - it discarded all the options I chose, and because I wasn't familiar with the order process I didn't know that the final description screen should have included more details than it did. This wasn't a problem for the bits and pieces that I could also buy at the local ~~greengrocer~~ Apple Store, but one important option I chose was the US keyboard layout. I wanted it partly because I normally use a Happy Hacking keyboard which has a similar layout, and partly because the ISO layout has thin return key that is awkward to press as well as being ugly.

When I discovered this upon unboxing the new Shiny! my heart sank. You can't change the keyboard on the current MacBooks without replacing the entire machine. But I phoned Apple to see what I could do about it, and I was pleasantly surprised to find out that they would replace the machine with the correct model for free, no questions asked.

I discovered Apple's FAIL when the second MacBook arrived. I had been sent the "International English" model, which is identical to the "British" layout apart from a couple of currency symbols. They also sent me a continental power plug instead of a BS 1363 plug. When I phoned Apple they were suitably contrite, even offering me £70 compensation for the error!

Until this point (two weeks after the original order) I was very happy with Apple's support - no dumb scripts, reasonably competent staff. Sadly this didn't last. While the phone staff continued to be helpful and informative about what was going on, I had to keep phoning to get an update because the supervisor who was supposed to be dealing with the problem was too busy doing other stuff. The cause seemed to be that the department which handled replacements didn't believe that US keyboards were available in the UK (despite what the online store said) and the support people were incapable of fixing or working around this failure.

What was worse was that it took another three weeks to come to this conclusion. (I was away in France for one of them, during which I didn't chase Apple so nothing happened.) I couldn't use my new iPod because I didn't want to have the pain of re-pairing and re-populating it with music etc. This all made me extremely cross.

In the end I had to return the second MacBook for a refund and buy a third one with the correct configuration, which finally arrived six weeks after the original order. Fortunately this didn't affect my eligibility for the rebate - in fact they had already sent me the £95, though I hadn't sent in the proof-of-purchase paperwork they said they needed. (Um, I didn't notice the extra money until this month!) Unfortunately they cancelled the compensation they promised me! They fixed that after I made an irate phone call, but sheesh, I still get angry when remembering it.

But I suppose it's all OK in the end. The MacBook and iPod Touch are things of astounding beauty and utility. The iPod utterly blows away the Nokia 770 I previously used for spodding, and the MacBook replaces both my elderly PC laptop and my Mac Mini (though the latter will probably become a backup server). It's really nice to be able to hack away on the comfy sofa, and I'm playing choonz via the Airport Express a lot more than I did.

One notable feature of the ports on the new MacBook is that the RJ45 Ethernet socket only just fits - it wouldn't if the machine were much thinner. In fact the MacBook Air notoriously doesn't have any wired ethernet connection at all, relying on IEEE 802.11n WiFi for connectivity.

Of course other manufacturers are making expensive thin laptops. The Dell Adamo neatly solves the RJ45 thickness problem by putting the port behind the display hinge, where the machine is as thick as the body plus the display.

The Voodoo Envy's solution is to put the ethernet socket in the power supply, which acts as a WiFi hot spot which the laptop uses for connectivity. This is cute but it means you only get 54Mb/s because the base station doesn't have 802.11n support.

A cheaper and faster alternative would be to pass a gigabit Ethernet connection through from an RJ45 on the power supply to a combined power network port on the laptop. The connector could be much thinner than RJ45 without sacrificing compatibility.

ETA: The other advantage this design would have is that the combined power and ethernet connector can use the magsafe idea, so you don't have the problem of one safe and one unsafe connector when tethered.

I just spent a couple of hours debugging a strange network problem, which I haven't seen before.

The initial description was that email relayed from one departmental email server to another via my email server sometimes failed with a timeout error. This only occurred with messages with multiple recipients.

My initial suspicion was that it might be related to an old (now fixed) Exim callout verification bug, which had caused us some problems in the past. If a callout's target server was slow, and there were many recipients needing callout verification, and the client used pipelining, the total time for the callout delays could add up to more than the client's timeout, so it wouldn't see Exim's pipelined replies. Exim now flushes its replies before doing a callout to avoid this problem.

However, some testing showed that the relevant servers were quite swift, which cast some doubt on this diagnosis.

The problem report included some good details which were unfortunately too old for me to be able to correlate with our logs. I had a look to see if there was anything more recent that I could examine. It turned out that there were a couple of messages on our queues that had been delayed by this problem. I ran a test delivery of one of them and it did in fact time out between us sending the message envelope and them sending the reply. Consistent with my guess, but not consistent with other testing.

I tried a delivery with full debugging from Exim, but there was no enlightenment. I tried to send a copy of the problem message's envelope manually. This worked. Um, WTF? What was the difference between what Exim did and what I did?

I ran tcpdump to analyze the two connections. It turned out that Exim was sending the whole envelope (MAIL, RCPT, RCPT, ..., DATA) in one packet, whereas my cut-and-paste into telnet split the envelope into a packet for the MAIL command and one for the rest. So I created a file containing the envelope and used a little shell script to drive telnet. Bingo. I had reproduced the problem.

At this point I thought it might be an MTU-related problem, e.g. blocking ICMP "Fragmentation Needed" messages. I tried pinning down the packet size at which the problem started occurring, with a guess of something related to 512 bytes. After a couple of goes I noticed that the problem message had an envelope of 513 bytes, and an envelope of 512 bytes worked OK.

Then I tried a larger envelope, and - boggle - it worked. 512 or less worked, 513 failed, 514 or greater worked. This also explained why the problem affected message envelopes but not message data. The usual symptom of MTU firewall problems is timeouts at the message data stage, not during the envelope, and in this case message data was getting through fine. (The overhead of message headers makes it very unlikely that a message will have as little as 513 bytes of data.)

A couple of friends suggested testing 1025 as well, and it demonstrated similar but weirder problems. Again 1024 worked, 1025 failed, and 1026 worked. However in the failure case, the timeout occurred after I received the destination's replies to the first half of the envelope - enough commands to fit in 512 bytes. A closer look at both the 513 and 1025 cases revealed that I was getting TCP ACKs for all the data I sent, but something was going wrong after that.

I guess the problem is a firewall that's doing some kind of TCP interception and re-segmentation, and getting it wrong. The ACKs would therefore have been generated by the firewall and not by the actual destination. Someone needs to be given a good whipping with a copy of end-to-end arguments in system design before having their firewall programming licence revoked.

I'm happy to say that the problem I ranted about in my previous entry has been fixed. The general guidelines for bulk email have been restored, and the new guidelines for ~~intraspam~~ large internal mailing lists have been published on their own page.

Over a year ago we had some discussions inside the Computing Service about better provision for large mailing lists within the University. At the moment we rely too much on paper spam, and electronic spam is distributed to staff by departmental administrators who manually forward irrelevant utterances from the Old Schools. My hope was that officially sanctioned mailing lists for internal communications would improve this situation because, as well as being more efficient, there would be better moderation and recipients would be able to unsubscribe. We approved a document which I thought was quite good, and which I expected would live alongside our existing, very general, bulk email guidelines. This then went to the IT Syndicate for approval.

Unfortunately the IT Syndicate was disbanded at about this time to be replaced by an Information Services and Strategy Syndicate which has a broader remit. The large list policy got dropped for several months as the committee rebooted. When the work item was picked up again it somehow got corrupted. It was rewritten with absolutely no input from us, and it replaced the bulk email policy instead of being adopted alongside it. As a result there is now almost no policy against spam intentionally sent by University members to recipients inside and outside the University. I am extremely annoyed.

The new policy is on the ISSS web site and I'll put a copy of the old policy behind a cut to preserve it after Google's cache expires.

( Old bulk email policy )

This afternoon my colleague Andy gave a talk about Microsoft Exchange 2007, which I attended since I need to know what the competition has to offer and I have to provide support for its SMTP interfaces. One thing he mentioned which perked up my interest was "autodiscovery" for Outlook 2007. The unnecessary difficulty of configuring email software has irritated me for a long time, so after the talk I immediately went to find out more. It turns out to be a mixture of good stuff and utter FAIL.

The documentation describes three ways that Outlook 2007 can configure user accounts automatically - server names, security requirements, etc. If you are logged into a Windows Domain then it first tries querying the Active Directory. If that succeeds then it can find everything out by itself. Nice. If not, it falls back to an open protocol, which any standards-based mail server can implement, that configures the server settings automatically given an email address. More about this below.

If server-supported autodiscovery doesn't work, Outlook tries to guess the settings by attempting various combinations of host name, port number, TLS or not, POP or IMAP, etc. and stopping when it finds something that works. I think this is a great idea, so it's a damn shame that it prefers to use the lame POP not the studly IMAP if both are available. By itself that is enough to make it worth our while to implement the autodiscover protocol; but because our primary mail domain is cam.ac.uk but our service name is hermes.cam.ac.uk, Outlook's guesses will fail. This has the advantage of preventing users walking blindly into bad configurations, but the disadvantage that they're less likely to be able to configure it at all.

And so to the autodiscovery protocol itself. How can something so simple be fucked up in so many ways? It looks to me like it was bodged together by a clueless intern over two or three summer breaks, each time getting worse as the requirements evolved. The result is a case study in what RESTful protocol design is not.

The essence is simple: the client sends a request containing the user's email address, and the server sends back an XML document containing the configuration settings. The request is sent using HTTP-over-SSL to a URL derived from the domain part of the user's email address. (In fact it tries https://domain/autodiscover/autodiscover.xml first, then tries https://autodiscover.domain/autodiscover/autodiscover.xml.) So far, so good.

The first mistake is that the request is a POST with the email address contained in another XML document in the request body. It would have been more RESTful to use a GET with the email address encoded in the URL. This mistake turns into FAIL when you realise that most email services have the same settings for all users, in which case autodiscover.xml can be a static file, but many web servers do not allow POST requests to static files. If they had done it right, using a GET request with the email address in the query string would have Just Worked for both CGI scripts and static files.

It gets worse when they bodge around this mistake. A POST to a file commonly results in a "405 Method Not Allowed" error. Microsoft specify that if this is the case for your web server, you should configure it to send the autodiscover.xml file in the error reply, as if the request was successful. A foul perversion of the protocol.

It gets worse when they add support for virtual domains. The requirement seems to be that the service provider wants to host the autodiscover.xml script centrally, and doesn't want to fork out for an SSL certificate for every virtual domain. So if both of the https URLs fail, Outlook tries a GET request to the autodiscover.domain URL, which can redirect to the central autodiscover.xml. However that's not secure enough so there must also be an _autodiscover._tcp.domain SRV record in the DNS pointing to the same central web server. However that's still not secure enough so Outlook also admonishes the user to click OK in a popup dialogue box.

It gets worse when you look at the autodiscover.xml "schema". It isn't a schema, it's a commented skeleton fill-in-the-blanks autodiscover.xml file. That's just everyday lameness; the real FAIL is to be found in the various elements described therein. The redirect I mentioned above isn't an HTTP redirect, it's an autodiscover.xml document containing various redirection settings. There are also settings for controlling the expiry time of the document, because of course configuring these in a .htaccess file is too hard.

Ugh. I think that just about exhausts the rantage this stuff has triggered. I am most of the way to implementing it; I just need to get the necessary SSL certificates. It turns out to be easier than I expected because newish Apache does not get upset by POST requests aimed at static files: it just throws away the request body and returns the file with a "200 OK". No need to fart around with ErrorDocument 405 autodiscover.xml. With any luck it'll make Outlook trivial to get to work with Hermes.

A colleague tells me that our auditors are quizzing our system administrators about our backup schedules. A wag asked them if they were interested in restores too. The auditors replied, "No, just backups."

PS. I bought a copy of the book about the Chronophage from Corpus Christi plodge for £8. It is a glossy A4 booklet of 44 pages with large type and lots of colourful pictures.

I said the clock has three circles of 60 slits. In fact it has two circles of 60 and one of 48 - the hour circle has a slit for every quarter hour.
A website about the clock is going live soon.
A book about the clock is being published, and it will be available from the Corpus Christi porters' lodge amongst other places.

I have been admiring the new clock on the corner of Trumpington St and Bene't St. Instead of having hands, the clock has three concentric rings hidden behind its face. The outer ring marks seconds, the middle ring marks minutes, and the inner ring marks hours. Each ring is driven by the next outer ring via a gearing mechanism that makes the rings tick from one mark to the next. It is fairly normal for the second hand's motion to be quantized in this manner, but more unusual for the minute and hour hands to tick like this too.

Each ring sits in front of a circle of 60 blue LEDs, and the face has three circles of 60 slots which would allow the LEDs to shine through if the rings weren't in the way. The position of each disk is indicated by a slot which allows one LED to shine through the disk and the face to be seen by the viewers.

In fact, each ring has 61 slots, one of which is usually aligned with a slot in the face to indicate the time, and 60 of which are normally out of alignment. When a ring ticks from one position to the next, its 60 normally-unaligned slots line up with the face in turn before the normally-aligned slot lines up with the next slot in the face. This has the effect of making it look like the ring has flown around 366° whereas it only moved by 6°

I have made a little animation which shows how this works. It ticks once every 4 seconds and it only has 12 slots on the face and 13 on the disk, so that you can clearly see the mechanism behind the effect. I have drawn the face's slots in the outer circle and the ring's slots in the inner circle. It uses the new-fangled <canvas> HTML element, so you may need to upgrade your browser to view it.

There is a video of the clock where some of its features are described by the designer, John Taylor, who made his fortune from his invention of the kettle thermostat.

Profile

fanf

dotat.at

February 2026

S	M	T	W	T	F	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28

Syndicate

Page Summary

Uplift from SCCS to git
Nerdy trivia about Unix time_t
Bookmarklets
The early days of the Internet in Cambridge
POP, IMAP, SMTP, and the POODLE SSLv3.0 vulnerability.
Data structures and algorithms
Dilbert feeds
Update to SSHFP tutorial
SSHFP tutorial: how to get SSHFP records, DNSSEC, and VerifyHostKeyDNS=yes to work.
Relative frequency of initial letters of TLDs
Diffing dynamic raw zone files in git with BIND 9.10
A weird BIND DNSSEC resolution bug, with a fix.
Temporum: secure time: a paranoid fantasy
Security considerations for temporum: quorate secure time
nsdiff 1.47
Temporum: Quorate secure time
My circular commute
Maintaining a local patch set with git
Bacon and cabbage
More firewall hate
Duplex printers
Job Ad - VOIP sysadmin
Sainsbury's self-checkout fail
More on O(log(log(n))) searching
Searching a sorted array faster than O(log(N))
Tempting Fate, and getting her unwanted attention
CRSIDs and email addresses
Use your bonce
Define SCM
Never delete anything
Some thoughts about git
LISTSERV crapness
Configuration management
Ada Lovelace Day
John Taylor talks about the Corpus clock
Job Ad
Microblogging
The joy of lpeg
Impressive display of security clue from the Student Loans Company
Meanies
Luck or judgment?
The MacBook saga
RJ45 is too fat
Weird network bug
Licence revoked
Licence to spam
REST FAIL
LOLauditors
More on the Corpus Christi Chronophage clock
The Corpus Christi Chronophage Clock

Style Credit

Style: Gray for Boxes and Borders by branchandroot

Expand Cut Tags

No cut tags

Page generated 2026-02-28 14:48

	(1 − a) ∗ r_inst
=	(1 − e^−i/p) ∗ p/i
=	(1 − e^δx) / −δx
=	(e^δx − 1) / δx
=	(e^{0 + δx} − e⁰) / δx
=	( f(0 + δx) − f(0) ) / δx
=	δy / δx
≈	dy / dx
=	e^x
=	1