There was an interesting and instructive cockup in the BIND 9.10.4 release. The tl;dr is that they tried to improve performance by changing the layout of a data structure to use less memory. As a side effect, two separate sets of flags ended up packed into the same word. Unfortunately, these different sets of flags were protected by different locks. This overlap meant that concurrent access to one set of flags could interfere with access to the other set of flags. This led to a violation of BIND's self-consistency checks, causing a crash.
The fix uses an obscure feature of C structure bitfield
If you declare a bitfield without a name and with a zero width (in
unsigned int :0) any following bitfields must be placed
in the next "storage unit". This is described in
the C standard
You can see this in action in the BIND DNS red-black tree declaration.
A :0 solves the problem, right?
So a clever idea might be to use this
:0 bitfield separator so that
the different sets of bitfields will be allocated in different words,
and the different words will be protected by different locks.
What are the assumptions behind this fix?
If you only use a
:0 separator, you are assuming that a "word" (in a
vague hand-wavey sense) corresponds to both a "storage unit", which
the compiler uses for struct layouts, and also to the size of memory
access the CPU uses for bitfields, which matters when we are using
locks to keep some accesses separate from each other.
What is a storage unit?
The C standard specifies the representation of bitfields in section 126.96.36.199p4:
Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it.
Annex J, which covers portability issues, says:
The following are unspecified: [... yada yada ...]
The alignment of the addressable storage unit allocated to hold a bit-field
What is a storage unit really?
A "storage unit" is defined by the platform's ABI, which for BIND usually means the System V Application Binary Interface. The amd64 version covers bit fields in section 3.1.2. It says,
bit-fields must be contained in a storage unit appropriate for its declared type
bit-fields may share a storage unit with other struct / union members
In this particular case, the storage unit is 4 bytes for a 32 bit
unsigned int. Note that this is less than the architecture's nominal
64 bit width.
How does a storage unit correspond to memory access widths?
It is unspecified.
The broken code declared a bunch of adjacent bit fields and forgot that they were accessed under different locks.
Now, you might hope that you could just add a
:0 to split these
bitfields into separate storage units, and the split would be enough
to also separate the CPU's memory accesses to the different storage
But you would be assuming that the code that accesses the bitfields
will be compiled to use
unsigned int sized memory accesses to read
and write the
unsigned int sized storage units.
Um, do we have any guarantees about access size?
Yes! There is
sig_atomic_t which has been around since C89, and a
load of very recent atomic stuff. But none of it is used by this part
of BIND's code.
(Also, worryingly, the AMD64 SVABI does not mention "atomic".)
So what is this "Deathstation 9000" thing then?
The DS9K is the canonical epithet for the most awkward possible implementation of C, which follows the standard to the letter, but also violates common assumptions about how C works in practice. It is invoked as a horrible nightmare, but in recent years it has become a disastrous reality as compilers have started exploiting undefined behaviour to support advanced optimizations.
(Sadly the Armed Response Technologies website has died, and
Wikipedia's DS9K page has been deleted. About the only place you can
find information about it is in old
And why is the DS9K relevant?
Well, in this case we don't need to go deep into DS9K territory. There are vaguely reasonable ABIs for which a small :0 fix would not actually work.
For instance, there might be a CPU which can only do 64 bit memory
accesses, but which has a 32 bit
int storage unit. This type
representation would probably mean the C compiler has really bad
performance, so it is fairly unlikely. But it is allowed, and there
are (or were) CPUs which can't do sub-word memory accesses, and which
have very little RAM so they want to pack data tightly.
On a CPU like this, the storage unit doesn't match the memory access,
:0 syntax for skipping to the next storage unit will fail to
achieve the desired effect of isolating memory accesses that have to
be performed under different concurrent access locks.
DS9K defence technology
So the BIND 9.10.4 fix does two things:
The most important change is to move one of the sets of bitfields from the start of the struct to the end of it. This means there are several pointers in between the fields protected by one lock and the fields protected by the other lock, so even a DS9K can't reasonably muddle them up.
Secondly, they used magical :0 syntax and extensive comments to (hopefully) prevent the erroneous compaction from happening again. Even if the bitfields get moved back to the start of the struct (so the pointers no longer provide insulation) the :0 might help to prevent the bug from causing crashes.
(edited to add)
When I wrapped this article up originally, I forgot to return to
sig_atomic_t. If, as a third fix, these bitfields were
also changed from
unsigned int to
it would further help to avoid problems on systems where the natural atomic
access width is wider than an
int, like the lesser cousin
of the DS9K I outlined above. However,
sig_atomic_t can be
signed or unsigned so it adds a new portability wrinkle.
The clever :0 is not enough, and the BIND developers were right not to rely on it.