fanf: (dotat)

Today I rolled out a significant improvement to the automatic recovery system on Cambridge University's recursive DNS servers. The change was prompted by three bugs.

BIND RPZ catatonia

The first bug is that sometimes BIND will lock up for a few seconds doing RPZ maintenance work. This can happen with very large and frequently updated response policy zones such as the Spamhaus Domain Block List.

When this happens on my servers, keepalived starts a failover process after a couple of seconds - it is deliberately configured to respond quickly. However, BIND soon recovers, so a few seconds later keepalived fails back.

BIND lost listening socket

This brief keepalived flap has an unfortunate effect on BIND. It sees the service addresses disappear, so it closes its listening sockets, then the service addresses reappear, so it tries to reopen its listening sockets.

Now, because the server is fairly busy, it doesn't have time to clean up all the state from the old listening socket before BIND tries to open the new one, so BIND gets an "address already in use" error.

Sadly, BIND gives up at this point - it does not keep trying periodically to reopen the socket, as you might hope.

Holy health check script, Batman!

At this point BIND is still listening on most of the interface addresses, except for a TCP socket on the public service IP address. Ideally this should have been spotted by my health check script, which should have told keepalived to fail over again.

But there's a gaping hole in the health checker's coverage: it only tests the loopback interfaces!

In a fix

Ideally all three of these bugs should be fixed. I'm not expert enough to fix the BIND bugs myself, since they are in some of the gnarliest bits of the code, so I'll leave them to the BIND developers. Even if they are fixed, I still need to fix my health check script so that it actually checks the user-facing service addresses, and there's no-one else I can leave that to.


I wrote about my setup for recursive DNS server failover with keepalived when I set it up a couple of years ago. My recent work leaves the keepalived configuration basically unchanged, and concentrates on the health check script.

For the purpose of this article, the key feature of my keepalived configuration is that it runs the health checker script many times per second, in order to fake up dynamically reconfigurable server priorities. The old script did DNS queries inline, which was OK when it was only checking loopback addresses, but the new script typically needs to make 16 queries, which is getting a bit much.
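For a sense of what each of those queries involves, here is a minimal Python sketch of a probe that sends one DNS query over UDP and one over TCP to a given address. It is an illustration rather than the actual checker: the function names, the default query ID, the one-second timeout and the acceptance rule (matching ID, response bit set, NOERROR) are all assumptions.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, class IN, recursion desired."""
    # Header fields: ID, flags (RD=1), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def looks_ok(query: bytes, response: bytes) -> bool:
    """Assumed acceptance rule: same ID, QR bit set, RCODE is NOERROR."""
    if len(response) < 12:
        return False
    rid, flags = struct.unpack("!HH", response[:4])
    qid = struct.unpack("!H", query[:2])[0]
    return rid == qid and bool(flags & 0x8000) and (flags & 0x000F) == 0


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-response")
        data += chunk
    return data


def probe_udp(addr: str, query: bytes, timeout: float = 1.0) -> bool:
    family = socket.AF_INET6 if ":" in addr else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (addr, 53))
        response, _ = s.recvfrom(4096)
    return looks_ok(query, response)


def probe_tcp(addr: str, query: bytes, timeout: float = 1.0) -> bool:
    with socket.create_connection((addr, 53), timeout=timeout) as s:
        # DNS over TCP prefixes each message with a two-byte length.
        s.sendall(struct.pack("!H", len(query)) + query)
        length = struct.unpack("!H", recv_exact(s, 2))[0]
        response = recv_exact(s, length)
    return looks_ok(query, response)
```

Probing both transports matters here because the lost socket described above was a TCP one; a UDP-only check would have missed it.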

Daemonic decoupling

The new health checker is split in two.

The script called by keepalived now just examines the contents of a status file, so it runs predictably fast regardless of the speed of DNS responses.

There is a separate daemon which performs the actual health checks, and writes the results to the status file.
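The split can be sketched roughly as follows, in Python. This is not the real code: the file path and function names are hypothetical, and the atomic rename is my assumption about how to stop the fast-path script from ever reading a half-written file.

```python
import os
import tempfile

STATUS_FILE = "/run/dns-health/status"  # hypothetical path


def write_status(status: str, path: str = STATUS_FILE) -> None:
    """Daemon side: replace the status file atomically via rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(status + "\n")
        # mkstemp creates the file as 0600; the reader may be another user.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def check_status(path: str = STATUS_FILE) -> int:
    """keepalived side: return exit code 0 only if the file says OK."""
    try:
        with open(path) as f:
            return 0 if f.read().strip() == "OK" else 1
    except OSError:
        return 1  # a missing or unreadable file counts as unhealthy
```

The script that keepalived runs would then do little more than `sys.exit(check_status())`, so its run time does not depend on how quickly BIND answers.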

The speed thing is nice, but what is really important is that the daemon is naturally stateful in a way the old health checker could not be. When I started I knew statefulness was necessary because I clearly needed some kind of hysteresis or flap damping or hold-down or something.

This is much more complex

There is this theory of the Möbius: a twist in the fabric of space where time becomes a loop

  • BIND observes the list of network interfaces, and opens and closes listening sockets as addresses come and go.

  • The health check daemon verifies that BIND is responding properly on all the network interface addresses.

  • keepalived polls the health checker and brings interfaces up and down depending on the results.

Without care it is inevitable that unexpected interactions between these components will destroy the Enterprise!

Winning the race

The health checker gets into races with the other daemons when interfaces are deleted or added.

The deletion case is simpler. The health checker gets the list of addresses, then checks them all in turn. If keepalived deletes an address during this process then the checker can detect a failure - but actually, it's OK if we don't get a response from a missing address! Fortunately there is a distinctive error message in this case which the health checker can treat as an alternative successful response.
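As a rough Python sketch of that idea: wrap each probe so that an "address has gone away" error counts as success. Which error actually shows up depends on how the query is made, so the set of errno values below is purely an assumption for illustration, as are the names.

```python
import errno

# Assumption: errors taken to mean "this address is no longer configured".
ADDRESS_GONE = {errno.EADDRNOTAVAIL, errno.ENETUNREACH, errno.EHOSTUNREACH}


def healthy_or_gone(probe, addr: str) -> bool:
    """Run probe(addr); treat a vanished address as a pass, not a failure."""
    try:
        return bool(probe(addr))
    except OSError as e:
        # Timeouts and refused connections are still real failures.
        return e.errno in ADDRESS_GONE
```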

New interfaces are trickier, because the health checker needs to give BIND a little time to open its sockets. It would be really bad if the server appeared to be healthy, so keepalived brought up the addresses, and then the health checker tested them before BIND was ready and immediately reported a failure - a huge flap.

Back off

The main technique that the new health checker uses to suppress flapping is exponential backoff.

Normally, when everything is working, the health checker queries every network interface address, writes an OK to the status file, then sleeps for 1 second before looping.

When a query fails, it immediately writes BAD to the status file, and sleeps for a while before looping. The sleep time increases exponentially as more failures occur, so repeated failures cause longer and longer intervals before the server tries to recover.
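In outline, the loop body looks something like this Python sketch. The base delay, growth factor and cap are made-up values, and the class and function names are hypothetical; only the 1 second healthy interval and the OK/BAD statuses come from the description above.

```python
class Backoff:
    """Failure sleep time that grows exponentially, up to a cap."""

    def __init__(self, base: float = 1.0, factor: float = 2.0,
                 maximum: float = 300.0):
        self.base = base
        self.factor = factor
        self.maximum = maximum
        self.delay = base

    def failure(self) -> float:
        """Return how long to sleep now, and lengthen the next sleep."""
        current = self.delay
        self.delay = min(self.delay * self.factor, self.maximum)
        return current


def step(all_ok: bool, backoff: Backoff, write_status,
         ok_interval: float = 1.0) -> float:
    """One pass of the checker loop: record the status, return the sleep."""
    if all_ok:
        write_status("OK")
        return ok_interval
    write_status("BAD")  # reported immediately, before sleeping
    return backoff.failure()
```

Note that this sketch never shrinks the delay again; what happens on success is a separate question.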

Exponential backoff handles my original problem somewhat indirectly: if there's a flap that causes BIND to lose a listening socket, there will then be a (hopefully short) series of slower and slower flaps until eventually a flap is slow enough that BIND is able to re-open the socket and the server recovers. I will probably have to tune the backoff parameters to minimize the disruption in this kind of event.

Hold down

Another way to suppress flapping is to avoid false recoveries.

When all the test queries succeed, the new health checker decreases the failure sleep time, rather than zeroing it, so if more failures occur the exponential backoff can continue. It still reports the success immediately to keepalived, because I want true recoveries to be fast, for instance if the server accidentally crashes and is restarted.

The hold-down mechanism is linked to the way the health checker keeps track of network interface addresses.

After an interface goes away the checker does not decrease the sleep time for several seconds even if the queries are now working OK. This hold-down is supposed to cover a flap where the interface immediately returns, in which case we want exponential backoff to continue.

Similarly, to avoid those tricky races, we also record the time when each interface is brought up, so we can ignore failures that occur in the first few seconds.
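Putting those pieces together, the state involved might look like the following Python sketch. All the numbers are placeholders, the names are hypothetical, and halving the delay on success is just one assumed way of "decreasing" it.

```python
import time


class FlapDamper:
    """Backoff delay plus per-address bookkeeping for hold-down and grace."""

    def __init__(self, base=1.0, factor=2.0, maximum=300.0,
                 hold_down=10.0, grace=5.0, clock=time.monotonic):
        self.base, self.factor, self.maximum = base, factor, maximum
        self.hold_down, self.grace, self.clock = hold_down, grace, clock
        self.delay = base
        self.first_seen = {}                # address -> time it appeared
        self.last_removal = float("-inf")   # time an address last vanished

    def update_addresses(self, addresses) -> None:
        """Record which addresses exist now, noting arrivals and removals."""
        now = self.clock()
        current = set(addresses)
        if set(self.first_seen) - current:
            self.last_removal = now
        self.first_seen = {a: self.first_seen.get(a, now) for a in current}

    def in_grace(self, addr: str) -> bool:
        """True while failures on a newly added address should be ignored."""
        return self.clock() - self.first_seen[addr] < self.grace

    def failure(self) -> float:
        """Return the sleep time for this failure and lengthen the next."""
        current = self.delay
        self.delay = min(self.delay * self.factor, self.maximum)
        return current

    def success(self) -> None:
        """Shrink the delay (never below base) unless in hold-down."""
        if self.clock() - self.last_removal >= self.hold_down:
            self.delay = max(self.delay / self.factor, self.base)
```

Success is still reported to keepalived straight away; `success()` only controls how quickly the checker forgets about recent failures.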


It took quite a lot of head-scratching and trial and error, but in the end I think I came up with something reasonably simple. Rather than targeting it specifically at failures I have observed in production, I have tried to use general purpose robustness techniques, and I hope this means it will behave OK if some new weird problem crops up.

Actually, I hope NO new weird problems crop up!

PS. the ST:TNG quote above is because I have recently been listening to my old Orbital albums again.
