I haven't got round to setting up proper performance monitoring for our DNS servers yet, so I have been making fairly ad-hoc queries against the BIND statistics channel and pulling out numbers with jq.
Last week I changed our new DNS setup for more frequent DNS updates. As part of this change I reduced the TTL on all our records from one day to one hour. The obvious question was, how would this affect the query rate on our servers?
So I wrote a simple monitoring script. The first version did:
while sleep 1
do
    fetch-and-print-stats
done
But the fetch-and-print-stats part took a significant fraction of a second, so the queries-per-second numbers were rather bogus.
A better way to do this is to run `sleep` in the background, while you fetch-and-print-stats in the foreground. Then you can wait for the sleep to finish and loop back to the start. The loop should take almost exactly a second to run (provided fetch-and-print-stats takes less than a second). This is pretty similar to an alarm()/wait() sequence in C. (Actually no, that's bollocks.)
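A minimal sketch of that pattern, for illustration only. The names `do_work` and `paced_loop` are hypothetical; `do_work` stands in for fetch-and-print-stats and just prints the time so you can see the ticks land about a second apart.

```shell
#!/bin/sh
# Sketch of the "background sleep as an alarm" loop.
# do_work is a hypothetical stand-in for fetch-and-print-stats.
do_work() {
    date '+%H:%M:%S'
}

# paced_loop INTERVAL COUNT: run do_work COUNT times, once per INTERVAL seconds.
paced_loop() {
    interval=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]
    do
        sleep "$interval" &   # set the alarm
        do_work               # foreground work overlaps with the sleep
        wait                  # block until the alarm goes off
        i=$((i + 1))
    done
}

paced_loop 1 3
```

Each iteration lasts as long as the sleep, as long as the foreground work finishes first. If the work overruns, the iteration simply takes as long as the work does.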
My dnsqps script also abuses `eval` a lot to get a shonky Bourne shell version of associative arrays for the per-server counters. Yummy.
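For the curious, the `eval` trick can be sketched roughly like this. The helper names `counter_key`, `counter_set` and `counter_get` are made up for this example, and the key-sanitizing step is an assumption added here so that names containing dots or hyphens still produce valid variable names; it is not part of dnsqps.

```shell
#!/bin/sh
# Sketch of eval-based "associative arrays" in plain Bourne shell.
# All three helper names are hypothetical.

# Map an arbitrary key (e.g. a hostname) to a safe variable-name suffix
# by replacing every character outside [A-Za-z0-9_] with "_".
counter_key() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_'
}

# counter_set KEY VALUE: store VALUE in the variable tot_<sanitized KEY>.
counter_set() {
    eval "tot_$(counter_key "$1")=\$2"
}

# counter_get KEY: print the stored value, or 0 if nothing is stored yet.
counter_get() {
    eval "printf '%s\n' \"\${tot_$(counter_key "$1"):-0}\""
}

counter_set ns1.example.com 1000
prev=$(counter_get ns1.example.com)
counter_set ns1.example.com 1042
echo "increase: $(( $(counter_get ns1.example.com) - prev ))"
```

One caveat with this sketch: keys that differ only in punctuation (say `a.b` and `a_b`) map to the same variable.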
So now that I was able to get queries-per-second numbers from my servers, what was the effect of dropping the TTLs? Well, as far as I can tell from eyeballing, nothing. Zilch. No visible change in query rate. I expected at least some kind of clear increase, but no.
The current version of my dnsqps script is:
#!/bin/sh
while :
do
    sleep 1 & # set an alarm
    for s in "$@"
    do
        total=$(curl --silent "http://$s:853/json/v1/server" |
                jq -r '.opcodes.QUERY')
        eval inc='$((' $total - tot$s '))'
        eval tot$s=$total
        printf ' %5d %s' "$inc" "$s"
    done
    printf '\n'
    wait # for the alarm
done
small tweak
Date: 2015-02-24 17:36 (UTC)
But only a small change can make it work again:
true & # any old thing to wait on
while wait
do
    sleep 1 &
    ....
done
cheers,
Mike
Re: small tweak
Date: 2015-02-24 18:19 (UTC)
Bash's signal handling and error propagation are very bad :-( At least, I think it is Bash's fault. It is made worse by the buffer bloat in Linux's terminals and pipes, so stopping an output flood is unnecessarily hard.
DNS stats
Date: 2015-02-24 18:58 (UTC)
But I'm left wondering why you didn't write it in a language with built-in HTTP, JSON, and associative array support? Python, Perl, Go, etc. all come to mind. I suspect it'd be no more lines, and easier to read/write. And it doesn't look like you're getting any benefit from the Bourne shell -- e.g. no pipeline tricks, or redirection tricks.
Ewen
PS: thanks for the pointer to "jq" -- that looks like a useful tool for ad hoc JSON command line handling.
Re: small tweak
Date: 2015-02-24 22:08 (UTC)
Why, yes, I am a hypocrite.
But when do you stop just adding features/fixing edge cases/fixing bugs and start re-writing in perl/ruby/whatever? 10^{0,1,2,3,4} LOC?
-- Mike
Re: DNS stats
Date: 2015-02-24 22:59 (UTC)
#!/usr/bin/perl
use warnings;
use strict;
use JSON;
use LWP::Simple;
use Time::HiRes qw(time sleep);
my (%tot,%inc);
$tot{$_} = 0 for @ARGV;
my $mark = time;
for (;;) {
    for my $s (@ARGV) {
        my $t = decode_json(get("http://$s:853/json/v1/server"))
            ->{opcodes}->{QUERY};
        $inc{$s} = $t - $tot{$s};
        $tot{$s} = $t;
        printf " %5d %s", $inc{$s}, $s;
    }
    print "\n";
    $mark += 1;
    sleep $mark - time;
}
Re: DNS stats
Date: 2015-02-24 23:05 (UTC)
FWIW, I (now!) have a personal metric that if I find I really want associative arrays then shell script is probably the wrong language choice.... I've done that "associative arrays in shell" a few times too, but each time it turned out to be a strong hint that the program was growing beyond "a few lines of shell" and would be much easier converted to something else with the features I wanted built in.
Ewen
Re: DNS stats
Date: 2015-02-24 23:14 (UTC)
Another reason is if you need to check command exit status carefully, especially in pipelines. It seems ironic that a command language is often not that good at running commands!
no subject
Date: 2015-02-25 12:29 (UTC)
Re: DNS stats
Date: 2015-02-25 12:31 (UTC)
'set -o pipefail'?
no subject
Date: 2015-02-25 12:58 (UTC)
Re: DNS stats
Date: 2015-02-25 13:02 (UTC)
Since I wrote nsdiff this is my own fault. I should probably learn not to write programs that produce "clever" exit status codes :-)
wow
Date: 2015-02-25 17:42 (UTC)
We try to make it easy ;-)
Re: DNS stats
Date: 2015-03-02 23:45 (UTC)
Re: DNS stats
Date: 2015-03-03 12:29 (UTC)