Maintained by: NLnet Labs

[Unbound-users] Unbound and timeouts

Dave Mill
Thu Mar 18 22:12:48 CET 2010


Hi guys

I'm having a slight Unbound issue and would like some advice on
whether my assumptions are correct and what the best solution actually
is. First, a bit of background information that is relevant to this
issue.

We use Unbound for our primary customer-facing DNS server. We are a
small ISP with about 12,000 customers, most of whom use our DNS
servers. Unbound replaced a BIND server that would typically sit at
about 0.25 load average (and had many issues), whilst Unbound tends to
hover around 0.01. I followed the Unbound optimizing guide pretty
extensively when setting up this server about a year ago. We're very
happy with Unbound so far.

We're based in New Zealand, which leads to two quite relevant things.
1) We tend to buy both domestic (NZ-only) and international (non-NZ)
bandwidth, and have various peers at two peering exchanges here in NZ.
2) Latency to the west coast of the States tends to be about 130ms at
a minimum, and latency to, say, bbc.co.uk is about 300ms.

Recently we've had a couple of international outages. During these
outages all of our domestic connectivity and peers are fine, so any
local .nz zones should still be resolvable. However, what we tend to
experience is that custdns1 (our Unbound server) will not respond to
any DNS queries at all. Our secondary recursive DNS server, custdns2,
running BIND, will still respond to .nz DNS queries.

During our most recent international outage I managed to catch this
happening. Whilst I didn't capture any information, the following
seemed to be the case:

1) The load average on the Unbound server was sitting at about 0.8
2) Unbound was trying to establish many connections to various
international authoritative DNS servers and taking quite a while to
time out when doing so.

So, I'm assuming Unbound is still quite capable of resolving .nz
queries, but it's consuming all of its resources trying to resolve
international queries, leaving no room for .nz queries.

Reading up on unbound.conf I found a useful option called
jostle-timeout. This looks like it could be handy for what I'm
experiencing here. I see that it defaults to 200ms, which is
probably too low for us (see above comments about latency to the US
and UK).
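
For concreteness, this is roughly the change I have in mind. It is an
untested sketch only: the value of 400 is just my guess, picked to sit
above the ~300ms latency to the UK mentioned above, and I don't know
yet whether it actually helps during an outage.

```
# unbound.conf (sketch only; 400 is an assumed value, not tested)
server:
    # jostle-timeout is given in milliseconds; the default is 200
    jostle-timeout: 400
```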

So, has anyone used the jostle-timeout setting for similar reasons to
this? Are there any other timeout settings that I should be
experimenting with? Are all of my assumptions here wrong? Anything
else I should look at?

We don't suffer from many international outages but when they do occur
I'd like to see our .nz interwebs still functioning as they should.

Cheers
Dave