Maintained by: NLnet Labs

[Unbound-users] Requestlist size difference 1.3.4 / 1.4.6

W.C.A. Wijngaards
Mon Sep 13 14:59:00 CEST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Peter,

Thanks to good testing by Peter Bagari, I got the clues:
The issue turns out to be the following.  unbound-control lookup .. shows:
67.228.190.60   rtt 120024 msec, 0 lost. EDNS 0 probed.
70.38.5.34       rtt 120012 msec, 0 lost. EDNS 0 probed.

These servers time out on some names (nxdomains), but answer validly and
quickly for good names (www.yi.org).  This is a misconfiguration of the
server or of some firewall.  Other servers show similar behaviour based
on query type, such as type=MX.

The rtt of 120000 and 0 lost indicates that this is the problem.
Unbound is trying to look up a lot of those names at these servers, but
does not immediately throw away the query, because www.yi.org might work
right away.
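
A minimal sketch of spotting this pattern in the lookup output, assuming
each server line has the same layout as the two sample lines above.  The
regular expression, the function name and the threshold constant are
illustrative assumptions, not part of unbound or unbound-control.

```python
import re

# Assumed line layout, copied from the two sample lines in this message:
#   <address>  rtt <N> msec, <M> lost. ...
LINE_RE = re.compile(r"^(\S+)\s+rtt\s+(\d+)\s+msec,\s+(\d+)\s+lost\.")

# The 2 minute timeout that is backed off to, in milliseconds.
BACKED_OFF_RTT_MSEC = 120000


def find_backed_off(lines):
    """Return (address, rtt) pairs for servers with rtt >= 120000 and 0 lost."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a server line in the assumed layout
        addr, rtt, lost = m.group(1), int(m.group(2)), int(m.group(3))
        if rtt >= BACKED_OFF_RTT_MSEC and lost == 0:
            hits.append((addr, rtt))
    return hits


sample = [
    "67.228.190.60   rtt 120024 msec, 0 lost. EDNS 0 probed.",
    "70.38.5.34       rtt 120012 msec, 0 lost. EDNS 0 probed.",
]
for addr, rtt in find_backed_off(sample):
    print(addr, rtt)
```

Both sample servers are reported, since each has an rtt at or above
120000 msec with 0 lost.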

Version 1.3.4 would fail to resolve www.yi.org in this situation, but
1.4.2 makes this work.  Here unbound is putting a lot of effort into
resolving the query.  It also has the resources to do so.

If resources get thin, then long-running queries are jostled out to make
space for faster queries.
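
The idea of jostling can be illustrated with a toy model.  This is only
a sketch under simplifying assumptions, not unbound's actual code: the
class name, the capacity and the age threshold are all hypothetical.
When the list is full, a new query replaces the oldest entry only if
that entry has been running longer than the threshold.

```python
class RequestList:
    """Toy request list (hypothetical, not unbound's data structure)."""

    def __init__(self, capacity, jostle_after):
        self.capacity = capacity          # maximum number of pending queries
        self.jostle_after = jostle_after  # age in seconds before eviction is allowed
        self.entries = {}                 # query name -> start time in seconds

    def add(self, name, now):
        """Try to admit a query; return True if it was accepted."""
        if len(self.entries) < self.capacity:
            self.entries[name] = now
            return True
        # List is full: look at the longest-running entry.
        oldest = min(self.entries, key=self.entries.get)
        if now - self.entries[oldest] > self.jostle_after:
            del self.entries[oldest]      # jostle out the long-running query
            self.entries[name] = now
            return True
        return False                      # nothing is old enough to evict


rl = RequestList(capacity=2, jostle_after=0.2)
rl.add("slow1.example", now=0.0)
rl.add("slow2.example", now=0.1)
first = rl.add("fast.example", now=0.15)   # full, no entry old enough yet
second = rl.add("fast.example", now=1.0)   # slow1.example is jostled out
print(first, second)
```

In this model the long-running entries stay in the list as long as
there is room, which matches the observation that they cost waiting
time rather than CPU.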

It seems there is nothing to fix.  It is possible to identify the
situation in the program (rtt 120000 and 0 lost), but the only thing
that I can do is keep trying, because sometimes the server answers.
There is already code in place to defend the requestlist resource...

Best regards,
   Wouter

On 09/09/2010 03:54 PM, W.C.A. Wijngaards wrote:
> Hi Paul,
> 
> On 09/09/2010 03:40 PM, Paul Wouters wrote:
>> On Thu, 9 Sep 2010, W.C.A. Wijngaards wrote:
>> And prefetching low TTL records?
> 
> No, this did not exist in 1.3.4, thus it is not enabled for him.
> 
> I suspect this change:
> 18 February 2010: No more blacklisting of unresponsive servers, a 2
> minute timeout is backed off to.
> 
> And I saw that the requestlist he included is full of unresponsive
> servers.  This fits with the graphs he sent, that see no increase in CPU
> (it does not do anything but waiting) with extra entries (thus those
> entries are on a long timeout).
> 
> Looks like a tradeoff between how much work to put into trying to
> resolve to bad servers (: 16 lost packets, huge timeout).  Otherwise the
> server used to be blacklisted for 15 minutes (infra-ttl).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAkyOIBQACgkQkDLqNwOhpPh1wgCdF/yTWPbnxKgqlzWJkiy65yoP
ZUIAoIl6SIHtjYxbHH+piUztbXbnBgV3
=F2Sa
-----END PGP SIGNATURE-----