Maintained by: NLnet Labs

[Unbound-users] Requestlist filling ? automatic cleanup ?

W.C.A. Wijngaards
Sun Mar 20 22:05:58 CET 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Thomas,

What version are you using?  Recently the timeout code was changed to
cope with this sort of situation (1.4.7):
http://www.unbound.net/documentation/info_timeout.html

On 03/20/2011 02:31 PM, Thomas wrote:
> Hi,
> 
> I don't really understand how the request list flushes old and often
> wrong queries. The man page says that jostle-timeout is triggered
> when the server is very busy. What defines 'busy'?

Busy means that the requestlist is full.

Your requestlist is the default size, about 1000 entries, so 300 entries
do not fill it up.  Because of your somewhat high load I would recommend
a recompile with libevent.  Then you can increase the requestlist
(num-queries-per-thread) and the range (outgoing-range) to several
thousand.  In recent versions the defaults increase by themselves, see
http://www.unbound.net/documentation/howto_optimise.html

> For instance we often have a lot of crappy queries towards
> groupinfra.com (see attachment for a dump_requestlist|grep groupinfra)
> and they seem to never go away and fill our request list by more than
> half (e.g: 195/375). Could that impact unbound's responsiveness?

No, other queries take priority over these older queries.

The requestlist is divided into two halves: run-to-completion and
fast-stuff.  Queries in the run-to-completion half are never dropped to
make room; they run until they complete.  The fast-stuff half deletes
older queries to make room for new queries, but only once the
jostle-timeout has expired for the older query.  Otherwise everything
that comes in could be deleted immediately under a DoS.
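
To make the two-halves idea concrete, here is a minimal sketch in Python.
It is a toy model only, not Unbound's actual code: the class name, the
method names, the capacity of 4 and the timeout of 200 ms are all
assumptions chosen for illustration, and query names are assumed to be
unique.

```python
from collections import OrderedDict


class RequestList:
    """Toy model (hypothetical, not Unbound's code) of a two-halves request list."""

    def __init__(self, capacity, jostle_timeout_ms):
        self.capacity = capacity
        self.jostle_timeout_ms = jostle_timeout_ms
        # First half: entries here are never evicted to make room.
        self.run_to_completion = {}
        # Second half: insertion-ordered, so the first entry is the oldest.
        self.fast = OrderedDict()

    def __len__(self):
        return len(self.run_to_completion) + len(self.fast)

    def add(self, name, now_ms):
        """Try to admit a query. Return True if admitted, False if dropped."""
        if len(self) >= self.capacity and not self._jostle(now_ms):
            return False  # full, and nothing is old enough to be pushed out
        if len(self.run_to_completion) < self.capacity // 2:
            self.run_to_completion[name] = now_ms
        else:
            self.fast[name] = now_ms
        return True

    def _jostle(self, now_ms):
        """Evict the oldest fast entry, but only if it outlived the timeout."""
        if not self.fast:
            return False
        oldest, arrived_ms = next(iter(self.fast.items()))
        if now_ms - arrived_ms <= self.jostle_timeout_ms:
            return False  # too young: protects fresh queries under a flood
        del self.fast[oldest]
        return True


# Illustrative run with assumed values.
rl = RequestList(capacity=4, jostle_timeout_ms=200)
for i in range(4):
    rl.add(f"q{i}", now_ms=0)        # q0, q1 run to completion; q2, q3 are fast
too_soon = rl.add("q4", now_ms=100)  # list full, oldest fast entry only 100 ms old
later = rl.add("q5", now_ms=500)     # oldest fast entry (q2) has expired
print(too_soon, later, list(rl.fast))
```

In this model a new query is only refused when the list is full and the
oldest fast entry is still younger than the timeout.  As long as the list
is not full, nothing is evicted, no matter how old the entries are.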

> Note: jostle-timeout is still set to the default (see my config below).

Yes, that should be OK.  If you lower it, the groupinfra queries are
more likely to be dropped.

> I am asking that because sometimes our unbounds have a random hiccup and
> I am wondering if it could be due to this or not. The 'hiccup' is very
> hard to debug because it's random (once a month or so) on servers doing
> something like 500 to 1500 qps each so increasing the verbosity from 1
> to 2 is not really possible :)

What seems to happen is that groupinfra has a lot of servers, and they
sometimes experience outages.  During an outage, unbound gets timeouts
and tries to fetch the requested names, but also the names of the other
nameservers (and there are a lot of them).  Given user demand for
groupinfra, unbound starts to explore all the nameservers for
groupinfra, with timeouts, and thus the entries fill up your
requestlist.  The dependency structure is like the log excerpt that you
show.

Because of the timeouts those entries are necessarily pretty old, and
thus the ones in the fast-stuff list would be dropped to make room for
new queries if there was a lack of space.  But there is no lack of
space, so these queries are performed: there is interest, and there is
capacity to undertake actions to find the answers.

Best regards,
   Wouter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAk2GbDYACgkQkDLqNwOhpPhAOACgn0abEas9OQ9zSI1BHoSVCagF
bQsAnR9AVYDPgV7T8Jrokr7yra8lpMIX
=9yqr
-----END PGP SIGNATURE-----