Maintained by: NLnet Labs

[Unbound-users] Unbound stops responding

W.C.A. Wijngaards
Fri Aug 16 15:41:17 CEST 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Petter,

Responding to your second question: how to keep up during such an outage.

Local queries should then not depend on the Internet provider.  Put a
stub-zone definition in unbound.conf for your intranet name(s) and
point it at the IP addresses of your local authoritative DNS
server(s).  That way unbound does not need to go out over the Internet
for local lookups, and they resolve normally even while the ISP is
down.
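As a minimal sketch, the Python helper below prints such a stub-zone clause. The zone name corp.example and the addresses 192.0.2.53 and 192.0.2.54 are hypothetical placeholders for your intranet zone and local authoritative servers; stub-zone, name and stub-addr are the unbound.conf keywords for this kind of definition.

```python
# Sketch: render an unbound.conf stub-zone clause for an intranet zone.
# The zone name and server addresses used below are hypothetical placeholders.


def stub_zone_clause(zone: str, addrs: list[str]) -> str:
    """Return a stub-zone clause that sends queries for `zone` to local servers."""
    if not zone.endswith("."):
        zone += "."  # write the zone name in fully qualified form
    lines = ["stub-zone:", f'    name: "{zone}"']
    # One stub-addr line per local authoritative DNS server.
    lines += [f"    stub-addr: {addr}" for addr in addrs]
    return "\n".join(lines)


if __name__ == "__main__":
    print(stub_zone_clause("corp.example", ["192.0.2.53", "192.0.2.54"]))
```

The printed clause goes into unbound.conf. If DNSSEC validation is enabled and the intranet zone is unsigned, a domain-insecure entry for the same name may also be needed, so that validation does not depend on reaching the public parent zone during an outage.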

Unbound can deal with the situation where part of the queries resolve
normally, another part does not resolve at all, and unbound is swamped
with queries for the slow part.  It should keep answering the fast
queries and slowly keep working on (a fraction of) the slow queries.
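To put numbers on the fast and slow parts, the Python sketch below parses the two "server stats" lines for thread 0 from the statistics quoted further down and compares them. The regular expression only assumes the log line format exactly as it appears in those quoted lines; parse_stats is a hypothetical helper name.

```python
import re

# The two "server stats" lines for thread 0, copied from the quoted statistics.
BEFORE = ("2013-08-14 11:40:21 unbound[1180:0] info: server stats for thread 0: "
          "137441 queries, 107082 answers from cache, 30359 recursions, 0 prefetch")
DURING = ("2013-08-14 12:40:21 unbound[1180:0] info: server stats for thread 0: "
          "717834 queries, 79368 answers from cache, 638466 recursions, 0 prefetch")

# Matches the counters in the "server stats" line format shown above.
STATS_RE = re.compile(r"(\d+) queries, (\d+) answers from cache, (\d+) recursions")


def parse_stats(line: str) -> tuple[int, int, int]:
    """Return (queries, cache answers, recursions) from a server stats line."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError("not a server stats line")
    queries, cached, recursions = (int(g) for g in m.groups())
    return queries, cached, recursions


if __name__ == "__main__":
    q0, c0, r0 = parse_stats(BEFORE)
    q1, c1, r1 = parse_stats(DURING)
    print(f"total queries: {q1 / q0:.1f}x")  # 5.2x
    print(f"cache answers: {c1 / c0:.2f}x")  # 0.74x
    print(f"recursions:    {r1 / r0:.1f}x")  # 21.0x
```

In both lines the queries equal cache answers plus recursions (107082 + 30359 = 137441), so the growth is almost entirely in recursions, about 21 times the normal number, while answers from cache fell to roughly three quarters of their usual count.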

For a real capacity increase you would need to run on Linux (or
FreeBSD).  On Windows, increase num-threads, even above the number of
CPUs, because every thread gets its own set of sockets.  For example:
num-threads: 10, an outgoing-range of about 50 sockets per thread, and
a low num-queries-per-thread too, say 25.  That gives 500 sockets
instead of 64, and 250 queries in total.  Does that work?  It keeps
num-queries-per-thread smaller than outgoing-range.
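A minimal Python sketch of the arithmetic above, assuming that each thread's outgoing-range has to stay below the 64-event Windows limit described in the earlier message (leaving some headroom for the other event handles a thread waits on). check_plan and server_clause are hypothetical helper names; num-threads, outgoing-range and num-queries-per-thread are the unbound.conf options discussed here.

```python
# Hypothetical helpers: sanity-check a Windows thread/socket plan.
# Assumption: per-thread outgoing-range must stay below the 64-event limit
# mentioned in this thread, leaving headroom for other event handles.
WINDOWS_EVENT_LIMIT = 64


def check_plan(num_threads: int, outgoing_range: int,
               queries_per_thread: int) -> dict[str, int]:
    """Validate the per-thread numbers and return totals across all threads."""
    if outgoing_range >= WINDOWS_EVENT_LIMIT:
        raise ValueError("outgoing-range must stay below the per-thread event limit")
    if queries_per_thread >= outgoing_range:
        raise ValueError("num-queries-per-thread should be smaller than outgoing-range")
    return {
        "total_sockets": num_threads * outgoing_range,
        "total_queries": num_threads * queries_per_thread,
    }


def server_clause(num_threads: int, outgoing_range: int,
                  queries_per_thread: int) -> str:
    """Render the matching unbound.conf server: settings."""
    return "\n".join([
        "server:",
        f"    num-threads: {num_threads}",
        f"    outgoing-range: {outgoing_range}",
        f"    num-queries-per-thread: {queries_per_thread}",
    ])


if __name__ == "__main__":
    print(check_plan(10, 50, 25))  # {'total_sockets': 500, 'total_queries': 250}
    print(server_clause(10, 50, 25))
```

Raising num-threads is what multiplies the socket budget here; the per-thread numbers stay small so each thread remains within the Windows wait limit.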

Best regards,
   Wouter

On 08/16/2013 03:24 PM, W.C.A. Wijngaards wrote:
> Hi Petter,
> 
> On Windows we currently have a limit of 64, not 1024.  This comes
> from a Windows limit in its WSAWaitForMultipleEvents Winsock
> function.  We also need to wait for 'locks' for inter-thread
> communication; Windows does not implement the Unix pipes that we
> use on Unix.
> 
> So the machine is far from able to keep up with the work.
> 
> During the issue, the number of cache hits is roughly the same
> (which suggests it did respond to cache hits?).  But the number of
> non-cached queries is about 600,000, 20x the normal number.  This
> causes the CPU spike.  And it cannot answer any of these queries,
> because it has no sockets, and the upstream is unresponsive anyway.
> 
> I think the 20x traffic increase may be why it does not send the
> answers from cache.  It should have sent those, but if the network
> stack is swamped with packets, unbound does not get a chance.
> 
> Or is the machine easily capable of 20x the network load (note
> that UDP load may differ from other load)?  If so, we should look
> at the performance under this denial-of-service condition.  I am
> especially worried because your num-queries-per-thread is a lot
> bigger than the number of sockets you have, which (usually) does
> not happen on Unix systems.
> 
> Best regards, Wouter
> 
> On 08/16/2013 02:38 PM, Petter Lindgren wrote:
>> We’ve used Unbound for a few months for around 15 000 clients in
>> our company.
> 
>> We recently had an issue when our Internet provider had a big 
>> network error.
> 
>> This meant that our three unbound resolvers couldn’t perform 
>> recursion.
> 
>> But during the outage, they stopped responding to any queries
>> and the CPU spiked.
> 
>> This meant that we couldn’t resolve local names.
> 
>> After the outage was resolved, they started responding as usual 
>> again.
> 
>> Statistics before outage:
> 
>> 2013-08-14 11:40:21 unbound[1180:0] info: server stats for
>> thread 0: 137441 queries, 107082 answers from cache, 30359
>> recursions, 0 prefetch
> 
>> 2013-08-14 11:40:21 unbound[1180:0] info: server stats for
>> thread 0: requestlist max 38 avg 4.54215 exceeded 0 jostled 0
> 
>> 2013-08-14 11:40:21 unbound[1180:0] info: average recursion 
>> processing time 0.120068 sec
> 
>> 2013-08-14 11:40:21 unbound[1180:0] info: histogram of recursion
>>  processing times
> 
>> 2013-08-14 11:40:21 unbound[1180:0] info: [25%]=0.00974991 
>> median[50%]=0.0314827 [75%]=0.162277
> 
>> Statistics during outage:
> 
>> 2013-08-14 12:40:21 unbound[1180:0] info: server stats for
>> thread 0: 717834 queries, 79368 answers from cache, 638466
>> recursions, 0 prefetch
> 
>> 2013-08-14 12:40:21 unbound[1180:0] info: server stats for
>> thread 0: requestlist max 893 avg 649.479 exceeded 520726 jostled
>> 90153
> 
>> 2013-08-14 12:40:21 unbound[1180:0] info: average recursion 
>> processing time 16.060234 sec
> 
>> 2013-08-14 12:40:21 unbound[1180:0] info: histogram of recursion
>>  processing times
> 
>> 2013-08-14 12:40:21 unbound[1180:0] info: [25%]=0.0128234 
>> median[50%]=0.046963 [75%]=0.241531
> 
>> What can I do to make sure that Unbound still responds to a
>> local query when there is an outage?
> 
>> We run Unbound on Windows Server 2012 x64 (because we have more
>> competence using Windows).
> 
>> Is it possible to overcome the 1024 limit for outgoing-range
>> and num-queries-per-thread when running Windows, and would it
>> have helped?
> 
>> I would really appreciate comments and suggestions!
> 
>> Thanks!
> 
>> Petter Lindgren
> 
>> _______________________________________________ Unbound-users 
>> mailing list Unbound-users at unbound.net 
>> http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSDiv9AAoJEJ9vHC1+BF+NEAEP/jmw+jG/4wJ4PXvBHxXXRaOR
RCjlbYwuO34xtD+Y8mAOZceXbfHkE9rAzieWiSLnHnJF07lhYoNSiMU0Vsl+2aCN
vmYSIczVoORRd9o4ih3szqWbeDUBEKiyjcfzZhv7LwbmuzLMfHWpVXKMzGe78fWa
6wlkJcN1n//6ujyOSRn79W/ryj1eKfcEegVRbLdrN2x9dEeTwPMG2scmIFEjgCzu
SDudolOkfxNB7uzyDwrYIsJSvO4P7MQKkcBE/UKAgjHhZ6thY62Q+pqsucOiXVn9
QmEG4KAAJ1X/VMf2WYqLIErXLhIE2PDTVXJTI5eMp1nWi0tauz3YN/jLIgUYDNaM
oAEafylioHUOimzbtYnTYPt+FwR/jYOV8VEwcDL5dyL2q++Jkn02Rq/FNesnzVrc
nUPwgGWbVSlGf4ZNi+rBpml994J6rR1qf7MaFvMtBeH+dniJEK/EnDbKM8D7ltyP
qlaHekczoMAmNnhWJa64g7gy+ajkzw9se1mqs67bDNJ5eocFVCqN2Wi+S+Tr1q0W
uhLbRonDyhfaPl6JUzETJKE8Aza0SUOx/5gV0Hs69L+BcIwqkp6I+k8+YrczibiL
GomKqZDZ+RUF0kpVSxbZ7gfBt7dZjbRPiwlJ2xcDeJM5+cjrDhZa/B4xtWBqEL1t
ztqDXkIzPpGnB9MkTsEH
=Bxsa
-----END PGP SIGNATURE-----