Maintained by: NLnet Labs

[Unbound-users] TCP random timeouts

Thomas Guthmann
Tue Nov 16 03:14:38 CET 2010


Hi Wouter,

Thanks for your quick reply as usual :)

> This indicates the remote end closed the read end of the tcp
> channel. [..]
Ok. I think they happen more when unbound seems to struggle to resolve 
TCP queries. Indeed if recursion takes too long, the client closes the 
connection after a defined timeout.

> But this does not seem to explain the tcp issues you say you have.  What
> is the observed problem again?  What queries are giving resolution
> problems and timeouts - are you sure they even use tcp [..]
My problem is that sometimes all TCP queries time out because they seem 
stuck somewhere in unbound. It's _not_ due to a specific query. The 
problem occurs randomly and affects all TCP queries. My load balancers 
test my unbound clusters by running "dig +tcp google.com @UNBOUND_IP". 
When I run the same query locally while it seems stuck, I see the same 
behaviour. How can I distinguish the TCP from the UDP queries in an 
unbound dump_requestlist?
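
For reference, here is a minimal Python sketch of the kind of check the 
load balancers perform, in case it helps to reproduce the hang outside of 
dig. It is not the load balancer's actual code: the function names, the 
3 second timeout and the fixed query ID are assumptions chosen for 
illustration. It sends one A query over TCP with the 2-byte length 
prefix that DNS over TCP requires and reports how long the reply took.

```python
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def frame_for_tcp(message):
    """Prefix the message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def probe_tcp(server, name="google.com", port=53, timeout=3.0, qid=0x1234):
    """Return the seconds taken to get a TCP reply; raise on timeout/error.

    The timeout applies to each socket operation, not to the whole probe.
    """
    start = time.monotonic()
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name, qid)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        reply = recv_exact(sock, length)
    if reply[:2] != struct.pack("!H", qid):
        raise ValueError("reply ID does not match query ID")
    return time.monotonic() - start
```

Calling probe_tcp("UNBOUND_IP") in a loop and logging the exceptions 
would show whether the connect, the send or the read is the step that 
stalls when the problem occurs.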

For the moment, my problem hasn't occurred again but I know it will. I 
will try to do more debugging next time to see if it's unbound or 
something else.

Also I forgot to paste my current config. Each server has 1G of memory 
and unbound uses ~760M once the cache is full. Servers are not heavily 
used yet (see stats below over a ~5 min period of time).

server:
         verbosity: 1
         interface-automatic: yes
         outgoing-range: 950
         so-rcvbuf: 4m
         msg-cache-size: 200m
         rrset-cache-size: 400m
         access-control: 127.0.0.0/8 allow
         ...
         hide-identity: yes
         hide-version: yes
         prefetch: yes
         prefetch-key: yes
         dlv-anchor-file: "dlv.isc.org.key"

python:
remote-control:
         control-enable: yes

thread0.num.queries=123312
thread0.num.cachehits=96665
thread0.num.cachemiss=26647
thread0.num.prefetch=3679
thread0.num.recursivereplies=26387
thread0.requestlist.avg=153.573
thread0.requestlist.max=229
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=118
thread0.requestlist.current.user=56
thread0.recursion.time.avg=5.068339
thread0.recursion.time.median=0.0445492
total.num.queries=123312
total.num.cachehits=96665
total.num.cachemiss=26647
total.num.prefetch=3679
total.num.recursivereplies=26387
total.requestlist.avg=153.573
total.requestlist.max=229
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=118
total.requestlist.current.user=56
total.recursion.time.avg=5.068339
total.recursion.time.median=0.0445492
time.now=1289873333.934557
time.up=351187.814711
time.elapsed=22.832136
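
To read these numbers more easily, the following Python sketch parses 
"key=value" lines in the format shown above and derives two figures: the 
cache hit ratio and the ratio of average to median recursion time. The 
helper names are hypothetical, and the sample is a subset of the stats 
pasted above. With these values the hit ratio is about 78%, and the 
average recursion time is more than a hundred times the median, which 
suggests a small number of very slow recursions.

```python
def parse_stats(text):
    """Parse 'key=value' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank or malformed lines
            stats[key] = float(value)
    return stats


def cache_hit_ratio(stats, prefix="total"):
    """Fraction of queries answered from cache."""
    queries = stats[prefix + ".num.queries"]
    return stats[prefix + ".num.cachehits"] / queries if queries else 0.0


def recursion_skew(stats, prefix="total"):
    """Average recursion time divided by the median (large = long tail)."""
    median = stats[prefix + ".recursion.time.median"]
    return stats[prefix + ".recursion.time.avg"] / median if median else 0.0


# Subset of the stats from this message.
SAMPLE = """\
total.num.queries=123312
total.num.cachehits=96665
total.num.cachemiss=26647
total.recursion.time.avg=5.068339
total.recursion.time.median=0.0445492
"""

stats = parse_stats(SAMPLE)
hit_ratio = cache_hit_ratio(stats)
skew = recursion_skew(stats)
```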