Maintained by: NLnet Labs

[Unbound-users] Query problems with multi-threaded server

Roberto Adler
Fri Jan 16 13:47:47 CET 2009


Hi,

I couldn't test with verbosity 4 because the server can't handle it, 
even for a short time.
So I set up a new server just to test, but the problem didn't occur there.
I'll try to run more tests later.

A question:
Since with separate threads we have separate caches, one cache could
hold the answer while another holds a negative entry (unusual, but it
can happen). How would the command "unbound-control dump_cache" work
in that case?

Also, I understand that the commands from unbound-control (flush,
lookup, etc.) are sent to all the threads.
Is that correct?
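
One way to see how often the separate caches disagree is to repeat the
same query many times and tally the response codes. The sketch below is
only an illustration, not something from the original test setup: it
assumes dig is on the PATH, and the helper names (parse_status,
query_once, tally) are made up for this example.

```python
import re
import subprocess
from collections import Counter

# Matches the rcode in dig's ";; ->>HEADER<<- ... status: NOERROR, ..." line.
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")

def parse_status(dig_output):
    """Return the response code from dig output, or None if no header was printed."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None

def query_once(server, name):
    """Run dig once against `server` and return its text output.

    Assumes the dig binary is installed; one try, 2 second timeout.
    """
    result = subprocess.run(
        ["dig", "@" + server, name, "A", "+tries=1", "+time=2"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout

def tally(outputs):
    """Count response codes over many dig outputs; missing headers count as NO_REPLY."""
    return Counter(parse_status(out) or "NO_REPLY" for out in outputs)
```

For example, tally(query_once("wks11.rjo", "www.sun.com.br") for _ in
range(100)) would give a count of NOERROR versus SERVFAIL replies for
the query quoted further down in this thread.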

I compiled unbound with separate threads because with libevent I
noticed that the threads were getting uneven load. My main server
receives 150k qry/min (9,000,000 qry/hr). With libevent, one thread
handles, for example, 70% of the queries and the other three 10% each.
With separate threads, each one handles 25% of the queries.
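
To make the difference concrete, the per-thread rates implied by those
percentages can be worked out as below. This is just arithmetic on the
figures above; the 70/10/10/10 split is the example split mentioned, not
a measurement, and the function name is made up for this sketch.

```python
QUERIES_PER_MINUTE = 150_000

def per_thread_load(total_per_min, percent_shares):
    """Split a total query rate across threads given integer percentage shares."""
    return [total_per_min * pct // 100 for pct in percent_shares]

# Example split seen with libevent versus the even split with separate threads.
uneven = per_thread_load(QUERIES_PER_MINUTE, [70, 10, 10, 10])
even = per_thread_load(QUERIES_PER_MINUTE, [25, 25, 25, 25])

print(QUERIES_PER_MINUTE * 60)  # 9000000 queries per hour
print(uneven)                   # [105000, 15000, 15000, 15000]
print(even)                     # [37500, 37500, 37500, 37500]
```

So in the uneven case the busiest thread handles 105,000 qry/min,
against 37,500 qry/min per thread when the load is spread evenly.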

The ulimits on my servers are:
ulimit -n:      100000
ulimit -m:      unlimited
ulimit -v:      unlimited
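
The same three limits can also be read programmatically. The sketch
below uses Python's standard resource module and assumes a Linux or
other Unix host; it reports the limits of the process that runs it, not
those of an already running unbound daemon (on Linux those appear in
/proc/<pid>/limits). The labels and helper names are made up for this
example.

```python
import resource

# Map each ulimit flag shown above to the corresponding rlimit constant.
LIMITS = {
    "open files (ulimit -n)": resource.RLIMIT_NOFILE,
    "resident set size (ulimit -m)": resource.RLIMIT_RSS,
    "virtual memory (ulimit -v)": resource.RLIMIT_AS,
}

def fmt(value):
    """Render an rlimit value, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def current_limits():
    """Return {label: (soft, hard)} for this process.

    Memory limits are in bytes here, whereas the shell's ulimit reports
    them in kilobytes.
    """
    return {
        label: tuple(fmt(v) for v in resource.getrlimit(res))
        for label, res in LIMITS.items()
    }

for label, (soft, hard) in current_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```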

Thanks,

Roberto

At 12:29 15/1/2009, you wrote:
>Hi Roberto,
>
>I cannot see anything immediately wrong.
>
>What is your ulimit (open files)?  It should not really matter with
>multiple processes, as you compiled it.  What is your ulimit for
>memory usage as well?
>
>Otherwise, yes the different threads behave like completely separate
>unbound programs.  With your configure options they only share port 53
>(and stop and reload together).
>
>Could you set verbosity higher, verbosity 4 perhaps, for a short
>time, and capture the SERVFAIL in a log file and send that to me?
>(offlist, gzipped).
>
>Best regards,
>    Wouter
>
>Roberto Adler wrote:
> > Hi,
> >
> > I'm running unbound with 4 separate threads, and I notice that the same
> > query sometimes returns OK and sometimes returns an error.
> > I understand that with 4 threads each one has a separate cache. Is this
> > correct?
> > So maybe one cache has the data and another couldn't get it.
> >
> > Normally the query below returns ok:
> > ----------------------------------------------
> > # dig @wks11.rjo www.sun.com.br
> >
> > ; <<>> DiG 9.3.4-P1 <<>> @wks11.rjo www.sun.com.br
> > ; (1 server found)
> > ;; global options:  printcmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9503
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 4
> >
> > ;; QUESTION SECTION:
> > ;www.sun.com.br.                        IN      A
> >
> > ;; ANSWER SECTION:
> > www.sun.com.br.         37931   IN      CNAME   br.sun.com.
> > br.sun.com.             37931   IN      A       72.5.124.12
> >
> > ;; AUTHORITY SECTION:
> > sun.com.                39455   IN      NS      ns1.sun.com.
> > sun.com.                39455   IN      NS      ns2.sun.com.
> > sun.com.                39455   IN      NS      ns7.sun.com.
> > sun.com.                39455   IN      NS      ns8.sun.com.
> >
> > ;; ADDITIONAL SECTION:
> > ns1.sun.com.            36656   IN      A       192.18.128.11
> > ns2.sun.com.            84888   IN      A       192.18.99.5
> > ns7.sun.com.            36656   IN      A       192.18.43.15
> > ns8.sun.com.            36656   IN      A       192.18.43.12
> >
> > ;; Query time: 0 msec
> > ;; SERVER: 200.255.125.211#53(200.255.125.211)
> > ;; WHEN: Thu Jan 15 10:13:01 2009
> > ;; MSG SIZE  rcvd: 208
> > ----------------------------------------------
> >
> > But sometimes it returns:
> > ----------------------------------------------
> > # dig @wks11.rjo www.sun.com.br
> >
> > ; <<>> DiG 9.3.4-P1 <<>> @wks11.rjo www.sun.com.br
> > ; (1 server found)
> > ;; global options:  printcmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 21333
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> >
> > ;; QUESTION SECTION:
> > ;www.sun.com.br.                        IN      A
> >
> > ;; Query time: 0 msec
> > ;; SERVER: 200.255.125.211#53(200.255.125.211)
> > ;; WHEN: Thu Jan 15 10:12:59 2009
> > ;; MSG SIZE  rcvd: 32
> > ----------------------------------------------
> >
> > What is happening here?
> > When I run with 1 thread, I never see this symptom.
> >
> > The hardware:
> > Intel Xeon with 8 processors, 5G memory
> > CentOS release 5.2
> > unbound 1.2.0
> >     compile options: --without-pthreads --without-solaris-threads
> > unbound.conf:
> >     num-threads: 4
> >     outgoing-range: 700
> >     num-queries-per-thread: 700
> >
> >     outgoing-num-tcp: 10
> >     incoming-num-tcp: 10
> >
> >     rrset-cache-size: 1000m
> >     msg-cache-size: 100m
> >
> >     rrset-cache-slabs: 8
> >     msg-cache-slabs: 8
> >     infra-cache-slabs: 8
> >     key-cache-slabs: 8
> >
> > The server has an average load of 10000 qry/min, so it's quite free.
> >
> > Thanks,
> >
> > Roberto
> >
> > _______________________________________________
> > Unbound-users mailing list
> > Unbound-users at unbound.net
> > http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users