Maintained by: NLnet Labs

[Unbound-users] Uneven load on threads

Sven Ulland
Thu Jul 5 16:55:06 CEST 2012


What determines how queries are scheduled to the available threads?
We're seeing very uneven load on the 16 threads we have configured:

   thread0:   9.60%
   thread1:   1.40%
   thread2:  26.97%
   thread3:   3.05%
   thread4:   0.75%
   thread5:   6.20%
   thread6:   2.09%
   thread7:   4.37%
   thread8:   1.40%
   thread9:   8.38%
   thread10:  0.97%
   thread11:  2.69%
   thread12: 14.85%
   thread13:  6.92%
   thread14:  9.49%
   thread15:  0.87%
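
For reference, with 16 threads a perfectly even split would be 6.25%
per thread. The skew in the figures above can be quantified with a
short Python sketch; the `imbalance` helper is made up for
illustration and is not part of Unbound or unbound-control:

```python
# Per-thread share of queries in percent, copied from the list above.
thread_share = [9.60, 1.40, 26.97, 3.05, 0.75, 6.20, 2.09, 4.37,
                1.40, 8.38, 0.97, 2.69, 14.85, 6.92, 9.49, 0.87]


def imbalance(shares):
    """Return (even_share, busiest_index, busiest_share / even_share)."""
    even = sum(shares) / len(shares)
    busiest = max(range(len(shares)), key=lambda i: shares[i])
    return even, busiest, shares[busiest] / even


even, busiest, ratio = imbalance(thread_share)
print(f"even share: {even:.2f}%")
print(f"busiest: thread{busiest}, {ratio:.1f}x the even share")
```

By this measure thread2 carries a bit more than four times what an
even distribution would give it.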

We have around 15-20k queries per second in total on this node, and
while the node is not struggling by any means (and we could easily run
with fewer threads), it would be interesting to understand what's
happening. Queries are coming in from nodes spread across a few
subnets, with random source ports.

We have 8 queues on the network card, so we distribute interrupts
between 8 CPUs (manually configured by echoing CPU masks into
/proc/irq/n/smp_affinity). I suspect there is a relationship here:
packets received on a certain queue -- and thus on a certain CPU --
are either handled by the Unbound thread running on that same CPU, or
copied to another CPU with a NET_RX softirq. I could be way off.
Perhaps forked operation would work better.
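
To make the smp_affinity setup concrete: the value written to
/proc/irq/n/smp_affinity is a hexadecimal bitmask in which bit k
selects CPU k. Below is a minimal Python sketch of building such
masks for a one-queue-per-CPU layout. The IRQ numbers are
hypothetical placeholders (the real ones are listed in
/proc/interrupts), and the script only prints the commands rather
than writing to /proc:

```python
def smp_affinity_mask(cpus):
    """Return the hex bitmask string for an iterable of CPU numbers.

    Bit k set means CPU k may service the interrupt. Masks wider than
    32 bits are written as comma-separated 32-bit groups.
    """
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    hexstr = format(value, "x")
    groups = []
    while hexstr:
        groups.append(hexstr[-8:])   # take 8 hex digits (32 bits) at a time
        hexstr = hexstr[:-8]
    return ",".join(reversed(groups))


# Hypothetical IRQ numbers for eth0-0 .. eth0-7 (assumption for this
# example only; look up the real values in /proc/interrupts).
queue_irqs = {f"eth0-{q}": 64 + q for q in range(8)}

# Pin queue q to CPU q and show the equivalent shell command.
for q, (name, irq) in enumerate(queue_irqs.items()):
    mask = smp_affinity_mask([q])
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity  # {name}")
```

Note that this only controls which CPU services each queue's
interrupt; it says nothing about which Unbound thread ends up
reading the packet from the socket.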

Network card input queue interrupt rates:

   eth0-0:  0.6%
   eth0-1: 12.1%
   eth0-2: 11.7%
   eth0-3: 32.1%
   eth0-4: 10.6%
   eth0-5: 10.6%
   eth0-6: 10.5%
   eth0-7: 11.9%

If anyone could shed some light on this, both on how the uneven load
comes about and on how the packet-to-queue-to-CPU-to-application-thread
mapping works (and how it should be set up for optimal performance,
possibly using [1] and taskset), it would be much appreciated!

Relevant parts of unbound.conf (Unbound version 1.4.16):
   num-threads: 16
   msg-cache-slabs: 16
   rrset-cache-slabs: 16
   infra-cache-slabs: 16
   key-cache-slabs: 16
   rrset-cache-size: 2000m
   msg-cache-size: 500m
   outgoing-range: 8192 # Yes, --with-libevent
   num-queries-per-thread: 4096
   so-rcvbuf: 8m
   so-sndbuf: 8m
   extended-statistics: yes

[1]: Documentation/networking/scaling.txt
<URL:http://lxr.linux.no/#linux+v3.4.4/Documentation/networking/scaling.txt>

Sven