Maintained by: NLnet Labs

[Unbound-users] Uneven load on threads

lst_hoe02 at kwsoft.de
Thu Jul 5 22:40:20 CEST 2012


Quoting Sven Ulland <sveniu at opera.com>:

> What determines how queries are scheduled to the available threads?
> We're seeing very uneven load on the 16 threads we have configured:
>
>   thread0:   9.60%
>   thread1:   1.40%
>   thread2:  26.97%
>   thread3:   3.05%
>   thread4:   0.75%
>   thread5:   6.20%
>   thread6:   2.09%
>   thread7:   4.37%
>   thread8:   1.40%
>   thread9:   8.38%
>   thread10:  0.97%
>   thread11:  2.69%
>   thread12: 14.85%
>   thread13:  6.92%
>   thread14:  9.49%
>   thread15:  0.87%
>
> We have around 15-20k queries per second in total on this node, and
> while the node is not struggling by any means (and we could easily run
> with fewer threads), it would be interesting to understand what's
> happening. Queries are coming in from nodes spread across a few
> subnets, with random source ports.
>
> We have 8 queues on the network card, so we distribute interrupts
> between 8 CPUs (manually configured by echoing cpu masks into
> /proc/irq/n/smp_affinity). I was thinking that there's a relationship
> here, that causes packets received on a certain queue -- and thus
> a certain CPU -- to end up being handled by the Unbound thread running
> on the same CPU, or copied to another CPU with a NET_RX softirq. This
> could be way off. Perhaps this would work better with forked
> operation.
>
> Network card input queue interrupt rates:
>
>   eth0-0:  0.6%
>   eth0-1: 12.1%
>   eth0-2: 11.7%
>   eth0-3: 32.1%
>   eth0-4: 10.6%
>   eth0-5: 10.6%
>   eth0-6: 10.5%
>   eth0-7: 11.9%
>
> If anyone could shed some light on this, both how the uneven load
> comes about, and how the packet-to-queue-to-cpu-to-application_thread
> works (and how it should be set up for optimal performance, possibly
> ref [1] and taskset), it would be much appreciated!
>
> Relevant parts of unbound.conf, version 1.4.16:
>   num-threads: 16
>   msg-cache-slabs: 16
>   rrset-cache-slabs: 16
>   infra-cache-slabs: 16
>   key-cache-slabs: 16
>   rrset-cache-size: 2000m
>   msg-cache-size: 500m
>   outgoing-range: 8192 # Yes, --with-libevent
>   num-queries-per-thread: 4096
>   so-rcvbuf: 8m
>   so-sndbuf: 8m
>   extended-statistics: yes
>
> [1]: Documentation/networking/scaling.txt
> <URL:http://lxr.linux.no/#linux+v3.4.4/Documentation/networking/scaling.txt>

This was discussed on the list recently. As far as I understand,
distributing the incoming queries among the threads is the job of the OS
(https://unbound.net/pipermail/unbound-users/2012-February/002240.html).
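
To illustrate what "the OS decides" means, here is a minimal sketch, not
Unbound code: several threads block on one shared UDP socket and the kernel
picks which of them gets each datagram. It assumes that Unbound's worker
threads wait on the same listening sockets, which is my reading of the
thread linked above. The name run_demo, the thread and packet counts and the
0.5 s timeout are made up for the demo, and Python's threading only
approximates the behaviour of native threads.

```python
import socket
import threading
from collections import Counter


def run_demo(num_threads=4, num_packets=100):
    """Send datagrams to one UDP socket shared by several worker threads.

    Returns a Counter mapping worker name to the number of datagrams that
    worker received. Which worker gets a datagram is decided by the OS
    scheduler, not by this code.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.settimeout(0.5)          # workers exit after 0.5 s without traffic
    addr = server.getsockname()

    counts = Counter()
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                server.recvfrom(512)    # all workers block on the same socket
            except socket.timeout:
                return
            with lock:
                counts[name] += 1

    threads = [
        threading.Thread(target=worker, args=("thread%d" % i,))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(num_packets):
        client.sendto(b"query %d" % i, addr)
    client.close()

    for t in threads:
        t.join()
    server.close()
    return counts


if __name__ == "__main__":
    for name, received in sorted(run_demo().items()):
        print(name, received)
```

The per-thread counts differ from run to run and are usually not equal,
because nothing in the program balances them.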

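On the interrupt side, /proc/irq/n/smp_affinity takes a hexadecimal CPU
bitmask. A small sketch of how the masks for a one-queue-per-CPU layout can
be computed follows. The IRQ numbers 64 to 71 are hypothetical (the real
ones are listed in /proc/interrupts), and cpu_mask and affinity_plan are
names invented for this example.

```python
def cpu_mask(cpu):
    """Return the hex bitmask string that selects exactly one CPU."""
    return format(1 << cpu, "x")


def affinity_plan(irqs):
    """Assign one CPU per IRQ, in order: irq -> (cpu, mask)."""
    return {irq: (cpu, cpu_mask(cpu)) for cpu, irq in enumerate(irqs)}


if __name__ == "__main__":
    # Hypothetical IRQ numbers for eth0-0 .. eth0-7.
    for irq, (cpu, mask) in affinity_plan(range(64, 72)).items():
        print("echo %s > /proc/irq/%d/smp_affinity   # CPU %d" % (mask, irq, cpu))
```

This only prints the commands. It does not write to /proc, and it says
nothing about which Unbound thread ends up handling a packet.
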
Regards

Andreas