Maintained by: NLnet Labs

[Unbound-users] High number of system context switches

Jan-Frode Myklebust
Thu Nov 13 21:42:43 CET 2014


Just a follow up to myself, and a big thank you to unbound!

It turned out our virtual DNS servers were struggling with one
cpu stuck at 100% softirq, processing incoming packets. This work could only
run on one cpu, so we had no way of scaling it out. Newer virtio-net has
up to 8 queues, to spread this over more cpus, but that's not available
on RHEV-H, so we've moved our DNS servers over to physical hosts.
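
For anyone wanting to check for the same symptom, here is a minimal
Python sketch that reads /proc/softirqs and reports how NET_RX work is
split across cpus. The function names are made up for illustration, and
the counters are cumulative since boot, so this only gives a rough
picture, not a live rate.

```python
import os


def net_rx_per_cpu(softirqs_text):
    """Return {cpu_name: NET_RX count} parsed from /proc/softirqs content."""
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == "NET_RX":
            counts = [int(x) for x in rest.split()]
            return dict(zip(cpus, counts))
    return {}


def busiest_share(counts):
    """Return (cpu_name, fraction of all NET_RX softirqs handled by it)."""
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    cpu = max(counts, key=counts.get)
    return cpu, counts[cpu] / total


if __name__ == "__main__" and os.path.exists("/proc/softirqs"):
    with open("/proc/softirqs") as f:
        cpu, share = busiest_share(net_rx_per_cpu(f.read()))
    print("busiest cpu for NET_RX:", cpu, "share: %.0f%%" % (share * 100))
```

A share close to 100% on one cpu would be consistent with the
single-queue behaviour described above.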

Still, we've been struggling with the performance of Bind on these
physical servers. Once the primary DNS server hit ~17-20K qps, we
noticed clients starting to move to the secondary server -- but we could
never see anything wrong on the server side.

Yesterday we put unbound (from EPEL6) on the primary DNS server, and
immediately saw a huge effect! This single server is now answering >30K
qps during prime time, and the secondary server doesn't really have any
peaks anymore -- clients are no longer failing over to the secondary.

We're still seeing ~300K cswch/s on this server, which was a bit
worrying initially, but it doesn't seem to be causing any problems yet.
It will probably be OK for now, until we upgrade to RHEL7 with
libevent-2, which was mentioned as something that might help.
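
The system-wide cswch/s figure comes from the "ctxt" counter in
/proc/stat. As a rough sketch (helper names are made up for
illustration), the same rate can be sampled without sar like this:

```python
import os
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def cswch_per_second(first, second, interval_s):
    """Average context switches per second between two ctxt samples."""
    return (second - first) / interval_s


def read_ctxt(path="/proc/stat"):
    with open(path) as f:
        return parse_ctxt(f.read())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    interval_s = 1.0
    before = read_ctxt()
    time.sleep(interval_s)
    after = read_ctxt()
    print("cswch/s: %.0f" % cswch_per_second(before, after, interval_s))
```

For scale: ~300K cswch/s at >30K qps works out to at most about ten
context switches per query on the whole system.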



  -jf


On Thu, Apr 10, 2014 at 04:30:57PM +0200, Jan-Frode Myklebust wrote:
> Hi, 
> 
> I'm considering switching from bind to unbound, and have been testing it
> on one of our recursive dns servers. Our servers are KVM virtual machines,
> running RHEL6.5, 12GB memory, 8 cpu cores, with unbound-1.4.21-1.el6.x86_64
> from EPEL.
> 
> We typically have peak periods of 12-14k qps on our DNS servers, and
> it's been working fairly well on bind, but with unbound we seem to get into
> trouble when the qps exceeds 7k. We then see a clear drop in the request
> rate, and clients move over to the secondary DNS server.
> 
> The only problem we've noticed on the unbound server is that the number
> of context switches per second is very high. pidstat for unbound doesn't
> report high cswch/s, but the system does.
> 
> Here's "sar -w" from yesterday evening running unbound:
> 
> kl. 19.00 +0200    proc/s   cswch/s
> kl. 19.10 +0200      3,70 116480,30
> kl. 19.20 +0200      3,47 123118,48
> kl. 19.30 +0200      3,67 128948,60
> kl. 19.40 +0200      3,45 125471,32
> kl. 19.50 +0200      3,69 132641,76
> kl. 20.00 +0200      3,48 140126,75
> 
> while the day before on bind:
> 
> kl. 19.00 +0200      1,90  64801,51
> kl. 19.10 +0200      2,15  64550,78
> kl. 19.20 +0200      1,94  64389,23
> kl. 19.30 +0200      2,19  64369,56
> kl. 19.40 +0200      1,92  64211,15
> kl. 19.50 +0200      2,09  64087,84
> kl. 20.00 +0200      1,91  63691,33
> 
> 
> Any ideas for what we should try to improve this?
> 
> Full unbound.conf, stripped of comments:
> -------------------------------------------------------------------
> server:
> verbosity: 1
> statistics-interval: 60
> statistics-cumulative: yes
> extended-statistics: yes
> num-threads: 8
> interface: 0.0.0.0
> interface: ::0
> interface-automatic: yes
> outgoing-range: 4096
> outgoing-port-permit: 32768-65535
> outgoing-port-avoid: 0-32767
> max-udp-size: 3072
> msg-cache-size: 4G
> num-queries-per-thread: 4096
> rrset-cache-size: 8G
> cache-min-ttl: 2
> do-ip4: yes
> do-ip6: yes
> do-udp: yes
> do-tcp: yes
> access-control: 0.0.0.0/0 allow
> access-control: ::0/0 allow
> chroot: ""
> username: "unbound"
> directory: "/etc/unbound"
> log-time-ascii: yes
> pidfile: "/var/run/unbound/unbound.pid"
> harden-glue: yes
> harden-dnssec-stripped: yes
> harden-below-nxdomain: yes
> harden-referral-path: yes
> use-caps-for-id: no
> unwanted-reply-threshold: 10000000
> prefetch: yes
> prefetch-key: yes
> rrset-roundrobin: yes
> minimal-responses: no
> dlv-anchor-file: "/etc/unbound/dlv.isc.org.key"
> trusted-keys-file: /etc/unbound/keys.d/*.key
> auto-trust-anchor-file: "/var/lib/unbound/root.anchor"
> val-clean-additional: yes
> val-permissive-mode: no
> val-log-level: 2
> include: /etc/unbound/local.d/*.conf
> remote-control:
> control-enable: yes
> control-interface: 127.0.0.1
> control-interface: ::1
> server-key-file: "/etc/unbound/unbound_server.key"
> server-cert-file: "/etc/unbound/unbound_server.pem"
> control-key-file: "/etc/unbound/unbound_control.key"
> control-cert-file: "/etc/unbound/unbound_control.pem"
> include: /etc/unbound/conf.d/*.conf
> -------------------------------------------------------------------
> 
> 
> 
> 
>    -jf
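
To put a number on the "sar -w" comparison quoted above, here is a small
Python sketch that parses the rows (which use decimal commas) and
compares the average cswch/s under unbound and bind. The values are
copied from the quoted message; the function names are made up for
illustration.

```python
UNBOUND = """\
kl. 19.00 +0200    proc/s   cswch/s
kl. 19.10 +0200      3,70 116480,30
kl. 19.20 +0200      3,47 123118,48
kl. 19.30 +0200      3,67 128948,60
kl. 19.40 +0200      3,45 125471,32
kl. 19.50 +0200      3,69 132641,76
kl. 20.00 +0200      3,48 140126,75
"""

BIND = """\
kl. 19.00 +0200      1,90  64801,51
kl. 19.10 +0200      2,15  64550,78
kl. 19.20 +0200      1,94  64389,23
kl. 19.30 +0200      2,19  64369,56
kl. 19.40 +0200      1,92  64211,15
kl. 19.50 +0200      2,09  64087,84
kl. 20.00 +0200      1,91  63691,33
"""


def parse_sar_w(text):
    """Return the cswch/s column from 'sar -w' rows using decimal commas."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # The last column is cswch/s; convert "116480,30" to 116480.30.
            values.append(float(fields[-1].replace(",", ".")))
        except (IndexError, ValueError):
            continue  # header or blank line
    return values


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    ratio = mean(parse_sar_w(UNBOUND)) / mean(parse_sar_w(BIND))
    print("unbound/bind cswch/s ratio: %.2f" % ratio)
```

On these samples the system-wide context-switch rate under unbound is
roughly double that under bind. The two sets of samples come from
different days, so this says nothing about the query load behind each
figure.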