Maintained by: NLnet Labs

[Unbound-users] Improve avg response times

vinay3 at justemail.net
Sat Jul 7 08:24:27 CEST 2012


I am using an Amazon EC2 large instance (4 ECUs, 2 cores) for Unbound,
configured as below. namebench reports an average response time of 150ms+
in its Alexa 2K result. To reduce my lookup times, I run an hourly scan of
these 35K sites (from the namebench dat files) so that my clients get a
cached response whenever possible. On average, my cache miss rate is about
6%, as shown below. My cache-min-ttl is 1 hour, so these entries should be
cached at all times. I am guessing the cache misses come from sites that my
pythonmod looks up and responds to in a special way:

 

6.5Mbytes of free RAM

total.num.cachehits=3185
total.num.cachemiss=188
mem.cache.rrset=8319405
mem.cache.message=8729827
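
As a rough check on the figures above, the counters can be turned into a miss rate and a cache fill level. The sketch below is only illustrative: the helper names (`parse_stats`, `miss_rate`, `cache_fill`) are made up for this example, and it assumes that the `m` suffix in msg-cache-size and rrset-cache-size means 1024*1024 bytes. With the numbers quoted here the miss rate works out to 188 / (3185 + 188), about 5.6%, and both caches sit far below their configured limits.

```python
# Hypothetical helpers; input is "key=value" text such as the counters above.
MIB = 1024 * 1024  # assumption: unbound's "m" suffix means 1024*1024 bytes

SAMPLE = """\
total.num.cachehits=3185
total.num.cachemiss=188
mem.cache.rrset=8319405
mem.cache.message=8729827
"""


def parse_stats(text):
    """Turn 'key=value' lines into a dict of floats, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            stats[key] = float(value)
    return stats


def miss_rate(stats):
    """Fraction of queries that were not answered from the cache."""
    hits = stats["total.num.cachehits"]
    misses = stats["total.num.cachemiss"]
    total = hits + misses
    return misses / total if total else 0.0


def cache_fill(stats, rrset_limit, msg_limit):
    """Fraction of each configured cache size that is currently in use."""
    return (stats["mem.cache.rrset"] / rrset_limit,
            stats["mem.cache.message"] / msg_limit)


stats = parse_stats(SAMPLE)
rrset_fill, msg_fill = cache_fill(stats, 500 * MIB, 250 * MIB)
print("miss rate:  %.1f%%" % (100 * miss_rate(stats)))
print("rrset fill: %.1f%%" % (100 * rrset_fill))
print("msg fill:   %.1f%%" % (100 * msg_fill))
```

If the fill levels stay this low, eviction due to a full cache is unlikely to be the reason entries go missing, which points towards expiry instead.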

 

(forked configuration)

server:
        # disable chroot as it caused several issues with python's PYTHONHOME vars
        chroot: ""
        verbosity: 0
        # set to num of cores or cpus
        num-threads: 2
        ## slabs
        rrset-cache-slabs: 1
        infra-cache-slabs: 1
        key-cache-slabs: 1
        msg-cache-slabs: 1
        ## cache sizes
        msg-cache-size: 250m
        # 2X msg-cache-size
        rrset-cache-size: 500m
        outgoing-range: 950
        # 2X outgoing range
        num-queries-per-thread: 512
        # sudo sysctl -w net.core.rmem_max=8388608
        so-rcvbuf: 8m
        interface: 0.0.0.0
        interface: ::0
        port: 53
        access-control: 0.0.0.0/0 allow
        module-config: "python iterator"
        prefetch: yes
        cache-min-ttl: 3600

python:
        python-script: "XYZ"

remote-control:
        control-enable: yes

forward-zone:
        name: "."
        forward-addr: XYZ
 

Question:

 

Even with this setup, I am seeing most of the domains return a TTL of 3600
at the start of a random namebench run, which means they were
iterated/recursed over instead of being looked up from cache. This is
causing a 150ms+ average response time for these 35K sites. namebench scans
exactly the same 35K sites, so why aren't they looked up from the cache
instead of being iterated over? Are these sites not cached for the full
3600 seconds?
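
The reasoning above (a returned TTL equal to the full 3600 suggests a fresh lookup, a smaller TTL suggests an aged cache entry) can be written down as a small tally. This is a sketch only: `classify_ttls` is a hypothetical helper, the TTL values in the example are invented for illustration and are not measurements, and a record whose upstream TTL is above 3600 cannot be classified this way, so it is counted as "other".

```python
def classify_ttls(ttls, full_ttl=3600):
    """Count answers that look fresh (TTL == full_ttl), aged (0 < TTL < full_ttl),
    or undecidable (anything else, e.g. an upstream TTL above the minimum)."""
    fresh = sum(1 for t in ttls if t == full_ttl)
    aged = sum(1 for t in ttls if 0 < t < full_ttl)
    other = len(ttls) - fresh - aged
    return {"fresh": fresh, "aged": aged, "other": other}


# Invented example values, only to show the shape of the output.
observed = [3600, 3600, 3541, 3600, 86400, 120]
print(classify_ttls(observed))
```

A run in which nearly every answer lands in the "fresh" bucket would be consistent with the entries having expired shortly before namebench started.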

 

With prefetch enabled and a cache-min-ttl of 1 hour, why isn't an hourly
scan of these 35K sites populating my cache and giving me a <50ms response
time on average?
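
One way to explore this question is a toy model of a single cache entry. It is a sketch under stated assumptions, not a description of Unbound's internals: the entry is treated as expired once its age reaches the TTL, prefetch is assumed to refresh the entry only when a query arrives while 10% or less of the TTL remains, and every record is assumed to be cached for exactly 3600 seconds. The function name `simulate` and the query schedules are hypothetical.

```python
def simulate(query_times, ttl=3600, prefetch=True, prefetch_fraction=0.10):
    """Replay queries (in seconds) for one name against a one-entry cache model."""
    expires_at = None  # nothing cached yet
    results = []
    for t in sorted(query_times):
        if expires_at is None or t >= expires_at:
            # Entry absent or expired: resolve upstream and cache for ttl seconds.
            results.append("miss")
            expires_at = t + ttl
        elif prefetch and (expires_at - t) <= ttl * prefetch_fraction:
            # Served from cache, and the entry is refreshed in the background.
            results.append("hit+prefetch")
            expires_at = t + ttl
        else:
            results.append("hit")
    return results


# A scan every 60 minutes arrives just as the one-hour entry runs out.
print(simulate([0, 3600, 7200, 10800]))
# A scan every 55 minutes arrives inside the assumed prefetch window.
print(simulate([0, 3300, 6600, 9900]))
```

In this model a refresh interval equal to the TTL never hits the cache, while a slightly shorter interval keeps the entry alive. Whether that matches the real behaviour here depends on how long the scan takes and on the actual TTLs involved.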

 

With the same setup, if I take 500 sites and run namebench back to back on
those fixed 500 sites, my average response time approaches 40-50ms, which
is where I am trying to get with the 35K sites.

 

Where am I going wrong, and how can I debug and fix this issue?

 

Vinay.

 
