Maintained by: NLnet Labs

[Unbound-users] Improve avg response times

vinay3 at justemail.net
Sat Jul 7 20:55:28 CEST 2012


I tried using a prefetch trigger at 90% instead of 10% (intentionally
inefficient) and a cache-min-ttl of 5400, so that an hourly scan is
guaranteed to find a cached entry from the last scan and will also reset
its TTL back to 5400 by forcing a new iteration.
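
The reasoning above can be checked with a small model. This is only a sketch: it assumes prefetch fires on a cache hit once the remaining TTL has dropped to a given fraction of the original TTL (0.10 for stock Unbound, 0.90 for the modified trigger, however that change was made). The function name and the two constants are made up for illustration.

```python
def prefetch_due(original_ttl, remaining_ttl, trigger_fraction):
    """Return True when a cache hit should also start a background refetch.

    trigger_fraction is the share of the original TTL that may still remain
    when prefetching starts: 0.10 models the stock behaviour, 0.90 models
    the modified trigger described in the text.
    """
    return remaining_ttl <= original_ttl * trigger_fraction


MIN_TTL = 5400          # cache-min-ttl from the text, in seconds
SCAN_INTERVAL = 3600    # the scan runs once an hour

# TTL left on an entry when the next hourly scan asks for it again.
remaining = MIN_TTL - SCAN_INTERVAL

print("remaining TTL at next scan:", remaining)
print("prefetch at 10% trigger:", prefetch_due(MIN_TTL, remaining, 0.10))
print("prefetch at 90% trigger:", prefetch_due(MIN_TTL, remaining, 0.90))
```

Under these assumptions an entry still has 1800 seconds left when the next scan arrives, which is above the 10% mark (540 seconds) but well inside the 90% mark (4860 seconds). That is why the 90% trigger is expected to refresh every entry on every scan.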

 

If I do a new namebench run, I still get a 150ms+ average response time, and
I see several responses that have a TTL of 5400, meaning they were cache
misses.


 


Name                     Type  Time (ms)  TTL   Count  Answer
blog.sina.com.cn.        A     814.8959   5400  2      blogx.sina.com.cn. -> 218.30.115.254
www.utorrent.com.        A     203.908    5400  1      67.215.233.130
in.youtube.com.          A     29.46731   2982  2      youtube-ui.l.google.com. -> 74.125.224.46,
                                                       74.125.224.32, 74.125.224.33, 74.125.224.34,
                                                       74.125.224.35, 74.125.224.36, 74.125.224.37,
                                                       74.125.224.38, 74.125.224.39, 74.125.224.40,
                                                       74.125.224.41
search.mywebsearch.com.  A     72.6796    2985  2      www154.mywebsearch.com. -> 74.113.233.48
wenwen.soso.com.         A     296.7475   5236  1      202.55.10.153
www.uploading.com.       A     26.0509    3007  1      195.191.207.40
www.ebay.com.            A     31.4418    5400  1      66.135.200.161, 66.135.200.181, 66.135.210.61,
                                                       66.135.210.181, 66.211.181.161, 66.211.181.181

 

This looks like unexpected behavior to me: if www.utorrent.com was scanned
less than an hour ago and was supposed to stay cached for 5400 seconds, why
would a random scan like the one above find it flushed from the cache (TTL
of 5400) and needing a new iteration (203.9 ms response time)?
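
The TTL heuristic used above can be applied to the results mechanically. The sketch below copies the name, response time and TTL of each result and treats a returned TTL equal to cache-min-ttl as a fresh lookup. It also assumes every cached entry was stored with exactly cache-min-ttl (that is, the upstream TTL was not larger), so that the difference gives the age of the entry. The helper names are illustrative only.

```python
MIN_TTL = 5400  # cache-min-ttl in effect for this run, in seconds

# (name, response time in ms, returned TTL in s), copied from the results above.
results = [
    ("blog.sina.com.cn.", 814.8959, 5400),
    ("www.utorrent.com.", 203.908, 5400),
    ("in.youtube.com.", 29.46731, 2982),
    ("search.mywebsearch.com.", 72.6796, 2985),
    ("wenwen.soso.com.", 296.7475, 5236),
    ("www.uploading.com.", 26.0509, 3007),
    ("www.ebay.com.", 31.4418, 5400),
]


def split_by_ttl(rows, min_ttl):
    """Split rows into presumed fresh lookups and presumed cache hits."""
    fresh = [row for row in rows if row[2] == min_ttl]
    cached = [row for row in rows if row[2] != min_ttl]
    return fresh, cached


def mean_ms(rows):
    """Average response time in milliseconds over the given rows."""
    return sum(row[1] for row in rows) / len(rows)


fresh, cached = split_by_ttl(results, MIN_TTL)

print(f"presumed fresh lookups: {len(fresh)}, mean {mean_ms(fresh):.1f} ms")
for name, ms, ttl in fresh:
    print(f"  {name:<25} {ms:8.1f} ms")

print(f"presumed cache hits:    {len(cached)}, mean {mean_ms(cached):.1f} ms")
for name, ms, ttl in cached:
    # Age estimate is only valid under the assumption stated above.
    print(f"  {name:<25} {ms:8.1f} ms, cached about {MIN_TTL - ttl} s earlier")
```

For a name that resolves through a CNAME, the TTL in the results may belong to only one record in the chain, so the classification is a rough guide rather than proof of a miss.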

 

From: 1phon3apps at googlegroups.com [mailto:1phon3apps at googlegroups.com] On
Behalf Of vinay3 at justemail.net
Sent: Friday, July 06, 2012 11:24 PM
To: unbound-users at unbound.net
Subject: Improve avg response times

 

I am using an Amazon EC2 large instance (4 ECUs, 2 cores) for my Unbound
server, configured as below. I am seeing a 150ms+ average response time, as
reported by the namebench Alexa 2K result. To reduce my lookup times, I am
running an hourly scan of these 35K sites (from the namebench dat files) so
that my clients get a cached response whenever possible. On average, my
cache miss rate is 6%, as shown below. My cache-min-ttl is 1 hour, so these
entries should be cached at all times. I am guessing the cache misses come
from sites that my pythonmod looks up and responds to in a special way:

 

6.5Mbytes of free RAM

 

total.num.cachehits=3185
total.num.cachemiss=188
mem.cache.rrset=8319405
mem.cache.message=8729827
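
The miss rate follows directly from the two counters above. A minimal sketch, assuming the statistics arrive as `key=value` lines like the ones quoted (the function names are illustrative):

```python
# The four counters quoted above, as key=value lines.
stats_text = """\
total.num.cachehits=3185
total.num.cachemiss=188
mem.cache.rrset=8319405
mem.cache.message=8729827
"""


def parse_stats(text):
    """Turn key=value lines into a dict of floats, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            stats[key.strip()] = float(value)
    return stats


def miss_rate(stats):
    """Fraction of queries that were not answered from the cache."""
    hits = stats["total.num.cachehits"]
    misses = stats["total.num.cachemiss"]
    return misses / (hits + misses)


stats = parse_stats(stats_text)
print(f"cache miss rate: {miss_rate(stats):.1%}")
```

With 188 misses out of 3373 queries this comes to about 5.6%, which matches the rounded 6% figure.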

 

(forked configuration)

server:
        # disable chroot as it caused several issues with python's PYTHONHOME vars
        chroot: ""
        verbosity: 0
        # set to num of cores or cpus
        num-threads: 2
        ## slabs
        rrset-cache-slabs: 1
        infra-cache-slabs: 1
        key-cache-slabs: 1
        msg-cache-slabs: 1
        ## cache sizes
        msg-cache-size: 250m
        # 2X msg-cache-size
        rrset-cache-size: 500m
        outgoing-range: 950
        # 2X outgoing range
        num-queries-per-thread: 512
        # sudo sysctl -w net.core.rmem_max=8388608
        so-rcvbuf: 8m
        interface: 0.0.0.0
        interface: ::0
        port: 53
        access-control: 0.0.0.0/0 allow
        module-config: "python iterator"
        prefetch: yes
        cache-min-ttl: 3600

python:
        python-script: "XYZ"

remote-control:
        control-enable: yes

forward-zone:
        name: "."
        forward-addr: XYZ

 

Question:

 

Even with this setup, I am seeing most of the domains return a TTL of 3600
at the start of a random namebench run, which means they were
iterated/recursed over instead of being looked up from the cache. This is
causing a 150ms+ average response time for these 35K sites. It's the exact
same 35K sites being scanned by namebench, so why aren't these looked up
from the cache instead of being iterated over? Are these sites not cached
for a full 3600 seconds?

 

With prefetch and a cache-min-ttl of 1 hour, why isn't an hourly scan of
these 35K sites populating my cache and giving me a <50ms response time on
average?

 

With the same setup, if I take 500 sites and run namebench back to back for
these fixed 500 sites, my average response time starts approaching 40-50ms
which is where I am trying to be with the 35K sites. 

 

Where am I going wrong, and how can I debug and fix this issue?
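
One way to look at a single name outside of namebench is to time one query against the resolver and read the TTL of the first answer. The sketch below uses only the Python standard library and builds a plain A-record query by hand. The function names are made up for illustration, the server address passed to `probe` is whatever address the resolver listens on, and the parser assumes a well-formed, untruncated UDP reply.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Root label, then QTYPE=1 (A) and QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length & 0xC0 == 0xC0:   # compression pointer: 2 bytes, ends the name
            return offset + 2
        if length == 0:             # root label ends the name
            return offset + 1
        offset += 1 + length


def first_answer_ttl(packet):
    """Return the TTL of the first answer record, or None if there is none."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    if ancount == 0:
        return None
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4   # skip QTYPE and QCLASS
    offset = skip_name(packet, offset)
    _rtype, _rclass, ttl = struct.unpack("!HHI", packet[offset:offset + 8])
    return ttl


def probe(server, name, timeout=3.0):
    """Send one query over UDP; return (elapsed_ms, ttl_of_first_answer)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(build_query(name), (server, 53))
        reply, _ = sock.recvfrom(4096)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, first_answer_ttl(reply)
```

Calling `probe(server, "www.utorrent.com.")` twice in a row should show the second answer with a lower TTL and a shorter time if the first answer was cached. For a name behind a CNAME, the first answer is the CNAME record, so the TTL read here is that record's TTL.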

 

Vinay.

 
