Maintained by: NLnet Labs

[Unbound-users] ~10% performance increase in worker_handle_request()

W.C.A. Wijngaards
Mon Jan 27 11:34:17 CET 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Robert,

On 01/25/2014 08:23 PM, Robert Edmonds wrote:
> W.C.A. Wijngaards wrote:
>> On 01/06/2014 07:46 PM, Robert Edmonds wrote:
>>> W.C.A. Wijngaards wrote:
>>>> However, moving the check after the cache check is not a
>>>> good idea. The localzone and localdata statements are
>>>> supposed to be able to override the DNS contents from the
>>>> internet.  Checking it earlier means that this always
>>>> happens.  Messing with cache flushes sounds very bad: a query
>>>> may create a recursive query that would then override the
>>>> configuration and so on, lots of complexity as well as
>>>> worries about giving the 'wrong' answers (as compared to the
>>>> configuration).
>>> 
>>> OK, that makes sense.
>>> 
>>>> It would be better to optimise the localzone lookup itself
>>>> somehow. Not sure what the best way is; it is visible that
>>>> it uses an rbtree now and that this is slower than the
>>>> hashtable that the cache employs.
>>> 
>>> Yes, the problem is not that the localzone lookup occurs first,
>>> per se, but that the localzone lookup causes measurable
>>> contention with other threads.
>> 
>> The lock is changed to a rwlock, so that the threads can all
>> acquire the readlock to answer queries.  That should reduce
>> contention?  The call order is not changed.
> 
> Hi, Wouter:
> 
> I benchmarked svn r3047 (spinlock?) against svn r3048 (rwlock) and
> the difference is small: maybe a 0.5% - 1% speedup.  It
> doesn't appear to harm performance at any level, but it's also not
> as dramatic as eliding the localzone lookup (~10%).

Too bad, I do not know how else to fix this.  Duplicating the
localzone data for every worker costs a lot of memory, plus code and
maintenance overhead whenever the localzone data is manipulated, so I
think it is less desirable.  It would make the lookup cheaper still,
because there would be no locks on it in that case.  (You can already
compile unbound without locking and without threading, and run it
with processes instead; every worker process then duplicates the
entire cache, and also the localzone data, without locks.)

Still, it is good to know that this is a performance (CPU) hot spot
at this time.

Best regards,
   Wouter

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJS5jYoAAoJEJ9vHC1+BF+NWLcP/jB/s1ow4kpfWfRrydfIuals
GJFtnTLLwH6+7G0EX4N+L8/uD2YNctOimscGNf88WP+hzS2mjk91SmwEKhIDFIv+
OJz5XuQ1YpVOgUWBUiTCNYlFVp+HTJDWVAXCJS7wXcRuO6ztuemGdziOJj+yNmYt
WhvcwYBaeRsbCi2zTo26ODXTV6YKEofZXePm3wGIMSTXMEFPbvsFSCzUuqEzg1Dx
gFNvRTyqdOw8kCLjjhwTTQd7wUXBdco1De0E/v8Xr9hizHjJ+3+JcCrHtgKqxLcE
zCiWZJCH+ioQBI/nQ2yqpS/uQPWQ8b4zfTIr3uShI2LWV6cpNEaQKKUaxIqE2/qT
Dwl0HxVvwgTG/FMjUyHt4iO3TpOmFQ7Gz50BhVgkO/EnpGkgwIdToCnlRjC/5Xc1
R8M3r2bMNLnZinA6e3wvtPgMq2aatO72jpxsCWjpC/AXH14J2FZ55UcNL/VJmd3n
mkW7a94CeOICPIQpfFP6vO4rigDYBPQNhKOsZN3igD60jHqox5NiswEYA15QZVwr
PLiH5BSZdU5isnAArgJGT3EqzwiRK2aHbm95vVobrowVkil5CIn5IHBLYpWZC+pF
wt+rYtdaYK0FjaW67PF64pigL27US2Kwzik5CRwkMY1QUyNc3VqvIwgTySUCrT72
nDfFIyKqbT6Mth4yklmn
=v+Ng
-----END PGP SIGNATURE-----