-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Because we haven't measured multithread performance scaling of unbound
before, I decided to try it myself. Also I was bored waiting late at
night for an audio broadcast from the IETF :-) The study is below.

Unbound multithread performance: an investigation into scaling of
cache response qps

Using a Solaris 5.11 quadcore machine[*], with four CPUs at 200 MHz, I
have tested unbound cache performance in various configurations. In
this test setup the Solaris machine is blazing away on its four CPUs
(no hyperthreading), and two other hosts (BSD and Linux) at 3 GHz are
running perf and sending queries for cache responses for
www.nlnetlabs.nl at a high rate. We count the number of queries per
second (qps) that this returns.

The configurations tested use the builtin mini-event (select(2)
based) and libevent-1.4.12-stable (using evport). Each is combined
with pthreads, Solaris threads, and non-threaded (fork(2)) operation.

The unbound config file contains the minimum statements needed to make
the server accessible from the network (an access control list and
interface statements) plus num-threads, which is set to 1, 2, 3 and 4.

It was observed that the threads all seem to handle about an even load
in the tests. So real multi-threading is happening. In this setup the
two senders can easily saturate the machine under test; without that,
the test would be a lot trickier.

Table: total qps for all threads together.

Configuration            1 core   2 cores   3 cores   4 cores
select and pthreads        8450     14100     16100     18600
select and solaristhr      8600     13800     15800     17500
select and no threads     10000     17800     19800     22800
evport and pthreads        8400     13600     15900     18100
evport and solaristhr      8500     14100     16000     18600
evport and no threads      9700     17300     19600     22300

The performance scales up fairly neatly as multi-threading goes. For
every configuration a slower-than-linear speedup is observed, which
suggests locking in the underlying operating system network stack.
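
To make "slower-than-linear" concrete, here is a minimal sketch that
derives speedup and per-core efficiency from the table. The qps numbers
are copied from the table; the names QPS and scaling() are made up for
this illustration, and "efficiency" is assumed to mean speedup divided
by the number of cores (1.0 would be perfectly linear scaling).

```python
# qps per configuration for 1, 2, 3 and 4 cores, copied from the table.
QPS = {
    "select and pthreads":   [8450, 14100, 16100, 18600],
    "select and solaristhr": [8600, 13800, 15800, 17500],
    "select and no threads": [10000, 17800, 19800, 22800],
    "evport and pthreads":   [8400, 13600, 15900, 18100],
    "evport and solaristhr": [8500, 14100, 16000, 18600],
    "evport and no threads": [9700, 17300, 19600, 22300],
}


def scaling(qps):
    """Return (speedup, efficiency) lists relative to the 1-core figure.

    speedup[i]    = qps with i+1 cores / qps with 1 core
    efficiency[i] = speedup[i] / (i+1); 1.0 means linear scaling
    """
    base = qps[0]
    speedup = [q / base for q in qps]
    efficiency = [s / (i + 1) for i, s in enumerate(speedup)]
    return speedup, efficiency


if __name__ == "__main__":
    for name, qps in QPS.items():
        speedup, efficiency = scaling(qps)
        cells = "  ".join(
            f"{s:.2f}x ({e:.0%})" for s, e in zip(speedup, efficiency)
        )
        print(f"{name:<22} {cells}")
```

Run as a script, this prints one row per configuration with the
speedup and efficiency at each core count. For select and pthreads,
for instance, four cores give roughly 2.2 times the single-core rate.
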
There is only one network card, after all, and the CPUs have to lock
and synchronise with it.

Solaris threads are a little faster than pthreads when combined with
evport (a Solaris-specific socket management system call). Running
without threads is faster still, by about 20% (though of course it
fragments the cache, since each forked process keeps its own), and its
advantage grows slightly with the number of cores (from about 15% to
23%).

The evport call is a little bit slower than select, but since it
breaks the 1024-limit of select, it will remain useful for high
capacity configurations.

To increase performance further, it seems the place to work on is the
network driver or network stack.

Best regards,
   Wouter

[*] This machine has been donated by RIPE NCC and has mostly been used
for System/OS interoperation testing. It turned out to be a good
machine to expose certain race conditions that did not show up on
regular Intel/Linux or BSD systems. If you happen to have somewhat
exotic machinery around we would welcome your donation.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAkuofeoACgkQkDLqNwOhpPiywgCfV9BJDaHYAUtgc/J7ueLCfJF4
d30AoJmpCXcLqc5rnTMWNHeyO3+LdG9w
=m8Qo
-----END PGP SIGNATURE-----