Maintained by: NLnet Labs

[Unbound-users] Unbound 1.4.18 crashes under high query load

W.C.A. Wijngaards
Thu Oct 25 08:56:45 CEST 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Janne,

On 10/24/2012 09:03 PM, Janne Snabb wrote:
> Hi,
> 
> While trying to debug my previous Unbound crash issue (see the
> thread from yesterday), this time on Debian sid using Unbound
> 1.4.18, I encountered another OpenSSL related issue.

This is the same bug with openssl-locking.  The newer openssl requires
the locking, where the previous versions did not need locking for the
functions that unbound uses.  This is causing the double-free and
other failure that you noted.  It is fixed in svn trunk of unbound.

The crash with num-threads 1 is odd, can you reproduce that with the
svn-trunk version?

Best regards,
   Wouter


> It looks like a double free of OpenSSL's EVP_PKEY structure (or 
> alternatively memory corruption). This issue happens also under
> heavy query load but it happens much more frequently than my
> previous issue. Also this issue comes up even when "num-threads" is
> 1. Thus it is not related to OpenSSL thread safety issues.
> 
> See the gdb backtrace at the bottom of the message.
> 
> Additionally I noticed a minor issue: In validator/val_secalgo.c
> the function setup_key_digest() may return failure if a call to 
> EVP_PKEY_new() returns NULL. In that case the calling function 
> verify_canonrrset() will try to do EVP_PKEY_free() with the NULL
> value. I am unsure if EVP_PKEY_free() has internal safeguards
> against this problem (if it does have, it is not documented). I
> have been able to avoid touching OpenSSL so far, thus I am not
> familiar with it.
> 
> 
> Any thoughts? This problem is also difficult to debug because a
> high query load is required to trigger it. Therefore setting
> breakpoints and single-stepping is out of question. In my test
> setup I am testing by resolving names from Alexa top 1 million web
> sites list. I am feeding that list to a Go program which constantly
> runs 100 DNS lookups in parallel and sends those queries to Unbound
> -- which always eventually crashes.
> 
> Please tell me if I can assist further in resolving this. I like 
> Unbound, I wish it was just a bit more robust...
> 
> 
> Program received signal SIGABRT, Aborted. 0x00007facd2d18475 in
> *__GI_raise (sig=<optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64
> ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or
> directory. (gdb) bt #0  0x00007facd2d18475 in *__GI_raise
> (sig=<optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1  0x00007facd2d1b6f0
> in *__GI_abort () at abort.c:92 #2  0x00007facd2d5232b in
> __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at
> ../sysdeps/unix/sysv/linux/libc_fatal.c:189 #3  0x00007facd2d5bb76
> in malloc_printerr (action=3, str=0x7facd2e32768 "double free or
> corruption (!prev)", ptr=<optimized out>) at malloc.c:6283 #4
> 0x00007facd2d608ac in *__GI___libc_free (mem=<optimized out>) at
> malloc.c:3738 #5  0x00007facd3301f1d in CRYPTO_free (str=0x82c640)
> at mem.c:397 #6  0x00007facd336c6a8 in engine_free_util
> (e=e at entry=0x82c640, locked=locked at entry=0) at eng_lib.c:136 #7
> 0x00007facd336d2c2 in engine_unlocked_finish (e=e at entry=0x82c640, 
> unlock_for_handlers=unlock_for_handlers at entry=1) at eng_init.c:112 
> #8  0x00007facd336d401 in ENGINE_finish (e=0x82c640) at
> eng_init.c:146 #9  0x00007facd335e660 in RSA_free
> (r=0x7facccd33a50) at rsa_lib.c:238 #10 0x00007facd338854b in
> EVP_PKEY_free_it (x=x at entry=0x7facccb81550) at p_lib.c:415 #11
> 0x00007facd3388cd8 in EVP_PKEY_free (x=0x7facccb81550) at
> p_lib.c:405 #12 0x0000000000437fde in verify_canonrrset
> (buf=0x7faccc611d20, algo=8, sigblock=0x116dfd5 
> "D\243[\270r\262\271\071\362\355\227\364\310ф#\320\341˼\177\233`\325Or\346\003\035\345z\206r\374\207\245\\\367\222\062>Z-l\256\302\316\323c\217\061\177\225J#\377\370\327 at s?\373\030\006b\035Qy@\004,\310\310\060\363+s\232\364Ȼ\ti\273%6$އ_\330\301w\234lf\212+:bquu,zӠ\n\364˾D\032G\001I\245m\bS[p\255\032\270rv1\227",
>
> 
sigblock_len=128,
> key=0x7facc86cb36a
> "\001\003\260\304\071[r\365\314\tB\033t@\340\310b \277nzL]\210\231գ
> \250\005\244\255\276N\340\365\332j+^잫\035\032=&'\373t 
> \317\353\231\336\372\220\017!\262&3kB\273Q\232\n\243v\225!Y\261\020\260\371\f\365\225\236\367\210\347\210\332\177\224\217w\266\303\356\303?m\201\234̐y\006\204\256\004\016b\"S|\220\071\323\357\270\316\020\"\331\352L7\376w\035\250o\243\275\237\244\001\345\001\006\001\001\003\b\001\003\303\316WM\230\313\331\025~\rp\322t\270I\312\016\016\355\232\377\305\334̐GIi\006e\\5\313\b\263<M\027\033\001|\243V\364\226\002b\252b\223\315\372\350\261;U\262\034\065\034ߧ"...,
>
> 
keylen=130,
> reason=0x7fffdf646b90) at validator/val_secalgo.c:541 #13
> 0x000000000042f990 in dnskey_verify_rrset_sig
> (region=0x7faccc460e80, buf=0x7faccc611d20, ve=0x8637a0,
> now=1350998202, rrset=0x116dbf8, dnskey=0x7fffdf646950,
> dnskey_idx=0, sig_idx=0, sortree=0x7fffdf646768, 
> buf_canon=0x7fffdf6466ec, reason=0x7fffdf646b90) at
> validator/val_sigcrypt.c:1349 #14 0x0000000000439416 in
> dnskeyset_verify_rrset_sig (env=0x8bfd78, ve=0x8637a0,
> now=1350998202, rrset=0x116dbf8, dnskey=0x7fffdf646950, sig_idx=0,
> sortree=0x7fffdf646768, reason=0x7fffdf646b90) at
> validator/val_sigcrypt.c:607 #15 0x0000000000438e9a in
> dnskeyset_verify_rrset (env=0x8bfd78, ve=0x8637a0, rrset=0x116dbf8,
> dnskey=0x7fffdf646950, sigalg=0x7facc86cb610 "\b", 
> reason=0x7fffdf646b90) at validator/val_sigcrypt.c:504 #16
> 0x00000000004304de in val_verify_rrset (env=0x8bfd78, ve=0x8637a0, 
> rrset=0x116dbf8, keys=0x7fffdf646950, sigalg=0x7facc86cb610 "\b", 
> reason=0x7fffdf646b90) at validator/val_utils.c:334 #17
> 0x0000000000430754 in val_verify_rrset_entry (env=0x8bfd78,
> ve=0x8637a0, rrset=0x116dbf8, kkey=0x7facc86cb240,
> reason=0x7fffdf646b90) at validator/val_utils.c:380 #18
> 0x0000000000435cc8 in list_is_secure (env=0x8bfd78, ve=0x8637a0, 
> list=0x116dad0, num=3, kkey=0x7facc86cb240, reason=0x7fffdf646b90) 
> at validator/val_nsec3.c:1351 #19 0x0000000000435e06 in
> nsec3_prove_nods (env=0x8bfd78, ve=0x8637a0, list=0x116dad0, num=3,
> qinfo=0x116b5c0, kkey=0x7facc86cb240, reason=0x7fffdf646b90) at
> validator/val_nsec3.c:1378 #20 0x00000000004434e4 in
> ds_response_to_ke (qstate=0x7facc86c7860, vq=0x7facc86cb128, id=0,
> rcode=0, msg=0x116da60, qinfo=0x116b5c0, ke=0x7fffdf646c20) at
> validator/validator.c:2438 #21 0x0000000000443914 in
> process_ds_response (qstate=0x7facc86c7860, vq=0x7facc86cb128,
> id=0, rcode=0, msg=0x116da60, qinfo=0x116b5c0, origin=0x116e3f8) at
> validator/validator.c:2555 #22 0x000000000044464c in
> val_inform_super (qstate=0x116b5c0, id=0, super=0x7facc86c7860) at
> validator/validator.c:2895 #23 0x0000000000467178 in
> mesh_walk_supers (mesh=0x7faccc6014e0, mstate=0x116b570) at
> services/mesh.c:924 #24 0x000000000046768d in mesh_continue
> (mesh=0x7faccc6014e0, mstate=0x116b570, s=module_finished,
> ev=0x7fffdf646cfc) at services/mesh.c:1041 #25 0x0000000000467843
> in mesh_run (mesh=0x7faccc6014e0, mstate=0x116b570, 
> ev=module_event_moddone, e=0x0) at services/mesh.c:1072 #26
> 0x000000000046f570 in mesh_report_reply (mesh=0x7faccc6014e0, 
> e=0x116da40, reply=0x7fffdf647000, what=0) at services/mesh.c:488 
> #27 0x00000000004912a5 in worker_handle_service_reply
> (c=0x7faccc3f6f40, arg=0x116da40, error=0,
> reply_info=0x7fffdf647000) at daemon/worker.c:287 #28
> 0x000000000040b26f in serviced_callbacks (sq=0x7facccb14190,
> error=0, c=0x7faccc3f6f40, rep=0x7fffdf647000) at
> services/outside_network.c:1511 #29 0x000000000040c315 in
> serviced_udp_callback (c=0x7faccc3f6f40, arg=0x7facccb14190,
> error=0, rep=0x7fffdf647000) at services/outside_network.c:1782 #30
> 0x00000000004119bb in outnet_udp_cb (c=0x7faccc3f6f40, 
> arg=0x7faccc154d60, error=0, reply_info=0x7fffdf647000) at
> services/outside_network.c:462 #31 0x0000000000416e79 in
> comm_point_udp_callback (fd=354, event=2, arg=0x7faccc3f6f40) at
> util/netevent.c:656 #32 0x00007facd3d86ccc in event_base_loop () 
> from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 #33
> 0x00000000004160ed in comm_base_dispatch (b=0x8639d0) at
> util/netevent.c:265 #34 0x0000000000494310 in worker_work
> (worker=0x8be990) at daemon/worker.c:1208 #35 0x00000000004a038d in
> daemon_fork (daemon=0x814010) at daemon/daemon.c:487 #36
> 0x0000000000497988 in run_daemon ( cfgfile=0x4c31b9
> "/etc/unbound/unbound.conf", cmdline_verbose=0, debug_mode=0) at
> daemon/unbound.c:660 #37 0x0000000000497b88 in main (argc=0,
> argv=0x7fffdf647360) at daemon/unbound.c:754
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQIcBAEBAgAGBQJQiOKtAAoJEJ9vHC1+BF+Nwn8P/223/6e2Ng2n0PWhz8DF9mDg
mgsqDTXx5YTmnls7gtFHkCyaAEWj6wPHgKlCXLMPVqUwhiEHWik53Nb62LpjgVgE
HyeK7pahyUdo/okxnRW94DZsZ0f2IAPEvkhYmjmtrfWTy4h02yuEk5sudNs9vrln
Ep37zUvs1CgPiVscxgtHdfWKfzwJB3apr6pzqSvesFXGyyrQMx7+EXSeRKwtBlht
jCsE1F724V6QIaNvx4/Ja2xe6YLIYRsaKhtOypo7oHZUYQXFP3lLJfyBEXeiXJl9
7xRkDd+i0GCbPFcILs6LQgxXRhFoM4dr9zGrN2RnnAwMWPe/1lnaCOnjkc2O86cb
7C2BYtvH9NGAXrBRT0v0On8IFbsELlp4tOYozej4CGwaZemn4nN6miEZJen1yGn+
r3x8aeI3Mfjj7IgbaermU7ZKZDMUHGqQYQ2wHQZJ/45sKfAEhSoKFGNlKZO/zmav
j47UXGBcY6tF9D1K5lLH3rHDFyvLdeTNkZmoLAsJY02XFuDwoKBdR0iKWOMfIi6o
5iZylP9hlbLIkWOz8heqZJ3ttfdVgbslDezlhHBwtT1yuQJE1UVyD4MyDYPpLsuK
dKxPHpRU4pEwbh8t/VfRMlMiBCUv3dPKM+WSfd6cc3+ceets2JOOr0TfjL0BkApi
6Pc6dam00dZ83XYww2Ad
=bvQs
-----END PGP SIGNATURE-----