Maintained by: NLnet Labs

[Unbound-users] Unbound 1.4.18 crashes under high query load

Janne Snabb
Wed Oct 24 21:03:04 CEST 2012


Hi,

While trying to debug my previous Unbound crash issue (see the thread
from yesterday), this time on Debian sid using Unbound 1.4.18, I
encountered another OpenSSL related issue.

It looks like a double free of OpenSSL's EVP_PKEY structure (or
alternatively memory corruption). This issue happens also under heavy
query load but it happens much more frequently than my previous issue.
Also this issue comes up even when "num-threads" is 1. Thus it is not
related to OpenSSL thread safety issues.

See the gdb backtrace at the bottom of the message.

Additionally I noticed a minor issue: In validator/val_secalgo.c the
function setup_key_digest() may return failure if a call to
EVP_PKEY_new() returns NULL. In that case the calling function
verify_canonrrset() will try to do EVP_PKEY_free() with the NULL value.
I am unsure if EVP_PKEY_free() has internal safeguards against this
problem (if it does have, it is not documented). I have been able to
avoid touching OpenSSL so far, thus I am not familiar with it.


Any thoughts? This problem is also difficult to debug because a high
query load is required to trigger it. Therefore setting breakpoints and
single-stepping is out of question. In my test setup I am testing by
resolving names from Alexa top 1 million web sites list. I am feeding
that list to a Go program which constantly runs 100 DNS lookups in
parallel and sends those queries to Unbound -- which always eventually
crashes.

Please tell me if I can assist further in resolving this. I like
Unbound, I wish it was just a bit more robust...


Program received signal SIGABRT, Aborted.
0x00007facd2d18475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007facd2d18475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007facd2d1b6f0 in *__GI_abort () at abort.c:92
#2  0x00007facd2d5232b in __libc_message (do_abort=<optimized out>,
    fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007facd2d5bb76 in malloc_printerr (action=3,
    str=0x7facd2e32768 "double free or corruption (!prev)",
    ptr=<optimized out>) at malloc.c:6283
#4  0x00007facd2d608ac in *__GI___libc_free (mem=<optimized out>)
    at malloc.c:3738
#5  0x00007facd3301f1d in CRYPTO_free (str=0x82c640) at mem.c:397
#6  0x00007facd336c6a8 in engine_free_util (e=e at entry=0x82c640,
    locked=locked at entry=0) at eng_lib.c:136
#7  0x00007facd336d2c2 in engine_unlocked_finish (e=e at entry=0x82c640,
    unlock_for_handlers=unlock_for_handlers at entry=1) at eng_init.c:112
#8  0x00007facd336d401 in ENGINE_finish (e=0x82c640) at eng_init.c:146
#9  0x00007facd335e660 in RSA_free (r=0x7facccd33a50) at rsa_lib.c:238
#10 0x00007facd338854b in EVP_PKEY_free_it (x=x at entry=0x7facccb81550)
    at p_lib.c:415
#11 0x00007facd3388cd8 in EVP_PKEY_free (x=0x7facccb81550) at p_lib.c:405
#12 0x0000000000437fde in verify_canonrrset (buf=0x7faccc611d20, algo=8,
    sigblock=0x116dfd5
"D\243[\270r\262\271\071\362\355\227\364\310ф#\320\341˼\177\233`\325Or\346\003\035\345z\206r\374\207\245\\\367\222\062>Z-l\256\302\316\323c\217\061\177\225J#\377\370\327 at s?\373\030\006b\035Qy@\004,\310\310\060\363+s\232\364Ȼ\ti\273%6$އ_\330\301w\234lf\212+:bquu,zӠ\n\364˾D\032G\001I\245m\bS[p\255\032\270rv1\227",
sigblock_len=128,
    key=0x7facc86cb36a "\001\003\260\304\071[r\365\314\tB\033t@\340\310b
\277nzL]\210\231գ\250\005\244\255\276N\340\365\332j+^잫\035\032=&'\373t
\317\353\231\336\372\220\017!\262&3kB\273Q\232\n\243v\225!Y\261\020\260\371\f\365\225\236\367\210\347\210\332\177\224\217w\266\303\356\303?m\201\234̐y\006\204\256\004\016b\"S|\220\071\323\357\270\316\020\"\331\352L7\376w\035\250o\243\275\237\244\001\345\001\006\001\001\003\b\001\003\303\316WM\230\313\331\025~\rp\322t\270I\312\016\016\355\232\377\305\334̐GIi\006e\\5\313\b\263<M\027\033\001|\243V\364\226\002b\252b\223\315\372\350\261;U\262\034\065\034ߧ"...,
keylen=130,
    reason=0x7fffdf646b90) at validator/val_secalgo.c:541
#13 0x000000000042f990 in dnskey_verify_rrset_sig (region=0x7faccc460e80,
    buf=0x7faccc611d20, ve=0x8637a0, now=1350998202, rrset=0x116dbf8,
    dnskey=0x7fffdf646950, dnskey_idx=0, sig_idx=0, sortree=0x7fffdf646768,
    buf_canon=0x7fffdf6466ec, reason=0x7fffdf646b90)
    at validator/val_sigcrypt.c:1349
#14 0x0000000000439416 in dnskeyset_verify_rrset_sig (env=0x8bfd78,
    ve=0x8637a0, now=1350998202, rrset=0x116dbf8, dnskey=0x7fffdf646950,
    sig_idx=0, sortree=0x7fffdf646768, reason=0x7fffdf646b90)
    at validator/val_sigcrypt.c:607
#15 0x0000000000438e9a in dnskeyset_verify_rrset (env=0x8bfd78, ve=0x8637a0,
    rrset=0x116dbf8, dnskey=0x7fffdf646950, sigalg=0x7facc86cb610 "\b",
    reason=0x7fffdf646b90) at validator/val_sigcrypt.c:504
#16 0x00000000004304de in val_verify_rrset (env=0x8bfd78, ve=0x8637a0,
    rrset=0x116dbf8, keys=0x7fffdf646950, sigalg=0x7facc86cb610 "\b",
    reason=0x7fffdf646b90) at validator/val_utils.c:334
#17 0x0000000000430754 in val_verify_rrset_entry (env=0x8bfd78, ve=0x8637a0,
    rrset=0x116dbf8, kkey=0x7facc86cb240, reason=0x7fffdf646b90)
    at validator/val_utils.c:380
#18 0x0000000000435cc8 in list_is_secure (env=0x8bfd78, ve=0x8637a0,
    list=0x116dad0, num=3, kkey=0x7facc86cb240, reason=0x7fffdf646b90)
    at validator/val_nsec3.c:1351
#19 0x0000000000435e06 in nsec3_prove_nods (env=0x8bfd78, ve=0x8637a0,
    list=0x116dad0, num=3, qinfo=0x116b5c0, kkey=0x7facc86cb240,
    reason=0x7fffdf646b90) at validator/val_nsec3.c:1378
#20 0x00000000004434e4 in ds_response_to_ke (qstate=0x7facc86c7860,
    vq=0x7facc86cb128, id=0, rcode=0, msg=0x116da60, qinfo=0x116b5c0,
    ke=0x7fffdf646c20) at validator/validator.c:2438
#21 0x0000000000443914 in process_ds_response (qstate=0x7facc86c7860,
    vq=0x7facc86cb128, id=0, rcode=0, msg=0x116da60, qinfo=0x116b5c0,
    origin=0x116e3f8) at validator/validator.c:2555
#22 0x000000000044464c in val_inform_super (qstate=0x116b5c0, id=0,
    super=0x7facc86c7860) at validator/validator.c:2895
#23 0x0000000000467178 in mesh_walk_supers (mesh=0x7faccc6014e0,
    mstate=0x116b570) at services/mesh.c:924
#24 0x000000000046768d in mesh_continue (mesh=0x7faccc6014e0,
    mstate=0x116b570, s=module_finished, ev=0x7fffdf646cfc)
    at services/mesh.c:1041
#25 0x0000000000467843 in mesh_run (mesh=0x7faccc6014e0, mstate=0x116b570,
    ev=module_event_moddone, e=0x0) at services/mesh.c:1072
#26 0x000000000046f570 in mesh_report_reply (mesh=0x7faccc6014e0,
e=0x116da40,
    reply=0x7fffdf647000, what=0) at services/mesh.c:488
#27 0x00000000004912a5 in worker_handle_service_reply (c=0x7faccc3f6f40,
    arg=0x116da40, error=0, reply_info=0x7fffdf647000) at
daemon/worker.c:287
#28 0x000000000040b26f in serviced_callbacks (sq=0x7facccb14190, error=0,
    c=0x7faccc3f6f40, rep=0x7fffdf647000) at services/outside_network.c:1511
#29 0x000000000040c315 in serviced_udp_callback (c=0x7faccc3f6f40,
    arg=0x7facccb14190, error=0, rep=0x7fffdf647000)
    at services/outside_network.c:1782
#30 0x00000000004119bb in outnet_udp_cb (c=0x7faccc3f6f40,
arg=0x7faccc154d60,
    error=0, reply_info=0x7fffdf647000) at services/outside_network.c:462
#31 0x0000000000416e79 in comm_point_udp_callback (fd=354, event=2,
    arg=0x7faccc3f6f40) at util/netevent.c:656
#32 0x00007facd3d86ccc in event_base_loop ()
   from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#33 0x00000000004160ed in comm_base_dispatch (b=0x8639d0)
    at util/netevent.c:265
#34 0x0000000000494310 in worker_work (worker=0x8be990) at
daemon/worker.c:1208
#35 0x00000000004a038d in daemon_fork (daemon=0x814010) at
daemon/daemon.c:487
#36 0x0000000000497988 in run_daemon (
    cfgfile=0x4c31b9 "/etc/unbound/unbound.conf", cmdline_verbose=0,
    debug_mode=0) at daemon/unbound.c:660
#37 0x0000000000497b88 in main (argc=0, argv=0x7fffdf647360)
    at daemon/unbound.c:754

-- 
Janne Snabb / EPIPE Communications
snabb at epipe.com - http://epipe.com/