Hi,

On 14 Nov 2008, at 04:53, W.C.A. Wijngaards wrote:

> Looked at zen.spamhaus.org; but I could not see how harden-glue would
> make an impact. Or how A records could time out while the AAAA do not.
> They have the same timeout value (4 hours). I think it is possible to
> change the code to deal with this situation, but I would like to know
> how it happened.

We've been running into the same problem. We are running Unbound 1.0.2
with libevent 1.4.8-stable and libldns 1.3.0.

One thing the two names (zen.spamhaus.org and
www.askmeofflistfortthename.net) have in common is that both have an A
and an AAAA record.

I run a dig against our resolvers once a minute for 3 records, and only
the www.askmeofflistfortthename.net record seems to fail regularly, at
least twice a day.

I fix this by *drumroll* restarting Unbound, which is not great, but I'd
have to do the same with Bind9, so... ;-)

> The 30 minutes sounds close to the 15 minute (900 second) default
> timeout on lameness detections. Various kinds of badness are detected
> and stored in the infrastructure cache. Again, with 22 servers, it
> seems unlikely all their A records are lame.

I run Unbound with 4 threads, and I've noticed that only 1 thread seems
to be returning the SERVFAIL. Do you maintain an infrastructure cache
per thread?

> Do you have more log information (you can send this off list) you can
> share with me? Lots of data before this point, that deals with
> *.spamhaus.org. If you gzip it, 100M of log is only 1 meg email.

What verbosity level should I set so that the logs contain enough
information for you to debug the issue?

> If it happens again can you query with dig +norec a.ns.spamhaus.org ?
> And dig +norec +cdflag +dnssec a.ns.spamhaus.org ?

I'll do this for our record as well, and mail you my findings.

Cheers,
Jaco
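
P.S. For anyone wanting to reproduce the once-a-minute check described above, here is a minimal sketch in Python. It is an illustration only, not the actual monitoring setup: the function names (`dig_command`, `parse_status`, `check`) are hypothetical, and the resolver address and record name you pass in are your own. It shells out to `dig` and reads the `status:` field from the header line of dig's output, so a SERVFAIL can be logged with a timestamp.

```python
import re
import subprocess

# Matches the RCODE in dig's header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242"
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_command(name, resolver, rtype="A"):
    """Build the dig argument list for one query against one resolver.

    +tries=1 and +time=5 keep a hung resolver from stalling the check.
    """
    return ["dig", f"@{resolver}", name, rtype, "+tries=1", "+time=5"]


def parse_status(dig_output):
    """Return the RCODE string from dig output, or None if there is no
    header (for example when the query timed out)."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def check(name, resolver, rtype="A"):
    """Run one dig query and return its RCODE (hypothetical helper)."""
    result = subprocess.run(
        dig_command(name, resolver, rtype),
        capture_output=True,
        text=True,
    )
    return parse_status(result.stdout)
```

Calling `check("zen.spamhaus.org", "127.0.0.1", "A")` and again with `"AAAA"` from a cron job once a minute, and logging anything other than `NOERROR`, would show whether the A and AAAA lookups fail at the same moments. The resolver address here is an assumption; substitute your own.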