Maintained by: NLnet Labs

[Unbound-users] Strange SERVFAIL from unbound

Jaco Engelbrecht
Fri Nov 14 19:44:19 CET 2008


Hi,

On 14 Nov 2008, at 04:53, W.C.A. Wijngaards wrote:
> Looked at zen.spamhaus.org; but I could not see how harden-glue would
> make an impact. Or how A records could time out while the AAAA do not.
> They have the same timeout value (4 hours). I think it is possible to
> change the code to deal with this situation, but I would like to know
> how it happened.

We've been running into the same problem, running Unbound version 1.0.2
with libevent 1.4.8-stable and libldns 1.3.0.

Something the two records (zen.spamhaus.org and
www.askmeofflistfortthename.net) have in common is that both have an A
and an AAAA record.  I run dig against our resolvers once a minute for
3 records, and only the www.askmeofflistfortthename.net record seems to
fail regularly, at least twice a day.  I fix this by *drumroll*
restarting Unbound, which is not great, but I'd have to do the same
with Bind9, so... ;-)
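
For anyone who wants to reproduce this kind of check, here is a rough
Python sketch of it.  It is not the script we actually run: the function
names, the dig timeout options and the number of attempts are all
assumptions chosen for illustration.  It only relies on dig printing
"status: ..." in its header line.

```python
import re
import subprocess
from collections import Counter

# dig prints the response code in its header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242"
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def parse_status(dig_output):
    """Return the status word from dig's header line, or None if missing."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def query_status(name, resolver, rrtype="A"):
    """Run dig once against `resolver` and return the status it reports."""
    # +tries=1 and +time=5 are illustrative choices, not our real settings.
    cmd = ["dig", "@" + resolver, name, rrtype, "+tries=1", "+time=5"]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    return parse_status(result.stdout)


def sample_statuses(name, resolver, attempts=8):
    """Query several times in a row and tally the statuses seen.

    With several worker threads, consecutive queries may be answered by
    different threads, so a mix of NOERROR and SERVFAIL is worth logging.
    """
    return Counter(query_status(name, resolver) for _ in range(attempts))
```

Calling sample_statuses("zen.spamhaus.org", "127.0.0.1") from cron once
a minute and logging the tally would be one way to use it; the resolver
address here is a placeholder.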

> The 30 minutes sounds close to the 15 minute (900 second) default
> timeout on lameness detections.  Various kinds of badness are detected
> and stored in the infrastructure cache.  Again, with 22 servers, it
> seems unlikely all their A records are lame.

I run Unbound with 4 threads, and I've noticed that only 1 thread seems
to be returning the SERVFAIL.

Do you maintain an infrastructure cache per thread?

> Do you have more log information (you can send this off list) you can
> share with me?  Lots of data before this point, that deals with
> *.spamhaus.org.  If you gzip it, 100M of log is only 1 meg email.

What level of verbosity should I set to provide meaningful information  
that you can use to debug the issue?

> If it happens again can you query with dig +norec a.ns.spamhaus.org ?
> And dig +norec +cdflag +dnssec a.ns.spamhaus.org ?

I'll do this for our record as well, and mail you with my findings.
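
To capture both of the queries you suggested in one go when the failure
next shows up, something like the following Python sketch could be used.
The helper names and the optional resolver argument are my own
assumptions; the dig flags are exactly the ones from your mail.

```python
import subprocess


def norec_commands(name, resolver=None):
    """Build the two non-recursive dig invocations as argument lists."""
    base = ["dig"]
    if resolver:
        base.append("@" + resolver)  # optional: which server to ask
    return [
        base + ["+norec", name],
        base + ["+norec", "+cdflag", "+dnssec", name],
    ]


def capture(name, resolver=None):
    """Run both commands and return (command line, output) pairs."""
    results = []
    for cmd in norec_commands(name, resolver):
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        results.append((" ".join(cmd), done.stdout))
    return results
```

capture("a.ns.spamhaus.org") would then return the output of both
queries, ready to be written to a file and mailed.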

Cheers,
Jaco