Timeout semantics of Unbound differ radically from Bind 9

Sun Apr 10 04:43:33 UTC 2016

On Sun, Apr 10, 2016 at 4:08 AM, Aaron Hopkins <lists at die.net> wrote:
> On Sun, 10 Apr 2016, Dhalgren Tor via Unbound-users wrote:
>
> Under the covers, Tor uses eventdns.  Looking at the eventdns source
> (https://github.com/torproject/tor/blob/master/src/ext/eventdns.c), it
> appears that by default it times out after 5 seconds, and considers the
> nameserver to be down if it gets 3 timeouts in a row.
>
> If it's down, it blocks all new requests (not just for that domain) and
> tries to use the nameserver again after 10, 60, 300, 900, and 3600 seconds.
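
For readers following along, the behavior described in the quote can be
modeled roughly as below. This is only a sketch built from that
description, not code taken from eventdns.c. The class and method names
are hypothetical, and two points are assumptions: that any reply counts
as proof of life, and that the probe delay stays at 3600 seconds once
the list is exhausted.

```python
# Illustrative model only; names are hypothetical, not from eventdns.c.
TIMEOUT_SECONDS = 5    # per-request timeout, as described above
MAX_TIMEOUTS = 3       # consecutive timeouts before the server is "down"
PROBE_BACKOFF = [10, 60, 300, 900, 3600]  # seconds between retry probes


class Nameserver:
    def __init__(self):
        self.consecutive_timeouts = 0
        self.failed_probes = 0
        self.down = False

    def on_reply(self):
        # Assumption: any reply at all resets the failure state.
        self.consecutive_timeouts = 0
        self.failed_probes = 0
        self.down = False

    def on_timeout(self):
        # A request went unanswered for TIMEOUT_SECONDS.
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts >= MAX_TIMEOUTS:
            self.down = True  # all new requests are now held back

    def on_probe_failed(self):
        # A retry probe sent while the server was down got no answer.
        self.failed_probes += 1

    def next_probe_delay(self):
        # Assumption: the delay stays at the last value once exhausted.
        idx = min(self.failed_probes, len(PROBE_BACKOFF) - 1)
        return PROBE_BACKOFF[idx]
```

With these numbers, three unanswered requests in a row (about 15
seconds in total) are enough to mark the resolver down for every
domain, not only the one that is timing out.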

Thank you for the rapid and insightful analysis.  I was just beginning
to think I should mention the eventdns usage in Tor as it might be
relevant.  Clearly it is.

However, please note that the Tor project has modified the eventdns
code, so it may not exactly match the behavior of the generic version.

>> Unbound would reply to DNS queries with an appropriate SERVFAIL message
>> after ten seconds while continuing with the usual persistent effort to
>> resolve the record and then cache the result if successful.
>
> Answering with something within 15 seconds does seem important for eventdns.
>
> However, eventdns also only allows 64 requests to be in flight at once.  If
> all of those are trying to query domains that are timing out, all other
> requests will just wait.  So it would actually be better for eventdns if
> unbound would answer SERVFAIL immediately if unbound believes all of the
> nameservers for a domain are broken and it won't be retrying soon.
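
The effect of the in-flight limit described in the quote can be
illustrated with a small simulation. This is a sketch under stated
assumptions, not a model of the real eventdns scheduler: all requests
are assumed to be queued at time zero and served first-in first-out,
and the function name and the response times used are made up for
illustration.

```python
import heapq

MAX_INFLIGHT = 64  # in-flight request limit described above


def completion_times(response_times, slots=MAX_INFLIGHT):
    """Return the finish time of each request, in submission order.

    Assumes every request is queued at time 0 and served FIFO, with
    at most `slots` requests outstanding at once.
    """
    free_at = [0.0] * slots  # min-heap: when each slot next frees up
    heapq.heapify(free_at)
    finished = []
    for response_time in response_times:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + response_time
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


# 64 queries for a broken domain, then one query for a healthy domain.
slow_servfail = completion_times([10.0] * 64 + [0.05])
fast_servfail = completion_times([0.001] * 64 + [0.05])
print(slow_servfail[-1], fast_servfail[-1])
```

When each broken query holds its slot for 10 seconds, the healthy
query cannot even start until a slot frees, so it finishes after
roughly 10 seconds. When the broken queries are answered almost
immediately, the healthy query finishes in a small fraction of a
second.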

Interesting!  This explains why Tor relay DNS completely seizes up
when GoDaddy null-routes a relay running Unbound.

Now I have to look into whether that limit of 64 in-flight requests
might be a performance constraint for fast exit relays.  A tunable to
raise the limit might be worthwhile.

If the Unbound team decides to create an eventdns / Tor daemon
compatibility feature, please let me know via this thread.

Regards