Maintained by: NLnet Labs

[Unbound-users] Resolve failures when using forwarders that do recursion

lst_hoe02 at kwsoft.de
Thu Nov 21 12:26:28 CET 2013


Quoting Ilya Bakulin <Ilya_Bakulin at genua.de>:

> Hi list,
> consider the following configuration:
> 1. Unbound is set up to use 2-3 DNS servers as forwarders
> 2. These servers are doing recursive resolving
> 3. Most resolve requests are for a handful of domain names,
> 	but there are certainly requests for other domains too.
>
> Now look at this tcpdump log:
>
> 15:59:41.599014 IP unbound.host.13453 > forwarder-1.com.domain:  
> 48359+% [1au] MX? some.domain. (41)
> 15:59:42.246023 IP unbound.host.29253 > forwarder-2.com.domain:  
> 35322+% [1au] MX? some.domain. (41)
> 15:59:42.277969 IP forwarder-1.com.domain > unbound.host.13453:  
> 48359 2/0/2 MX mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
> 15:59:42.278009 IP unbound.host > forwarder-1.com: ICMP unbound.host  
> udp port 13453 unreachable, length 36
> 15:59:43.254411 IP unbound.host.20082 > forwarder-3.com.domain:  
> 18647+% [1au] MX? some.domain. (41)
> 15:59:43.575827 IP forwarder-2.com.domain > unbound.host.29253:  
> 35322 2/0/2 MX mx2.some.domain. 10, MX mx1.some.domain. 5 (102)
> 15:59:43.575933 IP unbound.host > forwarder-2.com: ICMP unbound.host  
> udp port 29253 unreachable, length 36
> 15:59:43.943166 IP forwarder-3.com.domain > unbound.host.20082:  
> 18647 2/0/2 MX mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
> 15:59:43.943304 IP unbound.host > forwarder-3.com: ICMP unbound.host  
> udp port 20082 unreachable, length 36
>
> We see that Unbound tries to resolve "some.domain" using all three
> upstream forwarders.
> Every upstream server has to resolve it recursively because
> "some.domain" is not in its cache.
>
> Now let's reorder the dump so that it is grouped by upstream server:
>
> 15:59:41.599014 IP unbound.host.13453 > forwarder-1.com.domain:  
> 48359+% [1au] MX? some.domain. (41)
> 15:59:42.277969 IP forwarder-1.com.domain > unbound.host.13453:  
> 48359 2/0/2 MX mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
> 15:59:42.278009 IP unbound.host > forwarder-1.com: ICMP unbound.host  
> udp port 13453 unreachable, length 36
>
> So the answer took >600ms; Unbound had already closed its socket, so
> the system answers with ICMP unreachable.
>
> 15:59:42.246023 IP unbound.host.29253 > forwarder-2.com.domain:  
> 35322+% [1au] MX? some.domain. (41)
> 15:59:43.575827 IP forwarder-2.com.domain > unbound.host.29253:  
> 35322 2/0/2 MX mx2.some.domain. 10, MX mx1.some.domain. 5 (102)
> 15:59:43.575933 IP unbound.host > forwarder-2.com: ICMP unbound.host  
> udp port 29253 unreachable, length 36
>
> Here the answer took >1s, with the same ICMP unreachable reaction.
>
> 15:59:43.254411 IP unbound.host.20082 > forwarder-3.com.domain:  
> 18647+% [1au] MX? some.domain. (41)
> 15:59:43.943166 IP forwarder-3.com.domain > unbound.host.20082:  
> 18647 2/0/2 MX mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
> 15:59:43.943304 IP unbound.host > forwarder-3.com: ICMP unbound.host  
> udp port 20082 unreachable, length 36
>
> The upstream answered in ~690ms => ICMP unreachable.
>
> So each of the upstream servers has done the hard job of recursive
> resolving, Unbound hasn't accepted any of the answers and has returned
> SERVFAIL to the mail server, the mail server hasn't sent the mail, and
> the sender is in disaster.
>
> Unbound uses an algorithm described at [1] to set timeouts when
> sending queries. This works well when Unbound is used as a recursive
> resolver, because the Internet is a complex, wild network full of crappy
> overloaded DNS servers, and one has to take changing conditions and
> failing servers into account.
>
> But when Unbound uses forwarders that in turn should deal with that
> wild Internet outside, it doesn't forgive its forwarders when they
> deliver the answer a bit late. It thinks that those servers should be
> FAST just because most answers come from their cache and Unbound uses its
> infra-cache to remember this.
>
> If we turn the infra-cache off, Unbound will use its standard 376ms
> timeout and the situation may get even worse.
>
> Would it make sense to add a new configuration parameter that allows
> setting a custom timeout value when using forwarders? In the case of
> that poor unbound.host we would set that timeout to ~1500ms or
> something like that.
> It could be a per-server or a global value, and should be used only for
> requests to "upstream" servers.
>
> [1] http://www.unbound.net/documentation/info_timeout.html
>
> --
> Ilya
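
For anyone who wants to check the numbers in the trace above, here is a small sketch that computes how long each forwarder took to answer. The timestamps are copied from the quoted tcpdump output; the TRACE table and the function names are made up for this illustration.

```python
from datetime import datetime

# (query sent, answer received) per forwarder, copied from the quoted trace.
TRACE = {
    "forwarder-1": ("15:59:41.599014", "15:59:42.277969"),
    "forwarder-2": ("15:59:42.246023", "15:59:43.575827"),
    "forwarder-3": ("15:59:43.254411", "15:59:43.943166"),
}


def latency_ms(sent, received):
    """Return the time between two tcpdump timestamps in milliseconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(received, fmt) - datetime.strptime(sent, fmt)
    return delta.total_seconds() * 1000.0


def latencies(trace=TRACE):
    """Map each forwarder name to its answer latency in milliseconds."""
    return {name: latency_ms(*times) for name, times in trace.items()}


if __name__ == "__main__":
    for name, ms in latencies().items():
        print(f"{name}: {ms:.0f} ms")
```

This gives roughly 679, 1330 and 689 ms, so every answer arrived well after the 376ms default mentioned in the quoted message.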

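To illustrate why a forwarder that normally answers from cache ends up with a very tight timeout, here is a minimal sketch of a TCP-style smoothed round-trip estimator (timeout = SRTT + 4 * RTTVAR, with the classic 1/8 and 1/4 gains). That this approximates the behaviour described at [1] is an assumption; the class name, the gains and the 50ms floor are illustrative and not taken from Unbound's source. Only the 376ms starting value comes from the quoted message.

```python
class RttEstimator:
    """TCP-style smoothed RTT estimator (illustrative, not Unbound's code)."""

    def __init__(self, initial_rto_ms=376.0, min_rto_ms=50.0):
        self.srtt = 0.0                       # smoothed round-trip time
        self.rttvar = initial_rto_ms / 4.0    # so the first timeout is 376 ms
        self.min_rto_ms = min_rto_ms          # assumed lower bound

    @property
    def rto(self):
        """Current timeout in milliseconds, never below the floor."""
        return max(self.min_rto_ms, self.srtt + 4.0 * self.rttvar)

    def update(self, sample_ms):
        """Feed one measured round-trip time into the estimator."""
        delta = sample_ms - self.srtt
        self.srtt += delta / 8.0
        self.rttvar += (abs(delta) - self.rttvar) / 4.0


def rto_after_fast_answers(sample_ms=5.0, count=200):
    """Timeout after `count` cache hits that each took `sample_ms`."""
    est = RttEstimator()
    for _ in range(count):
        est.update(sample_ms)
    return est.rto
```

With these assumptions, a long run of 5ms cache hits drives the timeout down to the floor, so a single 700ms recursive lookup on the forwarder looks like a lost packet.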

Hello

If you have Unbound > 1.4.14 you can try

tcp-upstream: <yes or no>
               Enable or disable whether the upstream queries use TCP only  for
               transport.  Default is no.  Useful in tunneling scenarios.

Not sure whether the (short) timeout also applies to TCP, or whether the
TCP connection is used for multiple outstanding queries...

Regards

Andreas