[Unbound-users] Resolve failures when using forwarders that do recursion

Florian Riehm
Tue Nov 26 00:42:49 CET 2013


On 11/21/13 12:26, lst_hoe02 at kwsoft.de wrote:
> 
> Quoting Ilya Bakulin <Ilya_Bakulin at genua.de>:
> 
>> Hi list,
>> consider the following configuration:
>> 1. Unbound is set up to use 2-3 DNS servers as forwarders
>> 2. These servers are doing recursive resolving
>> 3. Most resolve requests are for a handful of domain names,
>>     but there are certainly requests for other domains too.
>>
>> Now look at this tcpdump log:
>>
>> 15:59:41.599014 IP unbound.host.13453 > forwarder-1.com.domain: 48359+% [1au]
>> MX? some.domain. (41)
>> 15:59:42.246023 IP unbound.host.29253 > forwarder-2.com.domain: 35322+% [1au]
>> MX? some.domain. (41)
>> 15:59:42.277969 IP forwarder-1.com.domain > unbound.host.13453: 48359 2/0/2 MX
>> mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
>> 15:59:42.278009 IP unbound.host > forwarder-1.com: ICMP unbound.host udp port
>> 13453 unreachable, length 36
>> 15:59:43.254411 IP unbound.host.20082 > forwarder-3.com.domain: 18647+% [1au]
>> MX? some.domain. (41)
>> 15:59:43.575827 IP forwarder-2.com.domain > unbound.host.29253: 35322 2/0/2 MX
>> mx2.some.domain. 10, MX mx1.some.domain. 5 (102)
>> 15:59:43.575933 IP unbound.host > forwarder-2.com: ICMP unbound.host udp port
>> 29253 unreachable, length 36
>> 15:59:43.943166 IP forwarder-3.com.domain > unbound.host.20082: 18647 2/0/2 MX
>> mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
>> 15:59:43.943304 IP unbound.host > forwarder-3.com: ICMP unbound.host udp port
>> 20082 unreachable, length 36
>>
>> We see that Unbound tries to resolve "some.domain" using all three upstream
>> forwarders.
>> Every upstream server has to resolve it recursively because "some.domain" is
>> not in its cache.
>>
>> Now let's reorder the dump so that it is grouped by upstream server:
>>
>> 15:59:41.599014 IP unbound.host.13453 > forwarder-1.com.domain: 48359+% [1au]
>> MX? some.domain. (41)
>> 15:59:42.277969 IP forwarder-1.com.domain > unbound.host.13453: 48359 2/0/2 MX
>> mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
>> 15:59:42.278009 IP unbound.host > forwarder-1.com: ICMP unbound.host udp port
>> 13453 unreachable, length 36
>>
>> So the answer arrived after >600ms; Unbound had already closed its socket, so
>> the system answered with ICMP unreachable.
>>
>> 15:59:42.246023 IP unbound.host.29253 > forwarder-2.com.domain: 35322+% [1au]
>> MX? some.domain. (41)
>> 15:59:43.575827 IP forwarder-2.com.domain > unbound.host.29253: 35322 2/0/2 MX
>> mx2.some.domain. 10, MX mx1.some.domain. 5 (102)
>> 15:59:43.575933 IP unbound.host > forwarder-2.com: ICMP unbound.host udp port
>> 29253 unreachable, length 36
>>
>> Here the answer arrived after >1s, with the same ICMP unreachable reaction.
>>
>> 15:59:43.254411 IP unbound.host.20082 > forwarder-3.com.domain: 18647+% [1au]
>> MX? some.domain. (41)
>> 15:59:43.943166 IP forwarder-3.com.domain > unbound.host.20082: 18647 2/0/2 MX
>> mx1.some.domain. 5, MX mx2.some.domain. 10 (102)
>> 15:59:43.943304 IP unbound.host > forwarder-3.com: ICMP unbound.host udp port
>> 20082 unreachable, length 36
>>
>> The upstream answered after almost 700ms (about 689ms), again followed by
>> ICMP unreachable.
>>
>> So each of the upstream servers did the hard work of recursive resolution,
>> but Unbound accepted none of the answers and returned SERVFAIL to the mail
>> server. The mail server did not send the mail, and the sender is in trouble.
>>
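
The delays quoted above can be rechecked directly from the capture timestamps. The following is only a small sketch: the timestamps are copied by hand from the tcpdump output, and the names TRACE and latency_ms are made up for illustration.

```python
from datetime import datetime

# Query/response timestamps copied from the capture above, one pair per forwarder.
TRACE = {
    "forwarder-1": ("15:59:41.599014", "15:59:42.277969"),
    "forwarder-2": ("15:59:42.246023", "15:59:43.575827"),
    "forwarder-3": ("15:59:43.254411", "15:59:43.943166"),
}


def latency_ms(sent: str, received: str) -> float:
    """Return the delay between two tcpdump timestamps in milliseconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(received, fmt) - datetime.strptime(sent, fmt)
    return delta.total_seconds() * 1000.0


# Delay between each query and the matching answer.
latencies = {name: latency_ms(*pair) for name, pair in TRACE.items()}

for name, ms in latencies.items():
    print(f"{name}: answer after {ms:.0f} ms")
```

This gives roughly 679 ms, 1330 ms and 689 ms. The gaps between successive queries in the capture are about 647 ms and 1008 ms. If those gaps are read as the per-query timeout (an interpretation, the capture itself does not say so), each forwarder answered shortly after Unbound had given up on it.
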
>> Unbound uses the algorithm described at [1] to set timeouts when
>> sending queries. This works well when Unbound is used as a recursive resolver,
>> because the Internet is a complex, wild network full of crappy overloaded DNS
>> servers and one has to take changing conditions and failing servers into account.
>>
>> But when Unbound uses forwarders that in turn should deal with that
>> wild Internet outside, it doesn't forgive its forwarders when they
>> deliver the answer a bit late. It thinks that those servers should be
>> FAST just because most answers come from their cache and Unbound uses its
>> infra-cache to remember this.
>>
>> If we turn the infra-cache off, Unbound will use its standard 376ms timeout and
>> the situation may get even worse.
>>
>> Would it make sense to add a new configuration parameter that allows setting
>> a custom timeout value when using forwarders? In the case of that poor
>> unbound.host we would set the timeout to ~1500ms or so.
>> It could be a per-server or a global value, and should be used only for
>> requests to "upstream" servers.
>>
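
To make the effect described above concrete, here is a rough sketch of a TCP-style (Jacobson) timeout estimator with an optional lower bound. It is not Unbound's code: the initial variance of 94 ms is only assumed so that the starting timeout matches the 376 ms mentioned above, the 20 ms cache-hit latency is invented, floor_ms is a hypothetical parameter standing in for the proposed option, and Unbound's own bounds and backoff are not modelled.

```python
class RttEstimator:
    """Simplified TCP-style timeout estimator; all times in milliseconds."""

    def __init__(self, floor_ms: float = 0.0):
        self.srtt = 0.0           # smoothed round-trip time
        self.rttvar = 94.0        # assumed, so the initial timeout is 4 * 94 = 376
        self.floor_ms = floor_ms  # hypothetical lower bound, not an Unbound option

    def timeout(self) -> float:
        # Timeout = smoothed RTT + 4 * variance, never below the floor.
        return max(self.srtt + 4 * self.rttvar, self.floor_ms)

    def update(self, rtt_ms: float) -> None:
        # Standard exponentially weighted updates (gains 1/8 and 1/4).
        delta = rtt_ms - self.srtt
        self.srtt += delta / 8
        self.rttvar += (abs(delta) - self.rttvar) / 4


def timeout_after(samples, floor_ms: float = 0.0) -> float:
    """Feed RTT samples to a fresh estimator and return the resulting timeout."""
    est = RttEstimator(floor_ms)
    for rtt in samples:
        est.update(rtt)
    return est.timeout()


# Fifty fast answers, as if served from the forwarder's cache (invented value).
cache_hits = [20.0] * 50

print(f"initial timeout:       {timeout_after([]):.0f} ms")
print(f"after cache hits:      {timeout_after(cache_hits):.1f} ms")
print(f"with a 1500 ms floor:  {timeout_after(cache_hits, floor_ms=1500):.0f} ms")
```

In this model a run of fast cached answers pulls the timeout far below the roughly 679 ms that forwarder-1 needed in the capture, so a late but correct answer gets dropped. With a floor of 1500 ms the same history would still leave room for all three answers seen above.
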
>> [1] http://www.unbound.net/documentation/info_timeout.html
>>
>> -- 
>> Ilya
> 
> 
> Hello
> 
> If you have Unbound > 1.4.14 you can try
> 
> tcp-upstream: <yes or no>
>               Enable or disable whether the upstream queries use TCP only  for
>               transport.  Default is no.  Useful in tunneling scenarios.
> 
> Not sure if the (short) timeout also applies to TCP and whether the TCP
> connection is used for multiple outstanding queries...
> 
> Regards
> 
> Andreas
> 

Hi,

I have been using Unbound for a couple of months now and I also have trouble
with the timeouts. I tried the workarounds discussed on the mailing list, but I
haven't found a proper solution yet.

Actually Andreas' idea sounds good and I tried it, but I was surprised by
the high load produced by Unbound:
On my DNS server at home "tcp-upstream: yes" works pretty well because I only
have a moderate number of DNS requests.
On the company firewall the system load increased by 1. That seems to be too
much.

In any case I think disabling UDP should only be a workaround. Unbound should
be usable as a DNS forwarder over UDP.

In my opinion Ilya's idea sounds good. If we had a config option to set
a lower limit for timeouts, it would be easy to use Unbound with
forwarders over UDP.

Regards

Florian