Maintained by: NLnet Labs

[Unbound-users] Problem to resolve domains from a certain registrar

W.C.A. Wijngaards
Thu Sep 8 20:13:36 CEST 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Leo,

I do not have a solution for you, but I wanted to help you read the output.
An rto value of 120000 means timeouts: the host is timing out and does
not reply to you.

rto: the roundtrip timeout value, i.e. the roundtrip time modified by
exponential backoff due to timeouts.  The 'ping' value is the ping time
measured when the server does respond to you (msec).
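
As a rough illustration of that backoff (a sketch only, not Unbound's
actual code): it assumes the rto doubles on every timeout and is capped at
120000 msec, which is consistent with the numbers in the dump_infra output
quoted below (242 -> 3872 -> 15488 -> 30976 -> 61952 -> 120000).  The
function name and the RTO_CAP variable are made up for this example.

```shell
#!/bin/sh
# Sketch under stated assumptions: one doubling per timeout, hard cap at
# 120000 msec. RTO_CAP and rto_after_timeouts are illustrative names only.
RTO_CAP=120000

# usage: rto_after_timeouts <starting rto in msec> <number of timeouts>
rto_after_timeouts() {
    rto=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        rto=$((rto * 2))              # exponential backoff step
        if [ "$rto" -gt "$RTO_CAP" ]; then
            rto=$RTO_CAP              # never grow beyond the cap
        fi
        n=$((n - 1))
    done
    echo "$rto"
}

rto_after_timeouts 242 4    # prints 3872
rto_after_timeouts 242 9    # prints 120000 (cap reached)
```

So starting from an rto of a few hundred msec, roughly nine consecutive
timeouts are enough to reach the cap.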

Have the leonidas.be servers perhaps blacklisted you?  Or is some firewall
or other script throttling traffic to zero for you?  That would explain why
it works for a bit, then you get blacklisted, and it stops working and
times out.
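
To spot such hosts quickly, something like the following could be used.
It is only a sketch: the function name hosts_at_max_rto is made up here,
the 120000 default is taken from the output you posted, and it assumes
each dump_infra entry is on one line with the address in the first field
and the value directly after the word "rto" (as in your paste, before the
mail client wrapped the lines).

```shell
#!/bin/sh
# Print the address (first field) of every infra cache entry whose rto is
# at or above the given threshold (default 120000 msec).
# hosts_at_max_rto is a hypothetical helper, not part of unbound-control.
hosts_at_max_rto() {
    awk -v cap="${1:-120000}" '{
        for (i = 2; i < NF; i++)
            if ($i == "rto" && $(i + 1) + 0 >= cap) { print $1; break }
    }'
}

# Demo on three entries copied from the quoted output (18:06:56 sample).
hosts_at_max_rto <<'EOF'
80.169.63.207 ttl 660 ping 2 var 60 rtt 242 rto 120000 ednsknown 1 edns 0 delay 0
194.78.23.152 ttl 674 ping 5 var 16 rtt 69 rto 70656 ednsknown 1 edns 0 delay 22
91.121.5.186 ttl 472 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
EOF
# prints 80.169.63.207
```

On the live resolver the input would come from
"unbound-control dump_infra | hosts_at_max_rto".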

Best regards, Wouter

On 09/08/2011 06:18 PM, Leo Bush wrote:
> Dear all,
> 
> My problem is not gone yet. I analysed it as far as I could, following
> your keywords EDNS and packet loss or fragmentation, which looked promising.
> I disabled the iptables rule set, restarted unbound and let it run
> for a couple of days: same result. Sometimes the domains hosted on
> ns1.register.be and ns2.register.be were resolved. Most of the time
> unbound returned a SERVFAIL.
> Then I disabled DNSSEC and let it run too. Again same result.
> 
> I nevertheless noticed better results for domains like leonidas.be which
> are hosted on ns1.register.be, ns2.register.be, ns3.register.be than for
> domains like estates.lu which is only delegated towards ns1.register.be,
> ns2.register.be.
> 
> Then I read something in the unbound documentation about «Unbound
> Timeout Information» and the flush_infra and dump_infra commands. I think
> this time I am on a promising track, but I do not understand well enough
> how it works and interacts. I did the following checks:
> 
> [resolv ~]# unbound-control lookup leonidas.be
> The following name servers are used for lookup of leonidas.be.
> ;rrset 85727 2 0 2 0
> leonidas.be.    85727   IN      NS      ns1.register.be.
> leonidas.be.    85727   IN      NS      ns2.register.be.
> ;rrset 37 1 0 8 3
> ns2.register.be.        37      IN      A       194.78.23.152
> ;rrset 37 1 0 8 3
> ns1.register.be.        37      IN      A       80.169.63.207
> Delegation with 2 names, of which 0 can be examined to query further
> addresses.
> It provides 2 IP addresses.
> 80.169.63.207           rto 120000 msec, ttl 412, ping 0 var 94 rtt 376,
> EDNS 0 assumed.
> 194.78.23.152           rto 120000 msec, ttl 478, ping 0 var 94 rtt 376,
> EDNS 0 assumed.
> 
> 
> [root@resolv ~]# unbound-control flush_infra 80.169.63.207 &&
> unbound-control flush_infra 194.78.23.152; while [ 1 ] ; do date;
> unbound-control dump_infra | grep -E
> "80.169.63.207|194.78.23.152|91.121.5.186"; sleep 30; echo;
> done
> ok
> ok
> Thu Sep  8 18:02:51 CEST 2011
> 91.121.5.186 ttl 717 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:03:21 CEST 2011
> 80.169.63.207 ttl 875 ping 2 var 60 rtt 242 rto 242 ednsknown 1 edns 0
> delay 0
> 194.78.23.152 ttl 889 ping 5 var 16 rtt 69 rto 69 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 687 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:03:52 CEST 2011
> 80.169.63.207 ttl 844 ping 2 var 60 rtt 242 rto 242 ednsknown 1 edns 0
> delay 0
> 194.78.23.152 ttl 858 ping 5 var 16 rtt 69 rto 69 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 656 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:04:23 CEST 2011
> 80.169.63.207 ttl 813 ping 2 var 60 rtt 242 rto 3872 ednsknown 1 edns 0
> delay 0
> 194.78.23.152 ttl 827 ping 5 var 16 rtt 69 rto 4416 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 625 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:04:53 CEST 2011
> 80.169.63.207 ttl 783 ping 2 var 60 rtt 242 rto 15488 ednsknown 1 edns 0
> delay 9
> 194.78.23.152 ttl 797 ping 5 var 16 rtt 69 rto 17664 ednsknown 1 edns 0
> delay 13
> 91.121.5.186 ttl 595 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:05:24 CEST 2011
> 80.169.63.207 ttl 752 ping 2 var 60 rtt 242 rto 30976 ednsknown 1 edns 0
> delay 13
> 194.78.23.152 ttl 766 ping 5 var 16 rtt 69 rto 35328 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 564 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:05:55 CEST 2011
> 80.169.63.207 ttl 721 ping 2 var 60 rtt 242 rto 61952 ednsknown 1 edns 0
> delay 54
> 194.78.23.152 ttl 735 ping 5 var 16 rtt 69 rto 35328 ednsknown 1 edns 0
> delay 8
> 91.121.5.186 ttl 533 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:06:25 CEST 2011
> 80.169.63.207 ttl 691 ping 2 var 60 rtt 242 rto 61952 ednsknown 1 edns 0
> delay 24
> 194.78.23.152 ttl 705 ping 5 var 16 rtt 69 rto 70656 ednsknown 1 edns 0
> delay 53
> 91.121.5.186 ttl 503 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:06:56 CEST 2011
> 80.169.63.207 ttl 660 ping 2 var 60 rtt 242 rto 120000 ednsknown 1 edns
> 0 delay 0
> 194.78.23.152 ttl 674 ping 5 var 16 rtt 69 rto 70656 ednsknown 1 edns 0
> delay 22
> 91.121.5.186 ttl 472 ping 1 var 9 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:07:26 CEST 2011
> 80.169.63.207 ttl 630 ping 2 var 60 rtt 242 rto 120000 ednsknown 1 edns
> 0 delay 0
> 194.78.23.152 ttl 644 ping 5 var 16 rtt 69 rto 120000 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 442 ping 4 var 6 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> ^C
> 
> I noticed that the rto value increases very quickly towards 120000 for the
> 2 name servers ns1.register.be and ns2.register.be, and it stays there.
> Resolution works for a certain time. Afterwards unbound returns SERVFAIL.
> 
> I did the same test with another twin unbound nameserver (off traffic),
> and did not observe the same behaviour.
> 
> unbound-control flush_infra 80.169.63.207 && unbound-control flush_infra
> 194.78.23.152; while [ 1 ] ; do date; unbound-control dump_infra | grep
> -E "80.169.63.207|194.78.23.152|91.121.5.186"; sleep 30; echo; done
> ok
> ok
> Thu Sep  8 17:55:56 CEST 2011
> 
> Thu Sep  8 17:56:26 CEST 2011
> 80.169.63.207 ttl 877 ping 9 var 16 rtt 73 rto 73 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 877 ping 0 var 20 rtt 80 rto 80 ednsknown 1 edns 0 delay 0
> 194.78.23.152 ttl 877 ping 0 var 13 rtt 52 rto 52 ednsknown 1 edns 0
> delay 0
> 
> Thu Sep  8 17:56:57 CEST 2011
> 80.169.63.207 ttl 846 ping 9 var 16 rtt 73 rto 73 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 846 ping 0 var 20 rtt 80 rto 80 ednsknown 1 edns 0 delay 0
> 194.78.23.152 ttl 846 ping 0 var 13 rtt 52 rto 52 ednsknown 1 edns 0
> delay 0
> 
> Thu Sep  8 17:57:27 CEST 2011
> 80.169.63.207 ttl 816 ping 9 var 16 rtt 73 rto 73 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 816 ping 0 var 20 rtt 80 rto 80 ednsknown 1 edns 0 delay 0
> 194.78.23.152 ttl 816 ping 0 var 13 rtt 52 rto 52 ednsknown 1 edns 0
> delay 0
> 
> Thu Sep  8 17:57:57 CEST 2011
> 80.169.63.207 ttl 786 ping 9 var 16 rtt 73 rto 73 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 786 ping 0 var 20 rtt 80 rto 80 ednsknown 1 edns 0 delay 0
> 194.78.23.152 ttl 786 ping 0 var 13 rtt 52 rto 52 ednsknown 1 edns 0
> delay 0
> 
> Thu Sep  8 17:58:27 CEST 2011
> 80.169.63.207 ttl 756 ping 9 var 16 rtt 73 rto 73 ednsknown 1 edns 0
> delay 0
> 91.121.5.186 ttl 756 ping 0 var 20 rtt 80 rto 80 ednsknown 1 edns 0 delay 0
> 194.78.23.152 ttl 756 ping 0 var 13 rtt 52 rto 52 ednsknown 1 edns 0
> delay 0
> 
> Now I did the following test on the "buggy" unbound server:
> [root@resolv ~]# unbound-control flush_infra 80.92.67.140 &&
> unbound-control flush_infra 80.92.65.2; while [ 1 ] ; do date;
> unbound-control dump_infra | grep -E "80.92.67.140|80.92.65.2"; sleep
> 30; echo; done
> ok
> ok
> Thu Sep  8 17:58:38 CEST 2011
> 
> Thu Sep  8 17:59:08 CEST 2011
> 80.92.67.140 ttl 870 ping 21 var 9 rtt 57 rto 57 ednsknown 1 edns 0 delay 0
> 80.92.65.2 ttl 872 ping 9 var 8 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 17:59:39 CEST 2011
> 80.92.67.140 ttl 839 ping 20 var 8 rtt 52 rto 52 ednsknown 1 edns 0 delay 0
> 80.92.65.2 ttl 841 ping 11 var 10 rtt 51 rto 51 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:00:09 CEST 2011
> 80.92.67.140 ttl 808 ping 23 var 11 rtt 67 rto 67 ednsknown 1 edns 0
> delay 0
> 80.92.65.2 ttl 810 ping 26 var 29 rtt 142 rto 142 ednsknown 1 edns 0
> delay 0
> 
> Thu Sep  8 18:00:41 CEST 2011
> 80.92.67.140 ttl 777 ping 21 var 9 rtt 57 rto 57 ednsknown 1 edns 0 delay 0
> 80.92.65.2 ttl 779 ping 8 var 8 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:01:11 CEST 2011
> 80.92.67.140 ttl 747 ping 18 var 6 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 80.92.65.2 ttl 749 ping 9 var 7 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:01:42 CEST 2011
> 80.92.67.140 ttl 716 ping 18 var 5 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 80.92.65.2 ttl 718 ping 8 var 6 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> 
> Thu Sep  8 18:02:12 CEST 2011
> 80.92.67.140 ttl 685 ping 23 var 13 rtt 75 rto 75 ednsknown 1 edns 0
> delay 0
> 80.92.65.2 ttl 687 ping 11 var 7 rtt 50 rto 50 ednsknown 1 edns 0 delay 0
> ^C
> 
> Does anybody have an explanation or a suggestion for this?
> 
> regards
> 
> 
> Leo Bush
> 
> 
> On 24/08/2011 13:47, Lst_hoe02 at kwsoft.de wrote:
>> Zitat von Leo Bush <leo.bush at mylife.lu>:
>>
>>> Dear all,
>>>
>>> For one month our company has been using unbound-1.4.8-1 on two RH6
>>> servers as caching and resolving servers with IPv6 and DNSSEC enabled.
>>> These two servers deal with all our DNS traffic, generated by all our
>>> customers (2x 5Mbps peak traffic). They work as stand-alone servers, with
>>> no complicated network components (load balancer...) around.
>>>
>>> At the beginning we activated the option use-caps-for-id, but since we
>>> got complaints from customers that certain domains were available
>>> everywhere in the world except with us, we preferred to deactivate it.
>>>
>>> Currently we face the following rather strange problem:
>>> Under normal working conditions, 70-90% of the time our two production
>>> servers cannot resolve domains registered at register.be and hosted on
>>> the three authoritative name servers ns1.register.be, ns3.register.be,
>>> ns2.register.be (example: leonidas.be, estates.lu). They return
>>> SERVFAIL. register.be itself works all the time. By chance it sometimes
>>> works correctly for a brief period of time. Even though it was not easy
>>> due to the thousands of packets passing through per second, I succeeded
>>> in tracing the packets the server sends to the authoritative servers,
>>> and it gets correct answers back.
>>>
>>> I tried installing unbound 1.4.8 with the same configuration file (see
>>> attachment) on a desktop machine and there was no issue. All resolutions
>>> of domains at register.be were immediate and correct.
>>>
>>> As customers continued to complain, I was forced to take one server out
>>> of production and to replace it with bind, which works correctly. Now I
>>> have one server with unbound that has the problem and one server with
>>> bind that works fine in production. The formerly faulty unbound server
>>> that is now offloaded currently responds correctly to all tests (no
>>> restart done, no reboot done, just IP address switched).
>>>
>>> Does anybody have an idea how I can solve this problem? Shall I provide
>>> more technical information? Do you have further tests to suggest?
>>>
>>
>> Looks to me like an EDNS problem. At least some part of the .be zone is
>> DNSSEC signed and the replies get bigger than 512 bytes, like with "dig
>> x.dns.be A +dnssec". Bind has a feature to reduce the EDNS size in case
>> of trouble, not sure if Unbound does the same. What you should check:
>> - Do the troubled domains/names resolve with unbound if you use checking
>>   disabled (+cdflag)?
>> - Do you have any firewall device in front of your resolvers, maybe
>>   some Cisco inspecting DNS traffic?
>> - Have you disabled TCP in Unbound?
>>
>> For some hints on the problem have a look here:
>> https://www.dns-oarc.net/oarc/services/replysizetest
>>
>> Regards
>>
>> Andreas
>>
>>
>>
>> _______________________________________________
>> Unbound-users mailing list
>> Unbound-users at unbound.net
>> http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJOaQXQAAoJEJ9vHC1+BF+NgkcQAJV/fZiAtT23tu5r54QZBM/k
IJb6ZGXvlH4BgvnEd6nVgkyDfQsJld9fbLytJw09qF9iZnU79F2er1Ih5+Knb8Jh
cfI8O4lLAdRx0VAArljZIHNuC8Z7Vzbjz/nZCOHEgXLvZ1Gejc2RzhXJ0cxXinLG
uZdFuhq/NOet9iFPotELNmmcUXL4gNYmvAw8zshyZOyMpeh4DGKH51ZUoqba4u1Z
PKcu+XsWd8AE3EyCXf1+Lx5aqXAYBf/24/trx/F2vT81ik8dawsvlRnquXpFR1M1
wuFuiiyNeOOQqsG4YZiwBnyHDE41WnQcDy/H2NIglfzpXIlAu8RUIpw7fFkLU4mH
ALeyPHhDaiIi60wKmul/SqEHsn7ct0pXPsysVPP21Mm2FpLGgShD9a/igbBzaIQB
un/tWMZiVkvgtlzQy/nDBDqoUoCxJtWLP63YH7rQJi8zxL8Q4PJTTIREdxGafRuB
6S/utt0dUeACwx0toYA6GzEa7l7pwblMj5SMzJytYSc1ckPOmHKE37/HM/91FA2b
9tIrKMM7MsqNuK7aTJWqb4A5Okxj3QctiLzjn7IIiCtnRH5L8F4z4evp5SnXBkNn
kOzuMXPJ5iUfTAQ81YeReVp0cmMJwNGNr7iHkcC4pw+dWt8NVFqrcpJ0LQ7Ak/R9
nx7mQFV/oK3p0ZPW3FlQ
=eQ2M
-----END PGP SIGNATURE-----