Maintained by: NLnet Labs

[Unbound-users] Problem to resolve domains from a certain registrar

Leo Bush
Thu Sep 29 16:25:56 CEST 2011


Dear all,

Thank you, Wouter, for your last answer. It prompted me to contact the
operator in question, but we found no new lead until today, when I
found the following explanation for the problem:
- For weeks, our unbound resolver has been receiving a request for
A www.coolbox.be every minute from a device in our network.
- unbound tries to get the answer from ns1.register.be or 
ns2.register.be -> in both cases: no answer -> timeout -> rto climbs 
quickly to 120000
- In parallel, our unbound server gets various requests for domains
hosted at ns1.register.be or ns2.register.be. Normally they are all
answered quickly. But we noticed that once the rto has reached 120000
(because of www.coolbox.be), unbound no longer tries to contact the
remote authoritative servers and only returns SERVFAIL, even though an
answer would be available. The registrar's whole nameserver farm is
blacklisted because one zone no longer works.
- This explains why I saw the error only for ns1.register.be and
ns2.register.be and not for ns3.register.be: coolbox.be is not
delegated to the third server.
- It also explains why resolution works correctly at rare moments and
fails most of the time: it works in the short periods when the
nameservers are not yet blacklisted again.
- I would say unbound's negative caching handles these two nameservers
badly. They respond only when they have something to answer. If they
are asked about things they no longer know, they do not answer at all
(no REJECT). This penalizes all communication between the (unbound)
resolver and the authoritative servers.
- I found the following text in RFC 2308:

7 - Other Negative Responses
... not covered by any existing RFC.

7.1 Server Failure (OPTIONAL)
...
a resolver MAY cache a server failure response.  If it
    does so it MUST NOT cache it for longer than five (5) minutes, and it
    MUST be cached against the specific query tuple<query name, type,
    class, server IP address>

7.2 Dead / Unreachable Server (OPTIONAL)

    Dead / Unreachable servers are servers that fail to respond in any
    way to a query or where the transport layer has provided an
    indication that the server does not exist or is unreachable.  A
    server may be deemed to be dead or unreachable if it has not
    responded to an outstanding query within 120 seconds.

    Examples of transport layer indications are:

       ICMP error messages indicating host, net or port unreachable.
       TCP resets
       IP stack error messages providing similar indications to those above.

    A server MAY cache a dead server indication.  If it does so it MUST
    NOT be deemed dead for longer than five (5) minutes.  The indication
    MUST be stored against query tuple<query name, type, class, server
    IP address>  unless there was a transport layer indication that the
    server does not exist, in which case it applies to all queries to
    that specific IP address.

- Can you tell me whether my interpretation is correct: do requests
that go unanswered make unbound blacklist the whole server, so that it
does not even query for working domains that would get answered? Does
unbound cache against the complete tuple <query name, type, class,
server IP address>?
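
To make the distinction in the quoted RFC 2308 section 7.2 concrete,
here is a minimal Python sketch of a dead-server cache keyed on the full
query tuple. It is only an illustration of the RFC text, not unbound's
actual code: the class name DeadServerCache, its methods and the
address 192.0.2.1 (a documentation placeholder) are all hypothetical.

```python
import time

MAX_DEAD_SECONDS = 300  # RFC 2308 section 7.2: at most five minutes


class DeadServerCache:
    """Hypothetical cache illustrating RFC 2308 section 7.2."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._dead = {}  # key -> expiry timestamp

    def mark_dead(self, qname, qtype, qclass, server_ip,
                  transport_error=False):
        # A plain timeout is stored against the full query tuple.  Only
        # a transport-layer indication (e.g. ICMP unreachable) covers
        # every query to that IP address.
        if transport_error:
            key = (server_ip,)
        else:
            key = (qname.lower(), qtype, qclass, server_ip)
        self._dead[key] = self._clock() + MAX_DEAD_SECONDS

    def is_dead(self, qname, qtype, qclass, server_ip):
        now = self._clock()
        keys = ((server_ip,), (qname.lower(), qtype, qclass, server_ip))
        for key in keys:
            expiry = self._dead.get(key)
            if expiry is None:
                continue
            if now < expiry:
                return True
            del self._dead[key]  # entry has expired
        return False


cache = DeadServerCache()
cache.mark_dead("www.coolbox.be", "A", "IN", "192.0.2.1")
print(cache.is_dead("www.coolbox.be", "A", "IN", "192.0.2.1"))
print(cache.is_dead("www.example.be", "A", "IN", "192.0.2.1"))
```

With this keying, a timeout for www.coolbox.be would not stop queries
for other names hosted on the same server address.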

kind regards


Leo Bush


On 08/09/2011 20:13, W.C.A. Wijngaards wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Leo,
>
> I do not have a solution for you, but wanted to help read the output.
> The rto value to 120000 means timeouts.  This means that the host is
> timing out, and it does not reply to you.
>
> rto: roundtrip-time-out value.  The roundtrip value modified by
> exponential backoff due to timeouts.  The 'ping' time would be the
> pingtime when it does respond to you (msec).
>
> The leonidas.be servers seem to have blacklisted you?  Or some firewall
> or other script is throttling traffic to zero for you?  So, it works for
> a bit, then it blacklists you, and it stops working and timeouts.
>
> Best regards, Wouter
>
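
As a rough illustration of how exponential backoff drives the rto to
the 120000 ceiling mentioned above, here is a small Python sketch. It is
a simplification under stated assumptions, not unbound's implementation:
the doubling factor and the 376 msec starting value are assumed, and the
function names are hypothetical.

```python
RTO_MAX_MS = 120000  # the ceiling seen in the output discussed above


def backoff(rto_ms, rto_max_ms=RTO_MAX_MS):
    """Double the timeout after a timed-out query, capped at the ceiling.

    Doubling is an assumption; the message only says "exponential".
    """
    return min(rto_ms * 2, rto_max_ms)


def timeouts_until_ceiling(initial_rto_ms, rto_max_ms=RTO_MAX_MS):
    """Count the consecutive timeouts needed for rto to reach the ceiling."""
    if initial_rto_ms <= 0:
        raise ValueError("initial_rto_ms must be positive")
    rto, count = initial_rto_ms, 0
    while rto < rto_max_ms:
        rto = backoff(rto, rto_max_ms)
        count += 1
    return count


# Assumed starting value of 376 msec.
print(timeouts_until_ceiling(376))
```

Under these assumptions only a handful of consecutive unanswered
queries are enough to reach the ceiling, which fits the observation
that the rto climbs quickly.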