Maintained by: NLnet Labs

[Unbound-users] Validation failure of DNSSEC signed domain names

Ondřej Surý
Mon May 10 15:48:43 CEST 2010


Wouter,

I am still worried that merely "optimizing" the configuration is enough
to make the SERVFAILs go away. It seems to me that something deeper is
involved, because only signed domains fail. Unsigned domains and
domains not yet in the cache are still OK.

Could you please look at this bug further? Since we can reproduce it
reliably, it should be possible to trace it further down.

Ondrej

On Wed, May 5, 2010 at 14:58, Zbynek Michl <zbynek.michl at nic.cz> wrote:
> Hmm, so I applied the optimizations from the documentation
> <http://www.unbound.net/documentation/howto_optimise.html> and it seems that
> Unbound works fine right now :) I am going to do some more tests.
>
> Thanks,
> Zbynek
>
> On 5.5.2010 13:51, Zbynek Michl wrote:
>>
>> Hi Wouter,
>>
>> On 4.5.2010 10:48, W.C.A. Wijngaards wrote:
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi Zbynek,
>>>
>>> Can you try with the svn trunk r2107? I have made a bugfix which may
>>> help: it stops unbound from disabling its dnssec expectation after it
>>> has seen a response fail without rrsigs. This could turn into a long
>>> causation chain: dnssec expectation lost, therefore EDNS backoff
>>> possible (with timeouts happening), therefore EDNS backoff stored in
>>> cache, stopping it from sending DO bits (and a bug fixed in r2106 where
>>> this fact got 'stuck' into the cache).
>>
>> Still the same :( The only difference is that when the "strange" state
>> occurs, replies to DNSSEC queries which I have never tried before are OK,
>> but those I have already tried still get SERVFAIL.
>>
>>> After discussion with my colleague I think this fix may help... The
>>> evidence you kindly provided supports the hypothesis and this fix should
>>> stop it from failing continuously (but a single query still fails).
>>>
>>> Also, I believe there to be some trouble with your setup, which is
>>> causing unbound to have slower turnaround; can you email me (offlist if
>>> you want) your configuration: unbound.conf, your ulimit (open files)
>>> for unbound, and whether you compiled with libevent? I think this
>>> slower turnaround exists because you have that failing query.
>>
>> Yes, Unbound is compiled with libevent.
>>
>> # ulimit -n
>> 32768
>>
>>>> So in the "strange" state the resolver does not send queries with the DO
>>>> bit, thus does not receive the RRSIG for the query type, and therefore
>>>> cannot validate the result. Then the remote authoritative server (which
>>>> sent the answer without RRSIG) is added to the blacklist.
>>>>
>>>> In the log (the complete version was sent in a previous mail):
>>>>
>>>> The request for DS nic.cz is sent to 194.0.12.1 (without the DO bit), the
>>>> returned DS is not validated because its RRSIG is missing, and
>>>> 194.0.12.1 is blacklisted.
>>>
>>>>>> Hmm, r2106 experiences the same issue :(
>>>>>>
>>>>>> It seems that there is no single point in time at which validation
>>>>>> switches from correct to incorrect. At the start all answers are
>>>>>> correct, and as I send more and more requests (different in a few
>>>>>> cycles), there are more and more incorrect answers. At some point
>>>>>> all answers from the resolver are incorrect until the cache is
>>>>>> flushed (probably).
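
For readers following the quoted discussion: the "strange" state comes down to whether an outgoing query carries an EDNS0 OPT record with the DO (DNSSEC OK) bit set. The Python sketch below is only an illustration of that wire-format detail and is not taken from Unbound's source. The helper names (build_query, has_do_bit), the fixed query ID and the 4096-byte UDP size are assumptions made for the example. It builds a DS query for nic.cz in three variants (EDNS with DO, EDNS without DO, no EDNS at all) and checks which of them asks for RRSIGs.

```python
import struct

DO_FLAG = 0x8000   # DNSSEC OK bit in the EDNS flags field (RFC 3225)
TYPE_OPT = 41      # OPT pseudo-RR type used by EDNS0 (RFC 6891)
TYPE_DS = 43       # DS record type


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype, edns=True, do_bit=True,
                query_id=0x1234, udp_size=4096):
    """Build a one-question DNS query (hypothetical helper for this sketch)."""
    arcount = 1 if edns else 0
    # Header: ID, flags (all zero: plain query, RD clear), QD/AN/NS/AR counts.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if not edns:
        return header + question
    flags = DO_FLAG if do_bit else 0
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size,
    # TTL split into extended RCODE, version and flags, then RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHBBHH", TYPE_OPT, udp_size, 0, 0, flags, 0)
    return header + question + opt


def has_do_bit(packet):
    """Return True if a query built as above carries OPT with DO set.

    Only handles simple queries: one question, no answer or authority
    records, and the OPT record first in the additional section.
    """
    qdcount, ancount, nscount, arcount = struct.unpack("!HHHH", packet[4:12])
    if qdcount != 1 or ancount or nscount or arcount == 0:
        return False
    pos = 12
    while packet[pos] != 0:          # skip the labels of the question name
        pos += 1 + packet[pos]
    pos += 1 + 4                     # root label, then QTYPE and QCLASS
    if packet[pos] != 0:             # OPT owner name must be the root
        return False
    rtype, _size, _rcode, _version, flags, _rdlen = struct.unpack(
        "!HHBBHH", packet[pos + 1:pos + 11])
    return rtype == TYPE_OPT and bool(flags & DO_FLAG)


if __name__ == "__main__":
    for label, kwargs in [("EDNS + DO", {}),
                          ("EDNS, no DO", {"do_bit": False}),
                          ("no EDNS", {"edns": False})]:
        pkt = build_query("nic.cz", TYPE_DS, **kwargs)
        print(label, "->", "asks for RRSIGs" if has_do_bit(pkt) else "no DO bit")
```

Only the first variant asks the authoritative server for signatures. The other two correspond to the queries described in the quoted log, where the DS answer arrives without its RRSIG and cannot be validated.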

-- 
Ondřej Surý <ondrej at sury.org>
http://blog.rfc1925.org/