
[Unbound-users] Cascading Unbound and automatic key update

lst_hoe02 at kwsoft.de
Tue Jan 10 16:34:48 CET 2012


Quoting "W.C.A. Wijngaards" <wouter at nlnetlabs.nl>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Andreas,
>
> On 01/10/2012 03:43 PM, lst_hoe02 at kwsoft.de wrote:
>> Hello
>>
>> we have an internal Unbound cache using a second Unbound instance at
>> the border firewall to do DNS resolution with DNSSEC enabled. Today
>> our internal Unbound stopped working with errors like this:
>>
>> Jan 10 14:33:53 mailer unbound: [27958:0] info: validation failure
>> <www.at-web.de. A IN>: no DNSSEC records from x.x.x.x for DS
>> at-web.de. while building chain of trust Jan 10 14:33:53 mailer
>> unbound: [27958:0] info: validation failure <www.heise.de. A IN>:
>> no DNSSEC records from x.x.x.x for DS heise.de. while building
>> chain of trust
>
> So, what it looked like for this server was that dig @x.x.x.x DS
> heise.de +dnssec +norec +cdflag did not return any DNSSEC data.

The man page of my "dig" version does not mention "+norec", and the
above command led to status REFUSED. Without "+norec" I got the
following, which looks sane to me:

; <<>> DiG 9.7.0-P1 <<>> @x.x.x.x heise.de DS +dnssec +cdflag
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13864
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;heise.de.			IN	DS

;; AUTHORITY SECTION:
H319DM5GC3EDEK691VQBHEHOT7VGGJ2B.de. 7136 IN NSEC3 1 1 15 BA5EBA11  
H31BIK9NA0MJD5K06JE5H9BBFDBD56DB NS SOA NAPTR RRSIG DNSKEY NSEC3PARAM
H319DM5GC3EDEK691VQBHEHOT7VGGJ2B.de. 7136 IN RRSIG NSEC3 8 2 7200  
20120117131500 20120110131500 30565 de.  
fd08T4Fapf6tVVOA2VmYXceBTUS5Ckjz8iqdBttzt4DgAq2e8bI4l/aE  
wHgXBl2P+CEq6m5H7d4X6WHXvoi+mWYof4LYb1cSW2l212kJ/jT4M6q4  
QMYrcocKZaFzKg/X4fZwD1ma0RQ7q8Mx09heV25TlZwxSBjbpRUQv4Ez /0U=
de.			7136	IN	SOA	f.nic.de. its.denic.de. 2012011062 7200 7200 3600000 7200
de.			7136	IN	RRSIG	SOA 8 1 86400 20120117131500 20120110131500 30565  
de. R5N20le84Cacq8mtIwKifWifIOgJN2tWULiJU/DGDxsBQPiqYkM9zec7  
dfgfs8XQbUx3Kkymsuo7sdanAQVld7ieew+aVP9yhgZdc18cmuk4hYBB  
1X1Sb8X249kv6xxR/D87pl57g86HW3OzG2pFhV+pjt5IWNUGvBCiiQkQ HUU=
UMUKTKOLDUUT050M28LQE3R399Q894KV.de. 6942 IN NSEC3 1 1 15 BA5EBA11  
UMUPU1E8C10ANEOEMVVG217UL77BN1H8 A RRSIG
UMUKTKOLDUUT050M28LQE3R399Q894KV.de. 6942 IN RRSIG NSEC3 8 2 7200  
20120117131500 20120110131500 30565 de.  
Gf4tjJyx6WwHi8tyX7UwkI2CYoyA0I3Jyjv9zqo7o/kmm9ztleOZZSFG  
y5DzFihl4vyvSVu6ZSmeMHjy1dniIMmvIPMOsWGK120vp/LGYjc0r+J+  
KsJsqb8F6bimi6EPy4Q80/Pc2UsOpoYToOawLCqHjMHE7mn76HpPJyXK oX8=

;; Query time: 0 msec
;; SERVER: x.x.x.x#53(x.x.x.x)
;; WHEN: Tue Jan 10 16:23:29 2012
;; MSG SIZE  rcvd: 742
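
Whether a response like the one above actually carries DNSSEC data can
also be checked mechanically. The following is only a minimal sketch,
assuming the plain-text output format shown above; the function name
summarize_dig and the choice of record types are illustrative and not
part of dig or Unbound.

```python
import re

# Record types treated as "DNSSEC data" in this sketch (an assumption).
DNSSEC_TYPES = {"RRSIG", "NSEC", "NSEC3", "DS", "DNSKEY"}


def summarize_dig(output: str) -> dict:
    """Summarize status, header flags, DO bit and DNSSEC types in dig text output."""
    status = re.search(r"status: (\w+)", output)
    flags = re.search(r";; flags: ([a-z ]*);", output)
    edns = re.search(r"; EDNS: .*flags:([^;]*);", output)

    types = set()
    for line in output.splitlines():
        # Skip blank lines and dig comment lines (header, section titles).
        if not line.strip() or line.startswith(";"):
            continue
        fields = line.split()
        # A record line starts with: owner, TTL, class, type.
        # Wrapped continuation lines do not have "IN" in the third field.
        if len(fields) >= 4 and fields[2] == "IN" and fields[3] in DNSSEC_TYPES:
            types.add(fields[3])

    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
        "do_bit": bool(edns) and "do" in edns.group(1).split(),
        "dnssec_types": sorted(types),
        "has_dnssec": bool(types),
    }


if __name__ == "__main__":
    import sys

    # Read dig output from standard input and print the summary.
    print(summarize_dig(sys.stdin.read()))
```

The output of the dig command above can be piped into this script on
standard input. A response stripped of DNSSEC data would show up as
"has_dnssec": False even though the status is NOERROR.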

> As if there were fragmentation problems.  And since it was internal
> there are extra firewalls or routers for that sort of thing to occur.

There is nothing between the two machines besides a switch. Both
machines run iptables, but it is configured to let UDP/TCP port 53
pass. There are no iptables log entries from that time either.

>> The instance at the border firewall has no errors in the log and
>> works fine all the time. After restarting the internal instance, it
>> is also working fine again. The auto-trust-anchor-file of the
>> internal instance has a timestamp from the restart of the instance,
>> so I suspect something went wrong with the update of this file, but
>> I have no clue why the restart cured it.
>
> No, the timestamp was probably written right when you restart it.
> Because it is written when the root DNSKEY is seen.  When you restart
> it the cache is empty and it fetches the root DNSKEY.  And thus
> updates the file to note that it saw the root key.

That is what strikes me as odd. The internal Unbound instance was not
able to fetch the key a few minutes earlier, but after the restart it
could do so without problems. As .com domains were also affected, I
suspect that it wasn't specifically the key for heise.de that failed;
it was simply the first to fail.
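
To see how widely the failures are spread, the "validation failure"
log entries can be tallied per top-level domain. This is a rough
sketch only, assuming one syslog message per line in the format of the
log excerpt quoted above; the function name failures_by_tld is made up
for illustration.

```python
import re
from collections import Counter

# Matches e.g. "validation failure <www.heise.de. A IN>: no DNSSEC records ..."
FAILURE = re.compile(
    r"validation failure <(?P<name>\S+) (?P<type>\S+) (?P<cls>\S+)>: (?P<reason>.*)"
)


def failures_by_tld(lines):
    """Count Unbound validation failures per top-level domain."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if not match:
            continue
        # Drop the trailing root dot, then take the last label as the TLD.
        name = match.group("name").rstrip(".")
        tld = name.rsplit(".", 1)[-1] if name else "."
        counts[tld] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read log lines from standard input and print the most affected TLDs.
    for tld, count in failures_by_tld(sys.stdin).most_common():
        print(tld, count)
```

If several unrelated TLDs show up in the tally, the failure is unlikely
to be tied to a single zone such as heise.de.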

>>
>> Both instances are Unbound version 1.4.14 with auto-trust-anchor
>> enabled. The forwarding from internal to firewall instance is done
>> this way:
>>
>> forward-zone:
>>     name: "."
>>     forward-addr: x.x.x.x
>
> This looks fine.
>
>> What can we do to debug this problem and prevent it from happening
>> again?
>
> There is something happening with UDP.  There seems nothing wrong with
> key files.  The error is that somehow it gets no DNSSEC data (edns
> backoff, or messages arrive 'stripped' of DNSSEC data).

As I said, there is basically only a wire between the two. Does
Unbound cache the keys, by the way? As mentioned, the external Unbound
has no problems at all, while the internal one returns only errors. Is
it possible that the problem arises from DNS data delivered by external
name servers?
It is very inconvenient if the central resolver cache stops working...

Thanks

Andreas