Ability to detect when queries are being blocked at the network level

John Peacock jpeacock at messagesystems.com
Thu May 3 13:07:22 UTC 2018


We run a robust carrier-grade e-mail service in the cloud and have a
dedicated DNS infrastructure that has undergone extensive tuning to work in
AWS; see:

  https://www.sparkpost.com/blog/undocumented-limit-dns-aws/
  https://www.sparkpost.com/blog/dns-aws-network-lessons/
  https://www.usenix.org/sites/default/files/conference/protected-files/srecon18americas_slides_blosser.pdf

We occasionally have issues where we are unable to perform MX lookups for
what appears to be a perfectly valid domain.  I tracked down one such
incident yesterday; in this case the authoritative name servers for the
domain were deliberately blocking our queries (something I confirmed from
the NS box using dnstracer).  The same MX query works fine through the
Google 8.8.8.8 resolver, and indeed through the AWS VPC default resolver
(which we cannot rely on, see above).

I can't find a way to monitor for this condition (which ultimately
manifests as SERVFAIL).  I've read the docs about how Unbound handles
probes and backoff, but I don't see any exposed metric that would tell me
which domains this is happening for.  If I had a way to get the list of
domains that return SERVFAIL, I could write an out-of-band script that
attempts to resolve them via alternate paths and adds them to a whitelist
config.
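
To make the last step concrete, here is a rough sketch of what the
config-generating half of such a script might look like.  It assumes the
list of failing domains is already available from somewhere (that is the
part I am missing), and it assumes the "whitelist" would take the form of
Unbound forward-zone stanzas pointing at an alternate resolver.  The
function names and the 8.8.8.8 default are placeholders for illustration
only, not anything we run today.

```python
def normalize(domain: str) -> str:
    """Lower-case a domain name and make it fully qualified."""
    name = domain.strip().lower().rstrip(".")
    return name + "."


def render_forward_zones(domains, forward_addr="8.8.8.8"):
    """Return Unbound forward-zone stanzas, one per unique domain.

    Assumption: forwarding the affected zones to an alternate resolver
    is an acceptable workaround; forward_addr is a placeholder.
    """
    seen = set()
    stanzas = []
    for raw in domains:
        if not raw.strip():
            continue  # skip blank input lines
        name = normalize(raw)
        if name in seen:
            continue  # emit each zone only once
        seen.add(name)
        stanzas.append(
            "forward-zone:\n"
            f'    name: "{name}"\n'
            f"    forward-addr: {forward_addr}\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Hypothetical input; in practice this would be the SERVFAIL list.
    print(render_forward_zones(["Example.com", "example.com.", "example.org"]))
```

The output could be written to a file pulled in with an include: line in
unbound.conf and picked up on reload.  The script would still need to
confirm that each domain actually resolves through the alternate path
before adding it.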

Thanks in advance for any suggestions.

John

-- 
JOHN PEACOCK
lead software engineer - sre

tel 877-887-3031 x239
mobile 240-429-9334
email john.peacock at sparkpost.com