Maintained by: NLnet Labs

[Unbound-users] edns client subnets

Larry Havemann
Wed Jul 9 18:53:18 CEST 2014


Hi Yuri,

I was wondering if you'd made any headway on the ECS TTL bug, or on the bug
in unbound-control that causes ECS queries to be counted as misses even
when they are answered from cache.

Thanks,
Larry



On Tue, May 6, 2014 at 9:51 AM, Larry Havemann <larry at edgecast.com> wrote:

> Here is an example showing the behavior described above:
>
> Client 1 subnet 110.232.0.0/24 Asia:
> root at dnsr001:/usr/etc/unbound# /EdgeCast/ecdns/bin/dig_iana @localhost
> gp1.wpc.edgecastcdn.net +client=110.232.0.0/24
>
> ; <<>> DiG 9.9.3-P1 <<>> @localhost gp1.wpc.edgecastcdn.net +client=
> 110.232.0.0/24
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12638
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ; CLIENT-SUBNET: 110.232.0.0/24/19
> ;; QUESTION SECTION:
> ;gp1.wpc.edgecastcdn.net. IN A
>
> ;; ANSWER SECTION:
> gp1.wpc.edgecastcdn.net. 3600 IN A 117.18.232.133   <=== Asia IP
>
> ;; Query time: 45 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Tue May 06 16:32:57 UTC 2014
> ;; MSG SIZE  rcvd: 79
>
> Client 2 no subnet:
> root at dnsr001:/usr/etc/unbound# /EdgeCast/ecdns/bin/dig_iana @localhost
> gp1.wpc.edgecastcdn.net
>
> ; <<>> DiG 9.9.3-P1 <<>> @localhost gp1.wpc.edgecastcdn.net
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44048
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;gp1.wpc.edgecastcdn.net. IN A
>
> ;; ANSWER SECTION:
> gp1.wpc.edgecastcdn.net. 3600 IN A 72.21.81.253   <==== NA IP
>
> ;; Query time: 2 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Tue May 06 16:33:01 UTC 2014
> ;; MSG SIZE  rcvd: 68
>
> Client 3 46.22.74.0/24 Europe:
> root at dnsr001:/usr/etc/unbound# /EdgeCast/ecdns/bin/dig_iana @localhost
> gp1.wpc.edgecastcdn.net +client=46.22.74.0/24
>
> ; <<>> DiG 9.9.3-P1 <<>> @localhost gp1.wpc.edgecastcdn.net +client=
> 46.22.74.0/24
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33795
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ; CLIENT-SUBNET: 46.22.74.0/24/0
> ;; QUESTION SECTION:
> ;gp1.wpc.edgecastcdn.net. IN A
>
> ;; ANSWER SECTION:
> gp1.wpc.edgecastcdn.net. 3583 IN A 72.21.81.253   <===== NA IP
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Tue May 06 16:33:18 UTC 2014
> ;; MSG SIZE  rcvd: 79
>
>
> This is a showstopper: a client coming in from 46.22.74.0 is in Europe,
> but with this result it is directed to servers located in North America.
> That is a major performance hit for the end user. The correct response
> should be:
> root at dnsr001:/usr/etc/unbound# /EdgeCast/ecdns/bin/dig_iana @localhost
> gp1.wpc.edgecastcdn.net +client=46.22.74.0/24
>
> ; <<>> DiG 9.9.3-P1 <<>> @localhost gp1.wpc.edgecastcdn.net +client=
> 46.22.74.0/24
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31443
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ; CLIENT-SUBNET: 46.22.74.0/24/24
> ;; QUESTION SECTION:
> ;gp1.wpc.edgecastcdn.net. IN A
>
> ;; ANSWER SECTION:
> gp1.wpc.edgecastcdn.net. 3600 IN A 93.184.221.133   <==== Europe IP
>
> ;; Query time: 3 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Tue May 06 16:35:07 UTC 2014
> ;; MSG SIZE  rcvd: 79
>
>
> Here is the use case where this becomes a showstopper.  As a CDN we pull
> content from many different customers.  If a customer has GeoIP rules set
> up so that content for the EU is pulled from an EU origin server and
> content for NA is pulled from an NA server, this behavior causes the pull
> to happen from the wrong part of the world.  If thousands of servers in
> the EU and thousands of servers in NA all pull from the same location,
> instead of distributing the load as the GeoIP rule describes, the NA
> origin server could fail.  That would leave the customer's content either
> never loading or going stale.  So for my purposes the result from the
> regular cache is in fact wrong.  I did find a way to force an ECS lookup
> for every query by whitelisting 1-255.0.0.0/8, but to me that seems like
> a very sloppy way of doing things.
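Concretely, the workaround amounts to something like the following in
unbound.conf (a sketch only: the option and module names are those of the
experimental ECS patch / later subnet cache releases and may differ in your
build, and one send-client-subnet line is needed per /8, abbreviated here):

```
server:
    # enable the ECS/subnet cache module ahead of the iterator
    # (module name per later Unbound releases; the patch may differ)
    module-config: "subnetcache iterator"
    # whitelist every authority so an ECS lookup is always attempted:
    # one netblock per line, covering 1.0.0.0/8 through 255.0.0.0/8
    send-client-subnet: 1.0.0.0/8
    send-client-subnet: 2.0.0.0/8
    # ... continue through ...
    send-client-subnet: 255.0.0.0/8
```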
>
> I think the performance hit would be acceptable to anyone enabling ECS so
> long as it is documented that there is a performance hit.
>
> -Larry
>
>
> On Tue, May 6, 2014 at 2:36 AM, Yuri Schaeffer <yuri at nlnetlabs.nl> wrote:
>
>>
>> Hi Larry,
>>
>> > This creates an issue where a user coming in with ECS 1.2.3.0/24
>> > gets the correct answer, then a user coming in without ECS gets
>> > the default answer, and finally a user with ECS 1.2.4.0/24 gets
>> > the default instead of the correct answer.
>>
>> Did you test this?
>>
>> I believe this is not what would happen. The final user asking with ECS
>> 1.2.4.0/24 would get an answer from the ECS cache. If it was not in the
>> cache, an upstream lookup would be done with ECS 1.2.4.0/24, yielding
>> the most accurate response.
>>
>> Also, I'd like to stress that an answer from the 'regular' cache is not
>> wrong, merely suboptimal. It is what a non-ECS-aware resolver would
>> answer.
>>
>> Please note there is no need for the clients to 'speak' ECS. When you
>> whitelist a server for ECS, Unbound will ask it for the most specific
>> answer for its client. Relaying ECS, especially to non-whitelisted
>> servers, is a courtesy to the client and not mandated by the draft.
>>
>> > To me and for my use this is a major show stopper.
>>
>> I'm interested in your use case and what functionality you are looking
>> for. What do you expect from a recursor? Do you not intend to use the
>> whitelist functionality?
>>
>> > Is there any way the lookup order could be reversed if ECS is
>> > enabled?  So that all queries first try ECS, then go to the normal
>> > cache?
>>
>> While possible, it would affect every single incoming query. ECS is
>> assumed to be communicated with only a fraction of authority servers.
>> Reversing the order would mean a significant performance hit, especially
>> since the ECS cache is not a straightforward key:value lookup.
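To illustrate that last point: an ECS cache lookup has to find the best
matching stored scope for the client's subnet, i.e. a longest-prefix match
rather than a flat hash fetch. A toy sketch (hypothetical names and a naive
linear scan for clarity; Unbound's real subnet cache uses an address tree):

```python
import ipaddress

class SubnetCache:
    """Toy per-subnet answer cache. Lookup is a longest-prefix match
    over the stored scopes, not a straightforward key:value fetch."""

    def __init__(self):
        # entries: list of (network scope, cached answer)
        self.entries = []

    def store(self, scope, answer):
        self.entries.append((ipaddress.ip_network(scope), answer))

    def lookup(self, client_subnet):
        net = ipaddress.ip_network(client_subnet)
        best = None
        for scope, answer in self.entries:
            # a stored scope matches if it covers the client's subnet;
            # prefer the most specific (longest) matching prefix
            if scope.version == net.version and net.subnet_of(scope):
                if best is None or scope.prefixlen > best[0].prefixlen:
                    best = (scope, answer)
        return best[1] if best else None
```

For example, with a /16 scope and a /0 fallback stored, a query for
46.22.74.0/24 must scan for the /16 rather than hit a single key.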
>>
>> Best regards,
>> Yuri
>>
>
>