Jostle logic seems to randomly stop working

W.C.A. Wijngaards wouter at nlnetlabs.nl
Thu May 31 13:36:06 UTC 2018


Hi Dmitri,

There is a fix slated for the next release, which is as a patch below.
I think this will solve those non-jostle list too full errors.  It
decrements the num_reply_states counter and thus the incoming queries
won't get dropped any more because that counter became too big.

Best regards, Wouter

diff -Naur tags/release-1.7.1/services/mesh.c trunk/services/mesh.c
--- tags/release-1.7.1/services/mesh.c   2018-05-07 10:17:33.605761180 +0200
+++ trunk/services/mesh.c        2018-05-07 13:28:16.907313134 +0200
@@ -1173,6 +1173,10 @@
         while((c = mstate->cb_list) != NULL) {
                 /* take this cb off the list; so that the list can be
                  * changed, eg. by adds from the callback routine */
+                if(!mstate->reply_list && mstate->cb_list && !c->next) {
+                        /* was a reply state, not anymore */
+                        mstate->s.env->mesh->num_reply_states--;
+                }
                 mstate->cb_list = c->next;
                 if(!mstate->reply_list && !mstate->cb_list &&
                         mstate->super_set.count == 0)


On 25/05/18 06:03, Dmitri Kourennyi wrote:
> My original reply was rejected by the server (I think) as I included too
> much log data. Apologies this gets posted twice.
> 
>> Can you show the work that it does for looking up one of the root
>> servers?  Not getting an address must be causing it to not have content.
>>  But it does work (eventually), you say, once the long list appears,
>> that means the AXFR has completed.  Meanwhile you should have normal
>> service, because the zone is loaded (the file that is configured has
>> content, right?)?  When a normal query arrives, it should just be
>> answered with the auth-zone?
> 
> Detailed log of lookups included below. I would like to add that the
> issue may be associated specifically be the lookup of the auth-zone
> domain names. At work, I use unbound as a local caching server on my
> laptop, with a similar configuration as my home router, but I used IP
> addresses instead of hostnames, and I have not had any problems with it
> (this might be a useful work-around to try for anyone who still want to
> use the auth-zone functionality):
> 
> auth-zone:
>   name: "."
>   master: 193.0.14.129         # k.root-servers.net
>   master: 192.112.36.4         # g.root-servers.net
>   master: 2001:500:2f::f       # f.root-servers.net
>   master: 2001:500:84::b       # b.root-servers.net
>   master: 192.0.47.132         # xfr.cjr.dns.icann.org
>   master: 2001:7fd::1          # k.root-servers.net
>   master: 192.0.32.132         # xfr.lax.dns.icann.org
>   master: 192.33.4.12          # c.root-servers.net
>   master: 2620:0:2830:202::132 # xfr.cjr.dns.icann.org
>   master: 2620:0:2d0:202::132  # xfr.lax.dns.icann.org
>   master: 192.228.79.201       # b.root-servers.net
>   master: 192.5.5.241          # f.root-servers.net
>   fallback-enabled: yes
>   for-downstream: no
>   for-upstream: yes
>   zonefile: root.zone
> 
> Looking at the logs again, I noticed there was one incident of a
> dropped query ~15min before all queries start to fail. However, that
> query ends up completing, I think, and at 14:43:40 a different query
> goes through without issue. It's at 15:00 that queries begin to fail.
> I don't know if this error before the root server lookup is part of
> the cause or a red herring.
> 
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] info: 0RDd mod1 rep
> www.google.com. AAAA IN
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] info: 1RDd mod1 rep
> connectivitycheck.gstatic.com. AAAA IN
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: cache memory
> msg=382505 rrset=462002 infra=7600 val=135128
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: answer from
> the cache failed
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: udp request
> from ip4 192.168.0.154 port 64667 (len 16)
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] warning: No jostle
> attempt. m=(nil)   # <= This is logging I added to find out which if
> condition is failing
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: Too many
> queries. dropping incoming query.
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: cache memory
> msg=382505 rrset=462002 infra=7600 val=135128
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: comm point
> listen_for_rw 15 0
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: comm point
> listen_for_rw 16 0
> 
> *Log of root server lookups below. Note that this occurs when I am not
> home, with a single tablet as the only active device, so it's not that
> there's a large amount of traffic or anything like that.
> 
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: auth zone .:
> master lookup for task_probe b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run: start
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> validator[module 0] operate: extstate:module_state_initial
> event:module_event_new
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: validator
> operate: query b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: validator:
> pass to next module
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> validator module exit state is module_wait_module
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> iterator[module 1] operate: extstate:module_state_initial
> event:module_event_pass
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> process_request: new external request event
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: iter_handle
> processing q with state INIT REQUEST STATE
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: resolving
> b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: request has
> dependency depth of 0
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: msg from cache
> lookup ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
>                                          ;; flags: qr rd ra ; QUERY: 1,
> ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
>                                          ;; QUESTION SECTION:
>                                          b.root-servers.net.        IN  
>      A
> 
>                                          ;; ANSWER SECTION:
>                                          b.root-servers.net.       
> 1841075        IN        A        199.9.14.201
> 
>                                          ;; AUTHORITY SECTION:
> 
>                                          ;; ADDITIONAL SECTION:
>                                          ;; MSG SIZE  rcvd: 52
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: msg ttl is
> 41390, prefetch ttl 32750
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: returning
> answer from cache.
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: iter_handle
> processing q with state FINISHED RESPONSE STATE
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: finishing
> processing for b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> iterator module exit state is module_finished
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> validator[module 0] operate: extstate:module_wait_module
> event:module_event_moddone
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: validator
> operate: query b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: validator:
> nextmodule returned
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: response has
> already been validated: sec_status_insecure
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> validator module exit state is module_finished
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: auth host
> b.root-servers.net lookup 199.9.14.201


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.nlnetlabs.nl/pipermail/unbound-users/attachments/20180531/5f293a91/attachment.bin>


More information about the Unbound-users mailing list