Maintained by: NLnet Labs

[Unbound-users] unbound crashing on FreeBSD

W.C.A. Wijngaards
Mon Feb 3 10:58:23 CET 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sergey, Robert,

On 01/31/2014 02:42 PM, Sergey Matveychuk wrote:
> Forward Robert's answer to the list.
> 
> 31.01.2014 15:30, Robert N. M. Watson wrote:
>> On 31 Jan 2014, at 01:43, Sergey Matveychuk <sem at FreeBSD.org>
>> wrote:
>> 
>>> +rwatson at FreeBSD.org
>>> 
>>> Hello, Robert!
>>> 
>>> Could you give us a hint about this problem with Capsicum and 
>>> Unbound, please?
>> 
>> I've added Pawel to the CC line in case he has any insights.
>> 
>> Capability limits, in general, apply only to file descriptors
>> that have been explicitly limited using cap_rights_limit(),
>> implicitly as a result of accept() from a limited socket, or open
>> via openat() on a limited directory descriptor. It seems like
>> there's scope for several possible bugs here:
>> 
>> (1) A previously undetected bug means that the wrong file
>> descriptor (but correctly limited) is being passed to the system
>> call -- it's just no one noticed before because waiting on the
>> wrong event can have subtle-to-spot outcomes sometimes (whereas
>> writing to the wrong file descriptor is more often obvious!).

So, we do not run into regressions, and the software has worked for a
long time.  This must then be some sort of odd corner case in the
code; generally that would mean a file descriptor is interchanged with
another, and it is closed while still being used.  Would that not
result in other error output?  How would we find out which file
descriptor is wrongly being used?
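
One low-cost way to approach that last question, sketched here in
Python for brevity rather than in the daemon's C, is to snapshot which
descriptor numbers are open and what kind of object each one is, and
then compare a snapshot taken at startup with one taken at the point
of failure.  The name describe_open_fds and the max_fd bound are
illustrative assumptions, not part of any existing tool.  This does
not show Capsicum rights; for those, procstat -fC is still needed.

```python
import errno
import os
import stat


def describe_open_fds(max_fd=256):
    """Return {fd: kind} for every open descriptor in range(max_fd)."""
    kinds = {}
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError as exc:
            if exc.errno == errno.EBADF:
                continue  # this descriptor number is not open
            # Open, but fstat() failed for some other reason: record why.
            kinds[fd] = "error: " + os.strerror(exc.errno)
            continue
        if stat.S_ISSOCK(mode):
            kinds[fd] = "socket"
        elif stat.S_ISFIFO(mode):
            kinds[fd] = "pipe"
        elif stat.S_ISREG(mode):
            kinds[fd] = "file"
        elif stat.S_ISCHR(mode):
            kinds[fd] = "chardev"
        elif stat.S_ISDIR(mode):
            kinds[fd] = "dir"
        else:
            kinds[fd] = "other"
    return kinds
```

A descriptor that changes kind between two snapshots, or disappears
while the event loop still refers to it, would be a candidate for the
interchanged or prematurely closed descriptor described above.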

>> (2) A file descriptor is unexpectedly (but correctly) limited -- 
>> perhaps returned by a library or inherited from another process,
>> in which case we need to work out how to limit it less, or at
>> least figure out what is going on and prevent the problem.

NSD does not make libcapsicum calls, and it uses very few libraries.
OpenSSL is used for nsd-control connections.  So I do not think this
option applies.

>> (3) A bug exists in the Capsicum implementation, which manifests
>> once in a while due to a race condition or similar, causing
>> rights to be lost from capabilities improperly: you make a
>> legitimate request but the rights are undesirably gone.

This could be the case; from my point of view NSD4 works great except
on FreeBSD 10 with Capsicum.

>> (4) A bug exists in the file-descriptor implementation such that
>> you specify the right unlimited file descriptor, but get an
>> operation on the wrong one which fails.

There should not be 'limited' file descriptors around at all.

>> The usual tools for debugging this sort of thing are ktrace and 
>> procstat -fC. The latter gets a snapshot while the former
>> provides lots of detail. You can make ktrace scale somewhat
>> better by asking it not to log I/O and various other events that
>> seem less relevant, but agreed that it may be a bit painful if
>> long runtimes are required to reproduce the problem (and if it's
>> a race condition, there's a chance you mask it by changing
>> timing.)

Regardless of where the bug is, some sort of trace would probably be
very helpful.  Catching it at the point where it exits is hard since
it takes all day and happens at an unexpected point in time (note that
NSD has multiple processes active; the one listening to nsd-control
uses libevent, the 'master' uses select() to avoid forking issues, and
the 'servers' use libevent to listen to UDP and a number of TCP
connections).
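
Since the failure is rare and can occur in any of those processes, one
option is to record a short diagnosis at the moment the event loop
returns an error, instead of tracing all day.  The following is only a
sketch in Python of that idea, not code from the daemon: run_with_report,
dispatch and report are hypothetical names, and ENOTCAPABLE is looked
up with getattr because the errno module only defines it on platforms
that have it (such as FreeBSD).

```python
import errno
import os

# Only defined by the errno module where the platform provides it.
ENOTCAPABLE = getattr(errno, "ENOTCAPABLE", None)


def run_with_report(dispatch, report):
    """Call dispatch(); on OSError, pass a one-line diagnosis to report()
    and re-raise so the caller's normal error handling still runs."""
    try:
        return dispatch()
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        hint = ""
        if ENOTCAPABLE is not None and exc.errno == ENOTCAPABLE:
            hint = " (capability rights missing on some descriptor)"
        # Include the pid so the failing process can be told apart.
        report("pid %d: dispatch failed: errno %s %s%s"
               % (os.getpid(), exc.errno, name, hint))
        raise
```

The report callback is the place to attach more state, for example a
list of open descriptors, before the process exits.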

Best regards,
   Wouter

>> Robert
>> 
>>> 
>>> 30.01.2014 18:52, W.C.A. Wijngaards wrote:
> Hi Mathieu,
> 
> On 01/30/2014 03:42 PM, W.C.A. Wijngaards wrote:
>>>>>> Hi Mathieu,
>>>>>> 
>>>>>> On 01/30/2014 03:25 PM, Mathieu Arnold wrote:
>>>>>>> Hi,
>>>>>> 
>>>>>>> I've upgraded one of my resolvers to FreeBSD 10.0, and
>>>>>>> since then, unbound (1.4.21) crashes regularly (about
>>>>>>> once a day) with, say :
>>>>>> 
>>>>>>> Jan 30 12:49:45 resolver3 unbound: [96044:2] fatal
>>>>>>> error: event_dispatch returned error -1, errno is
>>>>>>> Capabilities insufficient
>>>>>> 
>>>>>>> Any hints on what may be wrong ?
>>>>>> 
>>>>>> FreeBSD 10.  Does that have a fine-grained user
>>>>>> capabilities thing? event_dispatch would run kqueue for
>>>>>> unbound (if you compiled with libevent).  Does it not
>>>>>> have permission to use kqueue?
>>>>>> 
>>>>>> Without an event loop there is very little that unbound
>>>>>> can do; no events means no information about network
>>>>>> sockets.
>>>>>> 
>>>>>> If you compile --without-libevent, then unbound uses
>>>>>> select() which may avoid this.
>>>>>> 
>>>>>> Perhaps this is about the number of sockets opened?  The 
>>>>>> filedescriptor count in the ulimit structure?   You
>>>>>> configured unbound for high performance with many open
>>>>>> sockets, but when it does (when it gets busy once a day)
>>>>>> the OS gives this error? Strange because unbound checks
>>>>>> the rlimits (resource limits) when it starts.  Does it
>>>>>> run out of memory, i.e. about once a day the cache fills
>>>>>> up and something set the ulimit on heap-size or something
>>>>>> like that to, say, 1G but you configured unbound to use 2G,
>>>>>> and when it crosses the 1G line it gets killed (but weird
>>>>>> that kqueue gives an error).
> 
> It is not the number of sockets or the heap limits, but capsicum.
> 
>>>>>> 
>>>>>> What version of libevent are you using?
> 
> From the FreeBSD documentation I learned that this errno indicates
> that the capabilities associated with a socket did not permit an
> operation to be performed.  One of the capabilities is the
> capability to use the kqueue socket for kqueue polling.  But no
> doubt there are also other capabilities.  It says capabilities can
> be reduced but not expanded by the program.  This is great, but why
> does a particular fd have its capabilities reduced (unbound does
> not mess with socket capabilities)?
> 
> I have no idea why the capability reduction happens.  ktrace is 
> probably too expensive in its logging fervor?
> 
> Best regards, Wouter
>>>> _______________________________________________ Unbound-users
>>>> mailing list Unbound-users at unbound.net 
>>>> http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
>>>> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJS72g/AAoJEJ9vHC1+BF+NgocP/3EJN3qdhAYKKWRPZMFZkPV5
xdp6rkysmhI08aEuuSTQ7qEnD2lg1GFI960Id0+ZA7laZGSP2LIyqHcDHsggnkXP
8PEnhxtJEXcPzdX57XjT/VIBj87vWUSpTZSUgIfOc7sgvSn1SQa6XLc0BGFMkZkU
uh/wWSmRd95JWATNUWpQ8McxJ7+GBY5/TMrvzUW/rI3AijTe2eCRV6HyK3UFpGzi
FoNDnpaZn/D5yvgjM+M703/l7gnC7L+oBcwS8LT3mG0jSBDtXccG1G247jCgZN9T
fMU0is1e3AzOUUzFY3wQ7qDzt7wKu2Mo1dgSwLEkwUdUu+gpOaFUwIPP9henIGmr
4wISkMUREhep1B822/irkmzkGxbiZJzeeepMLPDQ7d+PH+ohKvvv3zweKnZHMbwt
VOzy2LTPOzUGzklVEBEiukt33LL3Anf/dBKFdZP1Yw/0QHYXqYT9RD4zVgvZVcqu
hj2aBsdRRyOPfUaQ8E6YfKvFFHWYsNWHbUZ05BC7ZBzBfmLOoSrZ6cIP0qPpoQCA
lJqZqMZPXITm0gYPegJ9uWGssaTT2h3VpkfSath0jjAq/Tq+FRVajLqaiLsBROBN
j2HMXmm4pWJhNCxRg/9uY5G1vKOKHzx3QQic0gAx653vihDfBrg7HcCEgs17qH2D
Uljpa/UxcOXuEvXY6/kj
=+UdR
-----END PGP SIGNATURE-----