Maintained by: NLnet Labs

[Unbound-users] unbound crashing on FreeBSD

Sergey Matveychuk
Fri Jan 31 14:42:56 CET 2014


Forwarding Robert's answer to the list.

31.01.2014 15:30, Robert N. M. Watson wrote:
> On 31 Jan 2014, at 01:43, Sergey Matveychuk <sem at FreeBSD.org> wrote:
>
>> +rwatson at FreeBSD.org
>>
>> Hello, Robert!
>>
>> Could you give us a hint about this problem with Capsicum and Unbound, please?
>
> I've added Pawel to the CC line in case he has any insights.
>
> Capability limits, in general, apply only to file descriptors that have been limited explicitly using cap_rights_limit(), implicitly as a result of accept() on a limited socket, or by being opened via openat() on a limited directory descriptor. It seems like there's scope for several possible bugs here:
>
> (1) A previously undetected bug means that the wrong file descriptor (but correctly limited) is being passed to the system call -- it's just that no one noticed before, because waiting on the wrong event can have hard-to-spot outcomes (whereas writing to the wrong file descriptor is more often obvious!).
>
> (2) A file descriptor is unexpectedly (but correctly) limited -- perhaps returned by a library or inherited from another process, in which case we need to work out how to limit it less, or at least figure out what is going on and prevent the problem.
>
> (3) A bug exists in the Capsicum implementation, which manifests once in a while due to a race condition or similar, causing rights to be lost from capabilities improperly: you make a legitimate request but the rights are undesirably gone.
>
> (4) A bug exists in the file-descriptor implementation such that you specify the right unlimited file descriptor, but get an operation on the wrong one which fails.
>
> The usual tools for debugging this sort of thing are ktrace and procstat -fC. The latter gets a snapshot while the former provides lots of detail. You can make ktrace scale somewhat better by asking it not to log I/O and various other events that seem less relevant, but agreed that it may be a bit painful if long runtimes are required to reproduce the problem (and if it's a race condition, there's a chance you mask it by changing timing.)
>
> Robert
>
>>
>> 30.01.2014 18:52, W.C.A. Wijngaards wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi Mathieu,
>>>
>>> On 01/30/2014 03:42 PM, W.C.A. Wijngaards wrote:
>>>> Hi Mathieu,
>>>>
>>>> On 01/30/2014 03:25 PM, Mathieu Arnold wrote:
>>>>> Hi,
>>>>
>>>>> I've upgraded one of my resolvers to FreeBSD 10.0, and since
>>>>> then, unbound (1.4.21) crashes regularly (about once a day) with,
>>>>> say :
>>>>
>>>>> Jan 30 12:49:45 resolver3 unbound: [96044:2] fatal error:
>>>>> event_dispatch returned error -1, errno is Capabilities
>>>>> insufficient
>>>>
>>>>> Any hints on what may be wrong?
>>>>
>>>> FreeBSD 10.  Does that have a fine-grained user capabilities
>>>> thing? event_dispatch would run kqueue for unbound (if you compiled
>>>> with libevent).  Does it not have permission to use kqueue?
>>>>
>>>> Without an event loop there is very little that unbound can do; no
>>>> events means no information about network sockets.
>>>>
>>>> If you compile --without-libevent, then unbound uses select()
>>>> which may avoid this.
>>>>
>>>> Perhaps this is about the number of sockets opened?  The
>>>> file descriptor count in the ulimit structure?  You configured
>>>> unbound for high performance with many open sockets, but when it
>>>> does (when it gets busy once a day) the OS gives this error?
>>>> Strange because unbound checks the rlimits (resource limits) when
>>>> it starts.  Does it run out of memory, i.e. about once a day the
>>>> cache fills up and something set the ulimit on heap-size or
>>>> something like that to, say, 1G but you configured unbound to use
>>>> 2G, and when it crosses the 1G line it gets killed?  (But it is
>>>> weird that kqueue gives an error.)
>>>
>>> It is not the number of sockets or the heap limits, but capsicum.
>>>
>>>>
>>>> What version of libevent are you using?
>>>
>>> From the FreeBSD documentation I learned that this errno indicates that
>>> the capabilities associated with a socket did not permit an operation
>>> to be performed.  One of the capabilities is the capability to use the
>>> kqueue socket for kqueue polling.  But no doubt there are also other
>>> capabilities.  It says capabilities can be reduced but not expanded by
>>> the program.  This is great, but why does a particular fd have its
>>> capabilities reduced (unbound does not mess with socket capabilities)?
>>>
>>> I have no idea why the capability reduction happens.  ktrace is
>>> probably too expensive in its logging fervor?
>>>
>>> Best regards,
>>>     Wouter
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1
>>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>>
>>> iQIcBAEBAgAGBQJS6mc3AAoJEJ9vHC1+BF+NIbIP/Ah4OqGAxeCKiZ7dhCnr5ZzS
>>> pVqiHGAxILgmDXld1mxx3Xz6Vvx6gzNTEXQ4v1T3Q2+oWtQ2+y9djRLs1AnMShEe
>>> Gs3ZkB1H3ApZkhY25lyxy+AMPipsFdIWK/kj2SvdiPyxjanKV15WH3JG3vLS+c8S
>>> eqLPOjB/VQ6qMgzURI+1d8XjKcNSL/gm5yEK0jhTdqjqQDS98sJroWf1kvhXAmZJ
>>> DsOufwsW6AYkBHU/QvZK+ApKE+hG/6zekEj8X3SCB3lL8v1828YnAlefgG+cuvum
>>> VgmcTbRXcTAxlj0FCZeNsYizgvGMo8vjbgbqfLpSPc5MOKo+AsS2Q/XLTrV4iUK2
>>> /FWrw8yn3GQ9IX623OzmlIvIQl+8ofuTWM9cciv1ThzaViJPcSDI3BI6WkIQK0hA
>>> cnRZLoXztHAcCbliPjTUih5wCSFdUK680mje5oQs+1yl0s6OpjS6UQzAnL2FtSYb
>>> zcVp1QmUYYmLt0GHmaHSkbpnZo7Y8JBxVOB4tNQnwhXG5ePFje2n6Fiu93xrJfby
>>> yC5loolv/uEqTDri7V97Fe9DdrBTvNseK+iSmuLvWQOyZHnGvSxlXGlvtbVK9SiJ
>>> DCDzH8YRKOrB4CK/JBwXG9eoqWOsIgI4oZpU3p/E7WFUh+B+eme6yMsnRHstuK/6
>>> XdcTbm6GT+dHp89Zhs66
>>> =iG58
>>> -----END PGP SIGNATURE-----
>
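
The rules discussed in this thread (rights on a descriptor can be reduced but not expanded, and descriptors obtained through accept() or openat() on a limited descriptor come back limited too) can be sketched as a small toy model. This is only an illustration in Python, not the FreeBSD kernel logic or API: the right names, the Descriptor class, the NotCapable exception and can_wait_for_events() are all hypothetical names invented for the sketch.

```python
"""Toy model of Capsicum-style rights on descriptors (illustrative only)."""

# Hypothetical right names for this sketch. The real constants live in the
# FreeBSD headers and are not modelled exactly here.
ALL_RIGHTS = frozenset({"read", "write", "event", "accept", "lookup"})


class NotCapable(Exception):
    """Stands in for the 'Capabilities insufficient' errno in this model."""


class Descriptor:
    def __init__(self, name, rights=ALL_RIGHTS):
        # A fresh descriptor starts with every right (it is "unlimited").
        self.name = name
        self.rights = frozenset(rights)

    def limit(self, rights):
        """Explicit limiting: rights may shrink but never grow."""
        rights = frozenset(rights)
        if not rights <= self.rights:
            raise NotCapable(f"{self.name}: cannot add rights")
        self.rights = rights

    def derive(self, name):
        """Implicit limiting: a descriptor produced from this one (the
        accept() and openat() cases in the thread) inherits its rights."""
        return Descriptor(name, self.rights)

    def require(self, right):
        """Check performed when an operation is attempted."""
        if right not in self.rights:
            raise NotCapable(f"{self.name}: missing '{right}'")


def can_wait_for_events(fd):
    """True if the model would allow polling this descriptor for events."""
    try:
        fd.require("event")
        return True
    except NotCapable:
        return False


# A listener that some library limited without keeping the "event" right.
listener = Descriptor("listener")
listener.limit({"accept", "read", "write"})

# The accepted connection is limited as well, although the program never
# limited it directly. This corresponds to case (2) in Robert's list.
conn = listener.derive("conn")
```

In this model can_wait_for_events(conn) is false even though the program never limited conn itself, which is the sort of surprise a per-descriptor rights snapshot such as procstat -fC is meant to expose.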