Maintained by: NLnet Labs

[Unbound-users] Errors in log

Wouter Wijngaards
Thu Jul 17 09:21:00 CEST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Anatoliy,

I think I know what is happening, you have 4 threads, very busy, each
one is allowed to use 1024 file descriptors for UDP, that totals to 4096
add in 4x100 incoming and 4x100 outgoing TCP file descriptors (to accept
incoming connections) and the number of file descriptors open is >=
1024. The value of FD_SETSIZE on SunOS 10 is 1024 by default, and that
is the problem (the "event_add failed. in cpsl" error).

So the problem is that select(2) is not capable of handling more than
1024 fds per process. So workarounds for you are:
* use less threads, 3 for example.
* set the outgoing range, incoming tcp, outgoing tcp and so on to
smaller values. 200, 10 and 10 for example.
* compile unbound with (a very recent version of) libevent (or libev).
libevent supports dev-ports and dev-poll (something on solaris akin to
epoll and kqueue). Such backends can easily handle large numbers of file
descriptors. Then you can work with a configuration like this.

I think I can detect the problem and print more sane error messages on
startup and recommend smaller config values or use of libevent.

Thanks for the debug logs :-)

Edited your email and forwarded to unbound-users because the workarounds
may be useful for others as well.

Best regards,
~   Wouter

Anatoliy Kushner wrote:
| I build latest unbound 1.0.1
|
| ulimit -a
| open files                    (-n) 32768
<snip>
|
| In log Errors like
|
| [1216263369] unbound[14624:2] error: could not event_del on close
| [1216263369] unbound[14624:1] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:3] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:1] error: could not event_del on close
| [1216263369] unbound[14624:3] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:3] error: could not event_del on close
| [1216263369] unbound[14624:0] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:0] error: could not event_del on close
| [1216263369] unbound[14624:3] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:2] error: could not event_del on close
| [1216263369] unbound[14624:0] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:0] error: could not event_del on close
| [1216263369] unbound[14624:2] error: could not event_del on close
| [1216263369] unbound[14624:0] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:0] error: could not event_del on close
| [1216263369] unbound[14624:2] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:2] error: could not event_del on close
| [1216263369] unbound[14624:1] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:2] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:2] error: could not event_del on close
| [1216263369] unbound[14624:1] error: could not event_del on close
| [1216263369] unbound[14624:1] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:0] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:1] error: could not event_del on close
| [1216263369] unbound[14624:0] error: could not event_del on close
| [1216263369] unbound[14624:2] error: event_add failed. in cpsl.
| [1216263369] unbound[14624:2] error: could not event_del on close
|
| 2008/7/14 Wouter Wijngaards <wouter at nlnetlabs.nl>:
| Anatoliy Kushner wrote:
| | Hmm, i rebuild unbound without libevent,
| | now in log too match errors :
| |
| | [1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
| | [1214846914] unbound[4145:2] error: accept failed: Resource
| | temporarily unavailable
| | [1214846914] unbound[4145:1] error: accept failed: Resource
|
| Hi Anatoliy,
|
| Could this be a ulimit (open files) problem? what is `ulimit -n` ?
|
| You could try to increase the limit on file descriptors, perhaps it is
| running out of them. 1024 (the default for linux, coincidentally) should
| be enough.
|
| If that fails, then please tell me what version of Solaris are you using
| (opensolaris, 9, 10, .. ?; `uname -a`). Also, run could you run unbound
| with verbosity set to the maximum (4) and email me the (compressed)
| output directly? Better not send that huge logfile to the mailing list.
|
| Also include your configuration file for unbound, that may help too.
|
| For what it's worth, I have unbound running on Solaris/sparc here just
| fine ...
|
| Best regards,
| ~   Wouter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkh+8twACgkQkDLqNwOhpPjgbACfXRNIKKSW5uKJV5I72e/TKZUY
qoEAn0BDI+usFatfsAFdHg4EAl3gC6oC
=8jp6
-----END PGP SIGNATURE-----