>>> OpenLDAP (slapd) is crashing very frequently under runsv (hundreds
>>> of times per day, usually when it gets a burst of queries).
It looks like there's a bug in slapd that manifests itself when it's run
under runsv and not when it's run under System V init. Signal 11 is SIGSEGV,
which means either a hardware error or a bug in the application; since your
slapd works fine under load when unsupervised, a hardware error is
unlikely.
To track down the bug, you'll need to check the exact differences between
running slapd under runsv and running slapd manually. The main differences
usually are:
* working directory. That's a silly one, but I've seen problems because of it.
* environment variables. Those are a frequent culprit, especially PATH;
System V scripts usually pollute the environment with lots of variables,
some of which are actually needed for the daemon to run correctly.
* open file descriptors. Typically, check whether stdin and stdout are
closed, or pointing to /dev/null, or pointing to something else (which is
probably a misconfiguration).
* command-line options. Typically, a "log to stderr instead of syslog" option
and a "don't background yourself" option, which you don't set with sysvinit
but do set with a supervision system. Some daemons do horrible, ugly things
when you set those.
* terminals and sessions, which are usually set wrong when you run a daemon
from your command line - but running it under runsv or an init.d script should
yield the same result (no controlling terminal) so it's not a likely culprit.
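
To make the checklist above concrete, here is a minimal sketch of a wrapper
that records the working directory, environment, open file descriptors and
session information, then execs the real command. It is only an illustration:
the script name (dumpctx.py), the output paths and the JSON layout are my own
assumptions, and the descriptor listing relies on the Linux-specific
/proc/self/fd.

```python
#!/usr/bin/env python3
"""Record the context a daemon would inherit, then optionally exec it.

Hypothetical usage: dumpctx.py OUTFILE [command [args...]]
"""
import json
import os
import sys


def snapshot():
    """Collect cwd, environment, open fds and session info for this process."""
    fds = {}
    # /proc/self/fd is Linux-specific; each entry is a symlink to its target.
    if os.path.isdir("/proc/self/fd"):
        for name in os.listdir("/proc/self/fd"):
            try:
                fds[int(name)] = os.readlink(f"/proc/self/fd/{name}")
            except OSError:
                # The descriptor listdir() used internally is already closed.
                continue
    try:
        # Opening /dev/tty fails when there is no controlling terminal.
        os.close(os.open("/dev/tty", os.O_RDONLY | os.O_NOCTTY))
        has_ctty = True
    except OSError:
        has_ctty = False
    return {
        "cwd": os.getcwd(),
        "env": dict(os.environ),
        "fds": fds,
        "pid": os.getpid(),
        "sid": os.getsid(0),
        "pgid": os.getpgrp(),
        "controlling_tty": has_ctty,
    }


if __name__ == "__main__":
    # Take the snapshot first, so the output file does not show up in "fds".
    ctx = snapshot()
    if len(sys.argv) > 1:
        with open(sys.argv[1], "w") as f:
            json.dump(ctx, f, indent=2, sort_keys=True)
    else:
        json.dump(ctx, sys.stderr, indent=2, sort_keys=True)
    if len(sys.argv) > 2:
        # Replace this process with the real command, keeping the same
        # cwd, environment and standard descriptors that were recorded.
        os.execvp(sys.argv[2], sys.argv[2:])
```

One way to use it would be to put it in front of the usual slapd command line
in both the runsv run script and the init.d script, each with a different
output file, and then diff the two files. The pid, sid and pgid will always
differ; the interesting differences are in cwd, env and fds.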
If all else fails, a good way to gather information is to strace a run with
supervision, a run without, and compare the straces side by side.
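
Raw straces from two runs rarely diff cleanly, because PIDs and addresses
change every time. A rough sketch of a comparison helper follows; it assumes
both traces were captured with something like "strace -f -o FILE", so that
each line starts with a PID column. The function names and the 0xADDR
placeholder are mine, and timestamps (if you used -t) would need similar
treatment.

```python
#!/usr/bin/env python3
"""Diff two strace logs after masking PIDs and addresses."""
import difflib
import re
import sys

_PID = re.compile(r"^\d+\s+")      # leading PID column written by strace -f -o
_HEX = re.compile(r"0x[0-9a-f]+")  # pointers, mmap addresses and the like


def normalize(line):
    """Strip the parts of a trace line that differ between any two runs."""
    line = _PID.sub("", line.rstrip("\n"))
    return _HEX.sub("0xADDR", line)


def compare(lines_a, lines_b, name_a="unsupervised", name_b="runsv"):
    """Return a unified diff of the two normalized traces as a list of lines."""
    a = [normalize(line) for line in lines_a]
    b = [normalize(line) for line in lines_b]
    return list(difflib.unified_diff(a, b, name_a, name_b, lineterm=""))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as fa, \
             open(sys.argv[2], errors="replace") as fb:
            for out in compare(fa, fb, sys.argv[1], sys.argv[2]):
                print(out)
```

The first divergences near the top of the diff (chdir, open, environment
passed to execve) are usually more telling than the noise later on.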
>> My first guess is that slapd dies when it gets an error writing to a
>> full logging pipe. Is there a logger set up for this service? Is it
>> writing log data? Is it staying up?
Writing to a full log pipe probably wouldn't cause a SIGSEGV. Most
programs would simply block there.
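
If you want to see what the kernel actually reports in those cases, here is a
small sketch. It is not what slapd does; it only shows that a write to a full
pipe either blocks or, with a non-blocking descriptor, fails with EAGAIN, and
that a write to a pipe with no reader fails with EPIPE. Python ignores SIGPIPE
at startup, which is why the second case shows up as an exception here; a C
program with the default disposition would be killed by SIGPIPE instead, which
is still not SIGSEGV.

```python
#!/usr/bin/env python3
"""Show what a writer sees on a full pipe and on a pipe with no reader."""
import errno
import os


def write_to_full_pipe():
    """Fill a pipe nobody reads; return (bytes written, errno of the failure)."""
    r, w = os.pipe()
    # Non-blocking only so that the demo returns instead of hanging forever,
    # which is what a blocking writer would do at this point.
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            written += os.write(w, b"x" * 4096)
    except BlockingIOError as exc:
        return written, exc.errno
    finally:
        os.close(r)
        os.close(w)


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the errno raised."""
    r, w = os.pipe()
    os.close(r)
    try:
        os.write(w, b"x")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(w)


if __name__ == "__main__":
    written, err = write_to_full_pipe()
    print("full pipe after", written, "bytes:", errno.errorcode[err])
    print("no reader:", errno.errorcode[write_to_closed_pipe()])
```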
--
Laurent
Received on Thu Apr 24 2014 - 18:23:57 UTC