possible s6-rc redesign (was: [request for review] Port of s6 documentation to mdoc(7)) from Laurent Bercot on 2020-09-01 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Tue, 01 Sep 2020 10:00:22 +0000

>I have only seen one new feature committed to the Git repository so
>far. Is it too early to ask what are you planning to change?

  The new feature is orthogonal - or, rather, it will be used if I end
up *not* redesigning s6-rc.

  The trend with distributions is to make service managers reactive to
external events: typically NetworkManager and systemd-networkd because
the network is the primary source of dynamic events, but even local
events such as the ones produced by a device manager, or basically
anything sent by the kernel on the netlink, are getting integrated
into that model.

  s6-rc is, by essence, static: the set of services is known in advance,
and there is no reacting to external events - there is only the admin
starting and stopping services. This has advantages - a compile-time
analysis is possible, with early cycle detection, etc.; but the model
doesn't integrate well with modern distro needs. So, I've been thinking
about ways to add dynamic event management to s6-rc; and I've found
two options.
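
  To illustrate the kind of compile-time check a static service set
allows, here is a minimal sketch of early cycle detection over a
dependency graph. It is not s6-rc's actual code: the function name
`find_cycle`, the dict-of-lists input format and the service names are
all assumptions made for the example.

```python
def find_cycle(deps):
    """Return a list of services forming a dependency cycle, or None.

    deps maps each service name to the list of services it depends on.
    This input format is hypothetical, chosen for the illustration.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / done
    state = {}
    path = []

    def visit(name):
        state[name] = GREY
        path.append(name)
        for dep in deps.get(name, ()):
            s = state.get(dep, WHITE)
            if s == GREY:
                # dep is already on the current path: we closed a loop
                return path[path.index(dep):] + [dep]
            if s == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in deps:
        if state.get(name, WHITE) == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

  Because the whole set of services is known in advance, a check like
this can run once, before anything is started, and reject a bad
database up front instead of failing at boot time.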

  Option 1 is to add dynamic event management *on top of* s6-rc. That
is my natural instinct; that is what I've always done with software,
that's what keeps the various parts of my software as clean and simple
as possible. Here, it would mean:
  - having a classic s6-rc database for "static" services
  - having an additional "dynamic" database for services that can be
triggered by external events. (The database is static in essence, but
I call it "dynamic" because it would host the services that can be
started dynamically.)
  - having a s6-rc-eventd daemon listening to events and executing
s6-rc commands on the dynamic database depending on the events it
receives. Paired with a s6-rc-event program that sends events to
s6-rc-eventd, meant to be invoked in situations such as udevd/mdevd
rules, a netlink listener, etc.
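
  As a rough sketch of the core of what such an s6-rc-eventd could do,
the snippet below maps an incoming event name to an s6-rc command line
for the dynamic database. Everything here is an assumption made for
illustration: the event names, the rule table, the live directory path
and the function name are hypothetical, and the `-l live`, `-u`/`-d`
and `change` arguments reflect my understanding of the existing s6-rc
command line rather than any design decided in this message.

```python
# Hypothetical rule table: event name -> (direction flag, services to change).
RULES = {
    "net.eth0.up": ("-u", ["dhcp-eth0"]),
    "net.eth0.down": ("-d", ["dhcp-eth0"]),
    "usb.storage.added": ("-u", ["automount"]),
}

# Hypothetical live directory for the separate "dynamic" database.
DYNAMIC_LIVE = "/run/s6-rc-dynamic"


def command_for(event, rules=RULES, live=DYNAMIC_LIVE):
    """Return the argv to run for an event, or None if no rule matches."""
    rule = rules.get(event)
    if rule is None:
        return None          # unknown events are simply ignored
    direction, services = rule
    return ["s6-rc", "-l", live, direction, "change", *services]
```

  A daemon would read events sent by the s6-rc-event client and spawn
the returned argv; the sketch only builds the command so that it stays
free of side effects.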

  This model works in my head, the s6-rc-event[d] programs would be quite
simple to write, it would solve the problem in a modular way just like
the skarnet.org usual, so it seems like a no-brainer.
  Except for one thing: I don't think anybody would use this. Only me,
you, and the other 6 hardcore people in the world who actually like
this kind of design.

  If there's one thing that has been made painfully obvious to me these
past few years, it is that most people, and especially most *distro*
people, who are the ones I would like to reach, perceive the s6
stack as very complex. They're intimidated by it; they find the
abundance of moving parts off-putting and difficult to get into.
With very few exceptions, the people who actually take the plunge and
make the time and energy investment necessary to understand the model,
what the various parts do and how they fit together, those people all
love it, and are very enthusiastic about it, and they're what keeps me
going. But the initial barrier of potential, the ultra-steep learning
curve, is indisputably the limiting factor in the spread of the s6
ecosystem.

  s6 as a supervision suite? okay, people will use it; but it's already
perceived as a bit complex, because there are a lot of binaries.
It's on the high end of the acceptable difficulty range.

  s6 as an init system? "what is this s6-linux-init thing? why do I need
this? runit is simpler, I'll stick to runit." Even though runit has
problems, has less functionality, and is barely maintained. There are,
for instance, several people in Void Linux who are interested in
switching to s6, but despite s6 being an almost drop-in replacement for
runit, the switch has not been made, because it requires learning s6 and
s6-linux-init, and most Void people do not feel the effort is worth it.

  s6-rc? "waah I don't like the source directory format, I want text
files, and why is it so different from 'service foo start'? And why
doesn't it come with integrated policy like OpenRC or systemd?" People
understand the benefit in separating mechanism from policy, in theory,
but in practice nobody wants to write policy. (Can't blame them: I find
it super boring, too.) Having the tool is not enough; it needs to be
gift-wrapped as well, it needs to be nice to use.

  If I add a s6-rc-event family of binaries to s6-rc, the fact that it
is yet another layer of functionality, that you now need *two*
databases, etc., will make a whole additional category of people just
give up. The outreach will be, mark my words, *zero*. If not negative.

  The fact is that a full-featured init system *is* a complex beast, and
the s6 stack does nothing more than is strictly needed, but it exposes
all the tools, all the entrails, all the workings of the system, and
that is a lot for non-specialists to handle. Integrated init systems,
such as systemd, are significantly *more* complex than the s6 stack, but
they do a better job of *hiding* the complexity, and presenting a
relatively simple interface. That is why, despite being technically
inferior (on numerous metrics: bugs, length of code paths, resource
consumption, actual modularity, flexibility, portability, etc.), they
are more easily accepted: they are just less intimidating.

  As a friend told me, and it was an enlightening moment: you are keeping
the individual parts simple, but in doing so, you are moving the
complexity to the *interactions* between the parts, and are burdening
the user with that complexity. You are keeping the code simple, which
has undeniable maintainability benefits, but you are making the
administration more difficult, and the trade-off is not good enough for
a lot of users.

  For a while, my answer to that has been: this is all an interface
problem. I need to work on s6-frontend, in order to provide a unified,
user-friendly interface; then, people who want simplicity can use the
high-level interface, and advanced users can lift the hood and manually
tweak the engine.
  I still believe that is a good model and a good idea. However, having
worked for a couple months on a user-friendly interface for service
management with s6-rc that could be a prototype for a part of
s6-frontend, and having started to think about details of s6-frontend,
I've come to realize that shrinkwrapping the s6 ecosystem as it is
today *will already be pretty hard*, and a lot, and I mean a lot, of
work is going to go into that interface. And adding more moving parts
in the engine will require even more work for the interface to control
those moving parts. We're reaching levels of kitchensinkery I'm not
comfortable with.

  In the end, what risks happening? A neat, slick, thrifty engine, with
a lot of knobs, and a big fat complex interface on top of it - and
unless you're a specialist, you *need* the interface, because there are
so many knobs that you otherwise need a degree to understand what
everything does. And what good is it to have such a satisfying engine
if you can't use it without a thick layer of bloat?

  So, I think my software design needs to be rebalanced, and complexity
needs to be spread more evenly. I'm certainly not going to write
monoprocess behemoths, that's not what I do, but I need to stop yolo
adding small binaries to address some functionality and say "there you
go, here's the mechanism, how to use it is left as an exercise to the
reader." Which is exactly what would happen with s6-rc-eventd.

  So, option 2 is to take a step back and say: a service manager is one
(complex) functionality, and if I want a full-fledged service manager,
I need to design it as such from the beginning, instead of having a
static service manager with a program to handle dynamic stuff added next
to it as an afterthought and the complexity needing to be managed by
users or by s6-frontend.
  And that means a s6-rc redesign.

  I haven't made a decision yet: I'm in the process of *exploring* what
a s6-rc redesign would look like. But so far, this is what I think a
full service manager should do:

  - Be similar in concept to Upstart. The Upstart implementation is bad,
but the foundational ideas are actually quite good.
    * That means: event-based, transitions are triggered by events, and
events can have several sources: a transition finishing, but also
external events such as ones coming from a network manager, or internal
events coming from the daemon itself (this is necessary for Upstart
because it's an init system, I don't think it is necessary for a pure
service manager).
  - Perform as much static analysis and upfront checking as possible,
just like the current s6-rc. I would like to keep the same level of
guarantees for a fully static set of services, and ideally be able to
offer some guarantees as well for dynamic ones, although it's obviously
impossible to do a full analysis for them.
  - Support disjunctions in service trigger conditions! If I'm going to
rewrite the engine, might as well allow for alternatives without forcing
the user to recompile a database.
  - Support instances. After a lot of brainstorming and several attempts,
I've been unable to find a good way to add instantiation to the current
s6-rc model. If we want instantiation, it definitely needs to be a part
of service manager design from the start, so this would be the
opportunity.
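
  To make the last two points more concrete, here is a small sketch of
how disjunctive trigger conditions and instances could be modelled. It
is purely illustrative and not a proposed design: the "name@" template
form and the "%i" placeholder are notation borrowed for the example,
and the function names and event names are hypothetical.

```python
def should_start(trigger, seen):
    """Evaluate a trigger condition against the set of events seen so far.

    trigger is a list of alternatives; each alternative is a set of
    events that must all have been seen. One satisfied alternative is
    enough, so the list as a whole acts as a disjunction.
    """
    return any(alt <= seen for alt in trigger)


def instantiate(template, instance):
    """Expand a service template into one concrete instance.

    Every "%i" in the template's trigger events is replaced by the
    instance name, and the instance name is appended to the service name.
    """
    return {
        "name": template["name"] + instance,
        "trigger": [
            {event.replace("%i", instance) for event in alt}
            for alt in template["trigger"]
        ],
    }


# A template started either by a link-up event or by a manual request.
dhcp_template = {"name": "dhcp@", "trigger": [{"net.%i.up"}, {"manual.%i"}]}
dhcp_eth0 = instantiate(dhcp_template, "eth0")
```

  With this representation, adding an alternative means appending one
more set to the trigger list, and each instance gets its own copy of
the conditions with the instance name filled in.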

  So here you are. In the weeks to come, I'll keep thinking about the
details of option 2, and build an outline of the various necessary
parts. And eventually, if I think I can write this, with all the
functionality, while still sticking to my standards of code simplicity,
then it's what I'll do. If not, and in particular, if I can't get all
the static analysis guarantees that I want, then I'll just go with
option 1, which will do a decent job for a lot less work but will
definitely not help the perception of the s6 ecosystem by normal people.

--
  Laurent

Received on Tue Sep 01 2020 - 10:00:22 UTC
