Re: single-binary for execline programs?

From: Dominique Martinet <asmadeus_at_codewreck.org>
Date: Wed, 1 Feb 2023 14:58:00 +0900

Laurent Bercot wrote on Wed, Feb 01, 2023 at 04:49:47AM +0000:
> > It should be fairly easy to do something like coreutils'
> > --enable-single-binary without much modification
>
> The subject has come up a few times recently,

I believe I did my homework looking first -- are there other discussion
channels than this list that one should be aware of?

> so, at the risk of being
> blunt, I will make it very clear and definitive, for future reference:
>
> No. It will not happen.

Well, thanks for the clear answer, I'm glad I asked first!

I'm a sore loser though, so I'll develop a bit more below. You've
probably got better things to do, so feel free to just say you're not
changing your mind or point me at the other discussions and I'll stop
bugging you.

> The fact that toolchains are becoming worse and worse is not imputable
> to execline, or to the way I write or package software. It has always
> been possible, and reasonable, to provide a lot of small binaries.
> Building a binary is not inherently more complicated today than it was
> 20 years ago. There is no fundamental reason why this should change; the
> only reason why people are even thinking this is that there is an
> implicit assumption that software always becomes better with time, and
> using the latest versions is always a good idea. I am guilty of this
> too.
>
> This assumption is true when it comes to bugs, but it becomes false if
> the main functionality of a project is impacted.
> If a newer version of binutils is unable to produce reasonably small
> binaries, to the point that it incites software developers to change
> their packaging to accommodate the tool, then it's not an improvement,
> it's a recession. And the place to fix it is binutils.

I definitely agree with this. I reported the problem in the bz I linked,
and the reception has been rather good -- I trust we'll get back to
smaller binaries in the next version or otherwise in the near future.
 
> Multicall binaries have costs, mostly maintainability costs.
> Switching from a multiple binaries model to a multicall binary model
> because the tooling is making the multiple binaries model unusably
> expensive is basically moving the burden from the tooling to the
> maintainer. Here's a worse tool, do more effort to accommodate it!

I guess it isn't completely free, but it certainly isn't heavy if the
abstraction isn't done too badly.

I'd go out on a limb and say that if you only supported single-binary
mode, some of the code could be simplified further by sharing some
argument handling. It's hard to do simpler than your exlsn_main wrapper
though, so it'll likely be identical, with individual programs not
changing at all and just an extra shim to wrap them all. It's not like
busybox, where individual binaries can be selected, so a static wrapper
would be dead simple.
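
For illustration, here is roughly what I mean by a static wrapper. This
is only a sketch under my own assumptions, not execline code: the
applet table, the *_main stubs, multicall_dispatch and the error code
100 are all made up for the example. In a real build each *_main would
be the existing main() of one program, renamed at compile time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry points: stand-ins for each program's renamed main(). */
typedef int (*applet_main)(int argc, char **argv);

static int cd_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }
static int foreground_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }
static int if_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }

struct applet {
  const char *name;
  applet_main fn;
};

/* Static table: every program is always present, nothing is selectable. */
static const struct applet applets[] = {
  { "cd", cd_main },
  { "foreground", foreground_main },
  { "if", if_main },
};

/* Return the part of path after the last '/', or path itself if none. */
static const char *base_name(const char *path)
{
  const char *slash = strrchr(path, '/');
  return slash ? slash + 1 : path;
}

/* Linear scan is enough for a sketch; a sorted table plus bsearch also works. */
static applet_main find_applet(const char *name)
{
  assert(name != NULL);
  for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
    if (strcmp(applets[i].name, name) == 0)
      return applets[i].fn;
  return NULL;
}

/* Pick the applet from the name the binary was invoked as (argv[0]). */
int multicall_dispatch(int argc, char **argv)
{
  if (argc < 1 || argv[0] == NULL)
    return 100; /* arbitrary error code for this sketch */
  applet_main fn = find_applet(base_name(argv[0]));
  if (fn == NULL) {
    fprintf(stderr, "multicall: unknown applet: %s\n", argv[0]);
    return 100;
  }
  return fn(argc, argv);
}
```

The real main() of the single binary would then just return
multicall_dispatch(argc, argv), with each program name installed as a
link to that binary.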

> Additionally to maintainability costs, multicall binaries also have a
> small cost in CPU usage (binary starting time) and RAM usage (larger
> mappings, fewer memory optimizations) compared to multiple binaries.
> These costs are paid not by the maintainer, but by the users.

Hmm, I'd need to do some measurements, but my impression would be that
since the overall size is smaller it should pay off for any pipeline
calling more than a handful of binaries, as you'll benefit from running
the same binary multiple times rather than having to look through
multiple binaries (even without optimizing the execs out).

Memory in particular ought to be shared for r-x pages, or there's some
problem with the system. I'm not sure if it'll lazily load only the
pages it requires for execution or if some readahead will read it all
(it probably should), but once it's read it shouldn't take space
multiple times. So multiple binaries are likely to take more space once
you include the vfs cache, as soon as you call a few in a row; memory
usage should be mostly identical to disk usage in practice.

Anyway, I'll concede that when in doubt, let's call it a space vs. speed
tradeoff where I'm favoring space.

> Well, no. If having a bunch of execline binaries becomes more expensive
> in disk space because of an "upgrade" in binutils, that is a binutils
> problem, and the place to fix it is binutils.

I shouldn't have brought up the binutils bug.
Even almost 1MB (the x86_64 version that doesn't have the problem,
package currently 852KB installed size + filesystem overhead...) is
still something I consider big for the systems I'm building. Even
without the binutils issue, it's getting harder to fit a complete
rootfs in 100MB.

Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
s6-linux-init) I'm looking at a 3MB increase (I won't be able to get
rid of openrc: for compatibility with user scripts it'll have to live
in compat hooks...); being able to shave ~700KB off that would be
very interesting for me (number from linking all the .c files together
with a dummy main wrapper, down to 148KB).
The dozens of s6-* binaries are another similar target that would shave
a bit more as well; the build systems being similar, I was hoping it
could go next if this had been well received.


> > In the long run this could also provide a workaround for conflicting
> > names, cf. old 2016 thread[4], if we'd prefer either running the
> > appropriate main directly or re-exec'ing into the current binary after
> > setting argv[0] appropriately for "builtins".
>
> There have been no conflicts since "import". I do not expect more name
> conflicts in the future, and in any case, that is not an issue that
> multicall binaries can solve any better than multiple binaries. These
> are completely orthogonal things.

It's a step further, but I don't think it's orthogonal.
If all the code is in a single binary, you could easily give builtins
internal priority, as there would be no need to mess with PATH or a
separate install prefix.
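
To make the idea concrete, the step that runs the next program in the
chain could look at an internal table first and only fall back to a
PATH search afterwards. Again this is a sketch under assumptions of
mine, not how execline works today: run_next, lookup_builtin, the
builtin_* stubs and the 127 fallback code are hypothetical names and
values picked for the example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical builtins: stand-ins for programs linked into the binary. */
typedef int (*builtin_main)(int argc, char **argv);

static int builtin_cd(int argc, char **argv) { (void)argc; (void)argv; return 0; }
static int builtin_if(int argc, char **argv) { (void)argc; (void)argv; return 0; }

struct builtin {
  const char *name;
  builtin_main fn;
};

static const struct builtin builtins[] = {
  { "cd", builtin_cd },
  { "if", builtin_if },
};

/* Return the builtin registered under name, or NULL if there is none. */
static builtin_main lookup_builtin(const char *name)
{
  for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
    if (strcmp(builtins[i].name, name) == 0)
      return builtins[i].fn;
  return NULL;
}

/* Run the next command: builtins take priority over whatever is in PATH. */
int run_next(int argc, char **argv)
{
  assert(argc > 0 && argv[0] != NULL);
  /* A name containing '/' is an explicit path: never shadow it. */
  if (strchr(argv[0], '/') == NULL) {
    builtin_main fn = lookup_builtin(argv[0]);
    if (fn != NULL)
      return fn(argc, argv); /* no exec, no PATH lookup */
  }
  execvp(argv[0], argv); /* only returns on failure */
  return 127;            /* arbitrary "could not exec" code for this sketch */
}
```

With something like this, a bare "if" in a script resolves to the
internal one whatever PATH contains, while "/bin/if" still goes through
the normal exec.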


> > (I assume you wouldn't like the idea of not installing the individual
> > commands, but that'd become a possibility as well. I'm personally a bit
> > uncomfortable having something in $PATH for 'if' and other commands that
> > have historically been shell builtins, but have a different usage for
> > execline...)
>
> You're not the only one who is uncomfortable with it, but it's really a
> perception thing. There has never been a problem caused by it. Shells
> don't get confused. External tools don't get confused. On this aspect,
> Unix is a lot more correct and resilient than you give it credit for. :)

Shells and external tools would definitely be fine, they're not looking
there in the first place.
I think you're underestimating what users who haven't used a unix before
can do though; I can already picture some rummaging in /bin and
wondering why posix-cd "doesn't work" or something... We get impressive
questions sometimes.

-- 
Dominique
Received on Wed Feb 01 2023 - 06:58:00 CET
