Re: single-binary for execline programs? from Laurent Bercot on 2023-02-01 (skaware)

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Wed, 01 Feb 2023 04:49:47 +0000

>In particular there's a "feature" with recent binutils that makes every
>binary be at least 64KB on arm/aarch64[1], so the execline package is a
>whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
>just doing a dummy sed to avoid conflict on main and bundling all .c
>together in a single binary yields just 148KB (x86_64 but should be
>similar on all archs -- we're talking x20 bloat from aarch64/armv7
>sizes! Precious memory and disk space!)
> (...)
>It should be fairly easy to do something like coreutils'
>--enable-single-binary without much modification

  The subject has come up a few times recently, so, at the risk of being
blunt, I will make it very clear and definitive, for future reference:

  No. It will not happen.

  The fact that toolchains are becoming worse and worse is not imputable
to execline, or to the way I write or package software. It has always
been possible, and reasonable, to provide a lot of small binaries.
Building a binary is not inherently more complicated today than it was
20 years ago. There is no fundamental reason why this should change; the
only reason why people are even thinking this is that there is an
implicit assumption that software always becomes better with time, and
using the latest versions is always a good idea. I am guilty of this
too.

  This assumption is true when it comes to bugs, but it becomes false if
the main functionality of a project is impacted.
  If a newer version of binutils is unable to produce reasonably small
binaries, to the point that it incites software developers to change
their packaging to accommodate the tool, then it's not an improvement,
it's a regression. And the place to fix it is binutils.
  The tooling should be at the service of programmers, not the other way
around.

  It is a similar issue when glibc makes it expensive in terms of RAM to
run a large number of copies of the same process. Linux, like other
Unix-like kernels, is very efficient at this, and shares everything that
can be shared, but glibc performs *a lot* of private mappings that incur
considerable overhead. (See the thread around this message:
https://skarnet.org/lists/supervision/2804.html
for an example.)
  Does that mean that running 100 copies of the same binary is a bad
model? No, it just means that glibc is terrible at that and needs
improvement.

  Back in the day when Solaris was relevant, it had an incredibly
expensive implementation of fork(), which made it difficult, especially
with the processing power of 1990s-era Sun hardware, to write servers
that forked and still served a reasonable number of connections.
This gave rise to "good practices", taught by my (otherwise wonderful)
C/Unix programming teacher: fork as little as possible, use a single
process to do everything. And that is indeed how most userspace on
Solaris worked.
  It did a lot of harm to the ecosystem, turning programs into giant
messes because people did not want to use the primitives that were
available to them for fear of inefficiency, and instead jumped through
hoops to work around them at the expense of maintainability.
  Switching to Linux and its efficient fork() was a relief.

  Multicall binaries have costs, mostly maintainability costs.
Switching from a multiple-binaries model to a multicall binary model
because the tooling is making the multiple-binaries model unusably
expensive is basically moving the burden from the tooling to the
maintainer. Here's a worse tool, put in more effort to accommodate it!

  In addition to maintainability costs, multicall binaries also have a
small cost in CPU usage (binary starting time) and RAM usage (larger
mappings, fewer memory optimizations) compared to multiple binaries.
These costs are paid not by the maintainer, but by the users.
Everyone loses.

  Well, no. If having a bunch of execline binaries becomes more expensive
in disk space because of an "upgrade" in binutils, that is a binutils
problem, and the place to fix it is binutils.

>In the long run this could also provide a workaround for conflicting
>names, cf. old 2016 thread[4], if we'd prefer either running the
>appropriate main directly or re-exec'ing into the current binary after
>setting argv[0] appropriately for "builtins".

  There have been no conflicts since "import". I do not expect more name
conflicts in the future, and in any case, that is not an issue that
multicall binaries can solve any better than multiple binaries. These
are completely orthogonal things.

>(I assume you wouldn't like the idea of not installing the individual
>commands, but that'd become a possibility as well. I'm personally a bit
>uncomfortable having something in $PATH for 'if' and other commands that
>have historically been shell builtins, but have a different usage for
>execline...)

  You're not the only one who is uncomfortable with it, but it's really a
perception thing. There has never been a problem caused by it. Shells
don't get confused. External tools don't get confused. On this aspect,
Unix is a lot more correct and resilient than you give it credit for. :)

--
  Laurent

Received on Wed Feb 01 2023 - 05:49:47 CET
