Re: single-binary for execline programs?

From: Dominique Martinet <asmadeus_at_codewreck.org>
Date: Wed, 1 Feb 2023 21:28:36 +0900

Laurent Bercot wrote on Wed, Feb 01, 2023 at 10:41:39AM +0000:
> > I'd go out on a limb and say if you only support single-binary mode, some
> > of the code could be simplified further by sharing some argument
> > handling, but it's hard to do simpler than your exlsn_main wrapper so
> > it'll likely be identical with individual programs not changing at all,
> > with just an extra shim to wrap them all; it's not like busybox where
> > individual binaries can be selected so a static wrapper would be dead
> > simple.
>
> I doubt much sharing would be possible.
>
> The main problem I have with multicall is that the program's
> functionality changes depending on argv[0]. You need to first select
> on argv[0], and *then* you can parse options and handle arguments.
> Note that each exlsn_* function needs its own call to subgetopt_r(),
> despite the options being very similar because they all fill an
> eltransforminfo_t structure.

Yes, as I've mentioned, you've already done a great job of sharing as
much as possible.

I'm not expecting any change to the program.

Look, here's a trivial, suboptimal wrapper, far from pretty:

$ cd execline; ./configure
$ cd src
$ grep -l 'int main' */*.c | while read f; do
        # rename each program's main() to main_<name> (dashes -> underscores)
        # and record the name for the dispatch table below
        b=${f##*/}; b=${b%.c}; b=${b//-/_}
        sed -i -e 's/int main/int main_'"$b"'/' "$f"
        echo "$b" >> programs
done

$ {
  cat <<EOF
#include <string.h>

int main (int argc, const char **argv, char const *const *envp)
{
   const char *app = strrchr(argv[0], '/');
   if (app) app++; else app=argv[0];
#define APP(name) if (strcmp(app, #name) == 0) return main_##name(argc, argv, envp);
EOF
  # emit one APP(name) dispatch line per renamed program
  sed -e 's/.*/APP(&)/' < programs
  cat <<EOF
  return 111; // not found
}
EOF
} > wrapper.c
$ gcc -O2 -I$PWD/include -I$PWD/include-local \
        */*.c wrapper.c -o wrapper \
        -lskarnet -Wno-implicit-function-declaration
$ ln wrapper execline_cd
$ ln wrapper if
$ ./if true '' ./execline_cd / ls
<contents of />

(look, I said it wasn't pretty -- there are at least a dozen problems
with this, but nothing the day of work I offered can't fix; I wrote it
because that was faster than arguing without a concrete example or
figures, and it took me less time than the rest of this mail)

$ size wrapper
   text data bss dec hex filename
  97167 2860 1136 101163 18b2b wrapper (glibc)
  98529 2836 1264 102629 190e5 wrapper (musl)

> Having a shim over *all* the execline programs would be that,
> multiplied by the number of programs; at the source level, there would
> not be any significant refactoring, because each program is pretty much
> its own thing. An executable is its own atomic unit, more or less.
>
> If anything, execline is the package that's the *least* adapted to
> multicall because of this. There is no possible sharing between
> "if" and "piperw", for instance, because these are two small units with
> very distinct functionality. The only way to make execline suited to
> multicall would be to entirely refactor the code of the executables and
> make a giant library, à la busybox. And I am familiar enough with
> analyzing and patching busybox that I certainly do not want to add that
> kind of maintenance nightmare to execline.

I don't think any more refactoring would be useful, and I don't see the
problem with selecting on argv[0] first, independently of the rest...
And gcc still found quite a bit to share: the text segments of the
individual binaries sum to ~235000 bytes, so the many small binaries
really do add up.
(And that's before ELF/ld overhead)
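
(For reference, roughly how that figure can be reproduced -- a rough
sketch that assumes the individually-built binaries still sit one
directory up from src, which is where my build put them; adjust the
path to taste, and it reuses the "programs" list generated above:

$ tr '_' '-' < programs | while read p; do size "../$p"; done \
        | awk '$1 ~ /^[0-9]+$/ { text += $1; data += $2; bss += $3 }
               END { printf "text=%d data=%d bss=%d\n", text, data, bss }'
)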

> Anything that can be shared in execline is pretty much already shared
> in libexecline. If you build execline with full shared libraries, you
> get as much code sharing as is reasonably accessible without a complete
> rearchitecture.

libexecline is statically linked, so these pages aren't shared afaik?

My understanding is that if any symbol from a compilation unit (a .lo in
the .a) is used, the whole unit gets duplicated into the binary, and the
runtime has no way of figuring that out, so those copies can't be shared.
Of course, the C runtime probably also accounts for part of that
difference.
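
(A toy illustration of that granularity, with made-up file names -- an
object in a static archive is pulled in whole or not at all:

$ cat > libdemo.c <<'EOF'
/* two unrelated functions in the same translation unit */
int used(void)   { return 1; }
int unused(void) { return 2; }
EOF
$ cat > demo.c <<'EOF'
int used(void);
int main(void) { return used(); }
EOF
$ gcc -c libdemo.c && ar rcs libdemo.a libdemo.o
$ gcc demo.c libdemo.a -o demo
$ nm demo | grep -E ' T (used|unused)$'

both symbols show up even though main() only calls used(); building with
-ffunction-sections and linking with -Wl,--gc-sections is the usual
workaround to let the linker drop the dead one.)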

> The "one unique binary" argument applies better to some of my other
> software; for instance, the latest s6-instance-* additions to s6.
> I considered making a unique "s6-instance" binary, with varying
> functionality depending on an argv[1] subcommand; I eventually decided
> against it because it would have broken UI consistency with the rest of
> s6, but it would have been a reasonable choice for this set of programs -
> which are already thin wrappers around library calls and share a lot
> of code. Same thing with s6-fdholder-*.
> execline binaries, by contrast, are all over the place, and *not* good
> candidates for multicall.

I really don't see what makes execline different from e.g. coreutils,
whose maintainers apparently thought it was worth it; but, sure, there
are other targets (and, as said below, some that you aren't working on)

Then again, a multicall coreutils does not seem to care about data/bss:
   text data bss dec hex filename
1208372 47092 87368 1342832 147d70 coreutils (glibc, nixos)
1088493 59104 83856 1231453 12ca5d coreutils (musl, alpine)

Well, they save ~3MB of text that way, so that's an order of magnitude
more than what we're discussing here...

> > Hmm, I'd need to do some measurements, but my impression would be that
> > since the overall size is smaller it should pay off for any pipeline
> > calling more than a handful of binaries, as you'll benefit from running
> > the same binary multiple times rather than having to look through
> > multiple binaries (even without optimizing the execs out).
>
> Yes, you might win a few pages by sharing the text, but I'm more
> concerned about bss and data. Although I take some care in minimizing
> globals, I know that in my typical small programs, it won't matter if
> I add an int global, because the amount of global data I need will
> never reach 4k, so it won't map an extra page.
>
> When you start aggregating applets, the cost of globals skyrockets.
> You need to pay extra attention to every piece of data. Let me bring
> the example of busybox again: vda, the maintainer, does an excellent
> job of keeping the bss/data overhead low (only 2 pages of global
> private/dirty), but that's at the price of keeping it front and
> center, always, when reviewing and merging patches, and nacking stuff
> that would otherwise be a significant improvement. It's *hard*, and
> hampers code agility in a serious way. I don't want that.

Right, we're just above 4k of data+bss on musl (2836 + 1264 = 4100
bytes, so a second page gets mapped); given that execline processes are
mostly short-lived, I'd think it's not worth the effort you describe,
but I guess you won't like that :)

> Sure, you can say that globals are a bad idea anyway, but a lot of
> programs need *some* state, if local to a TU - and the C and ELF models
> make it so that TU-local variables still end up in the global data
> section.

I can definitely agree it's not always practical to pass a variable
around to every single function...
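
(Easy to see with nm on a toy TU, names made up -- a file-scope static
lands in .bss or .data just like an extern global, only the symbol
binding becomes local:

$ cat > state.c <<'EOF'
static int counter;        /* TU-local, zero-initialized -> .bss  */
static int flags = 0x1;    /* TU-local, initialized      -> .data */
int bump(void) { return counter++ + flags; }
EOF
$ gcc -c state.c && nm state.o
0000000000000000 T bump
0000000000000000 b counter
0000000000000000 d flags

so every applet's little pieces of TU-local state still add up in the
aggregated binary's data/bss.)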

> > Even almost 1MB (the x86_64 version that doesn't have the problem,
> > package currently 852KB installed size + filesystem overhead..) is
> > still something I consider big for the systems I'm building, even
> > without the binutils issue it's getting harder to fit in a complete
> > rootfs in 100MB.
>
> I will never understand how disk space is an issue for execline and s6.
> RAM absolutely is, because RAM is expensive.
> CPU absolutely is, because it uses power.
> But disk space?
>
> I have 128 GB on my goddamn phone and 1 TB on my goddamn router.
> People build Electron apps with a full web engine embedded in it and
> ship them like candy.
> Projects on GitHub vendor dependencies like their lives depend on it
> and ship 100 MB executables.
> Yesterday I built nftables, yes, the thing that's supposed to be super
> low-level, super close to the Linux kernel, and easily installable on
> anything that needs packet filtering and network control, so, routers,
> supposedly relatively small machines. Well, the static nftables binary,
> linked against musl, is 1.2 MB large.
>
> wat.
>
> A dynamically linked bash binary is 1.2 MB as well.
>
> wat.

Couldn't agree more here...

> Do you honestly find it fair to ask me to jump through hoops so that
> all the execline binaries together may use less than 1 MB?

I thought it was possible and offered to do it -- and I'm perfectly fine
with a no; your first answer just looked like you were overthinking what
I was trying to do.
I'm not saying you have to do it my way either, or that you have to do
anything at all, so I don't think there's anything unfair here.

But yes, there's only so much time, and even if I or someone else does
the work, it will still cost you time in review and future updates, so I
can accept that.

> Isn't there anywhere else on your system where the fruit is lower
> hanging, and you could have better gains with less effort? In particular,
> effort that *I* wouldn't have to do?

Well, I have the same time problem, so there are only so many places I
can ask -- for what it's worth, I've asked the same thing of some
container plugins (and never got an answer, so that ended up as a local
fork...), so I'm already quite grateful you went out of your way this
much.
I'd advise against discussing so much with strangers, though: this
thread has probably already cost you more time than doing the work would
have :-D

> > Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
> > s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
> > get rid of openrc for compatibility with user scripts it'll have to live
> > in compat hooks...) ; being able to shave ~700KB of that would be
> > very interesting for me (number from linking all .c together with a
> > dummy main wrapper, down 148KB)
>
> Have you measured how much disk space OpenRC uses? (I'm not even
> talking about RAM usage here, it's too easy a target. :P)

I had a look when starting this thread, and yes, they're definitely
another candidate; binaries like service_{started,starting,stopped}, for
example, are all compiled from the same source and would be a perfect
target for this kind of work.

But, unfortunately for you, the full openrc suite is 2.2MB (5MB on arm,
with aarch64 being as bloated as it is), which is a bit less than the s6
suite :-D

A more practical reason is also simply that I happened to look this way
because newly added binaries get more scrutiny than whatever was already
in the system in the first place; but since you mentioned nftables
earlier, their hundreds of libxt_<xtable>.so plugins are another thing
I've been meaning to look at more closely...

> Have you asked the OpenRC people to modify their software so it uses
> less disk space?
> Why not?
> Why is it more acceptable to ask that of s6?

Why does it have to be one or the other?

> I understand that s6 is supposed to be better than everything else,
> yes, but in order to achieve that, it needs to pick its fights. I have
> chosen to prioritize correctness, CPU usage, RAM usage, automatability,
> and a few other battlefields; disk space is in the list, but far behind.
> It makes things more difficult for you, and I'm sorry, but I don't
> think my choice is wrong.

No need to be sorry.

> > (s6-* dozen of binaries being another similar target and would shave a
> > bit more as well, build systems being similar I was hoping it could go
> > next if this had been well received)
>
> The problem here is that there are conflicting wants.
>
> People who build embedded devices - like you, I assume - need smaller
> software, possibly at the expense of functionality, so they want finer
> granularity. That's my natural inclination, too.
>
> But most other users don't care so much about size, and they want
> simplicity and usability, which mean coarser granularity. A normie doesn't
> understand the difference between s6, s6-rc and s6-linux-init, and
> shouldn't have to; for them, an integrated init system doing all that
> would be better.
>
> Distributions cater to the latter, in general, so since my long-term
> goal is to get the s6 ecosystem better integrated in distributions, I
> tend to not care as much about fine granularity as I did in past years.
> That's why I don't have many scruples adding extra s6-* programs to the
> s6 package. But there are distinct thematic groups of s6 programs that
> work together, and if you want to do away with one group, you can
> do it without impacting the rest too much.

Just to be clear, I'm not asking to cut anything down here; just to
optimize what's there.

The first objective for s6 should definitely be to get more users, and
that comes with getting distribution support -- s6-frontend and whatever
else is needed for that are definitely the priority.


(cutting the last part of the thread as it's too far off topic, and I
mostly agree anyway)
-- 
Dominique