Re: Query on s6-log and s6-supervise

From: Arjun D R <drarjun95_at_gmail.com>
Date: Thu, 10 Jun 2021 09:24:16 +0530

Thanks Laurent and Colin for the suggestions. I will try to build a
fully static s6 with a musl toolchain. Thanks once again for the
detailed analysis.
--
Arjun
On Wed, Jun 9, 2021 at 5:18 PM Laurent Bercot <ska-supervision_at_skarnet.org>
wrote:
> >I have checked the Private_Dirty memory in "smaps" of an s6-supervise
> >process and I don't see any single mapping consuming more than 8 kB.
> >Just posting it here for reference.
>
>   Indeed, each mapping is small, but you have *a lot* of them. The
> sum of all the Private_Dirty values in your mappings, which should be
> shown in smaps_rollup, is 96 kB. 24 pages! That is _huge_.
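>
>   ( If you want to double-check that sum without smaps_rollup, a quick
> sketch - $pid standing in for the pid of the s6-supervise process - is
> to add up the Private_Dirty lines directly:
>   awk '/^Private_Dirty:/ { s += $2 } END { print s " kB" }' /proc/$pid/smaps
> The result should match the Private_Dirty line of smaps_rollup. )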
>
>   In this list, the mappings that are really used by s6-supervise (i.e.
> the incompressible amount of unshareable memory) are the following:
>
>   - the /bin/s6-supervise section: this is static data, s6-supervise
> needs a little, but it should not take more than one page.
>
>   - the [heap] section: this is dynamically allocated memory, and for
> s6-supervise it should not be bigger than 4 kB. s6-supervise does not
> allocate dynamic memory itself, the presence of a heap section is due
> to opendir() which needs dynamic buffers; the size of the buffer is
> determined by the libc, and anything more than one page is wasteful.
>
> ( - anonymous mappings are really memory dynamically allocated for
> internal libc purposes; they do not show up in [heap] because they're
> not obtained via malloc(). No function used by s6-supervise should
> ever need those; any anonymous mapping you see is libc shenanigans
> and counts as overhead. )
>
>   - the [stack] section: this is difficult to control because the
> amount of stack a process uses depends a lot on the compiler, the
> compilation flags, etc. When built with -O2, s6-supervise should not
> use more than 2-3 pages of stack. This includes a one-page buffer to
> read from notification-fd; I can probably reduce the size of this
> buffer and make sure the amount of needed stack pages never goes
> above 2.
>
>   So in total, the incompressible amount of private mappings is 4 to 5
> pages (16 to 20 kB). All the other mappings are libc overhead.
>
>   - the libpthread-2.31.so mapping uses 8 kB
>   - the librt-2.31.so mapping uses 8 kB
>   - the libc-2.31.so mapping uses 16 kB
>   - the libskarnet.so mapping uses 12 kB
>   - ld.so, the dynamic linker itself, uses 16 kB
>   - there are 16 kB of anonymous mappings
>
>   This is some serious waste; unfortunately, it's pretty much to be
> expected from glibc, which suffers from decades of misdesign and
> tunnel vision especially where dynamic linking is concerned. We are,
> unfortunately, experiencing the consequences of technical debt.
>
>   Linking against the static version of skalibs (--enable-allstatic)
> should save you at least 12 kB (and probably 16) per instance of
> s6-supervise. You should have noticed the improvement; your amount of
> private memory should have dropped by at least 1.5MB when you switched
> to --enable-allstatic.
>   But I understand it is not enough.
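>
>   ( For reference, --enable-allstatic is just a ./configure switch; a
> minimal rebuild sketch, with the version number as a placeholder:
>   cd s6-x.y.z.w
>   ./configure --enable-allstatic
>   make && make install
> then restart the supervision tree so the new binaries are picked up. )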
>
>   Unfortunately, once you have removed the libskarnet.so mappings,
> it's basically down to the libc, and to achieve further improvements
> I have no other suggestions than to change libcs.
>
> >If possible, can you please share with us reference smaps and ps_mem
> >data for s6-supervise? That would really help.
>
>   I don't use ps_mem, but here are the details of an s6-supervise
> process on the skarnet.org server. s6 is linked statically against
> the musl libc, which means:
>   - the text segments are bigger (drawback of static linking)
>   - there are fewer mappings (advantage of static linking, but even when
> you're linking dynamically against musl it maps as little as it can)
>   - the mappings have little libc overhead (advantage of musl)
>
> # cat smaps_rollup
>
> 00400000-7ffd53096000 ---p 00000000 00:00 0  [rollup]
> Rss:                  64 kB
> Pss:                  36 kB
> Pss_Anon:             20 kB
> Pss_File:             16 kB
> Pss_Shmem:             0 kB
> Shared_Clean:         40 kB
> Shared_Dirty:          0 kB
> Private_Clean:         8 kB
> Private_Dirty:        16 kB
> Referenced:           64 kB
> Anonymous:            20 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
>
>   You can see 40kB of shared, 16kB of Private_Dirty, and 8kB of
> Private_Clean - apparently there's one Private_Clean page of static
> data and one of stack; I have no idea what this corresponds to in the
> code, so I will need to investigate and see if it can be trimmed down.
>
> # grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
>
> 00400000-00409000 r-xp 00000000 ca:00 659178  /command/s6-supervise
> Private_Dirty:         0 kB
> 00609000-0060b000 rw-p 00009000 ca:00 659178  /command/s6-supervise
> Private_Dirty:         4 kB
> 02462000-02463000 ---p 00000000 00:00 0  [heap]
> Private_Dirty:         0 kB
> 02463000-02464000 rw-p 00000000 00:00 0  [heap]
> Private_Dirty:         4 kB
> 7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0  [stack]
> Private_Dirty:         8 kB
> 7ffd53090000-7ffd53094000 r--p 00000000 00:00 0  [vvar]
> Private_Dirty:         0 kB
> 7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0  [vdso]
> Private_Dirty:         0 kB
>
>   One page of static data, one page of heap, two pages of stack (that
> I should probably be able to get down to one). All the other mappings
> are shared, except those weird two pages of Private_Clean that I don't
> understand yet.
>   As you can see, it is as close to incompressible as it gets. If I had
> 129 of these processes, without changing anything, they would use
> something like: (16+8) * 129 + 40 = 3136 kB of RAM. Which is still
> bigger than the theoretical minimum - I need to get rid of those two
> Private_Clean pages - but much more acceptable than the 12.2 MB you get
> from glibc.
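>
>   ( If you want to measure that total on a live system instead of
> extrapolating from one process, a rough sketch - assuming your kernel
> exposes smaps_rollup and pidof finds every s6-supervise instance:
>   for pid in $(pidof s6-supervise) ; do
>     awk '/^Private_Dirty:/ { s += $2 } END { print s }' /proc/$pid/smaps_rollup
>   done | awk '{ t += $1 } END { print t " kB private, total" }'
> Shared pages are deliberately left out, since they are only paid once. )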
>
>
>   I was going to post this as is, but for completeness' sake and my
> peace of mind, I fired up an Alpine Linux VM and checked /proc for
> an s6-supervise process. Alpine Linux uses musl, but with dynamic
> linking, and --disable-allstatic. The results are mixed:
>
>   - 8 kB of static data (why is it more than the static case?)
>   - 4 kB of heap
>   - 8 kB of stack
>     (So far so good, more or less.)
>   - 16 kB for libskarnet.so (why is it more than glibc uses?)
>   - 8 kB of anonymous mapping related to libskarnet.so
>   - 8 kB for libc.so
>   - 8 kB of anonymous mapping related to libc.so
>
>   That's better than glibc, but is still 40kB of overhead compared to
> a static build, plus 4 kB of static data that I don't understand.
> Total is 60 kB, which would net 7.7MB + shared for 129 instances.
> Linking libskarnet statically would likely save 24kB per instance, so
> the total RAM for --enable-allstatic would be 4.6MB + shared. Which
> is starting to sound close to acceptable.
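>
>   ( For reference, the arithmetic: 60 kB * 129 = 7740 kB, about 7.7 MB;
> and (60 - 24) kB * 129 = 4644 kB, about 4.6 MB. )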
>
>   My takeaway from this is that dynamic linking, despite being essential
> for distributions (for ease of upgrade, maintenance, and security
> reasons), is definitely _not free_. It has a high fixed cost in RAM;
> this is not noticeable when using few instances of large, bloated
> processes - which is how a lot of software operates - but it is very
> noticeable when using a lot of instances of small, efficient processes,
> where the costs of dynamic linking overshadow the legit RAM use of said
> processes.
>
>   In other words: the way s6 works is a worst case for dynamic linking,
> and especially dynamic linking with glibc. I'm sorry.
>
>   If you want to attempt building static binaries of s6 with musl, you
> can find musl toolchains at https://skarnet.org/toolchains/ or
> at https://musl.cc/ . Please bear in mind you will need to build the
> whole stack with the same toolchain (skalibs, execline, s6).
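>
>   ( Roughly, with one of those toolchains first in your PATH - the
> target triple below is a placeholder, and the exact flag spellings are
> worth checking against each package's ./configure --help. Set
>   export CC=x86_64-linux-musl-gcc
> then, in the skalibs tree:
>   ./configure --enable-static --disable-shared && make && make install
> and in the execline and s6 trees:
>   ./configure --enable-static-libc && make && make install
> skalibs has to come first because execline and s6 link against it. )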
>
>
>
>   Dewayne:
>
> >>  Thanks Laurent, that's really interesting.  By comparison, my FBSD
> >>  system uses:
> >>
> >>  # ps -axw -o pid,vsz,rss,time,comm | grep s6
>
>   Well that's the problem with ps: VSZ and RSS won't give you the real
> information, because they include shared mappings in their numbers.
> To get a reasonably accurate estimate of the marginal cost of one
> additional process, you need to know what is shared and what is
> private, and ps doesn't tell you that. There is probably a way to get
> the information on FreeBSD, but I don't know what it is.
>
>   Yes, the FreeBSD libc is relatively large, but it's pretty decent
> compared to glibc. I suspect the marginal cost of one s6-supervise
> process on FreeBSD is somewhere between what you get with musl and
> what you get with glibc.
>
> --
>   Laurent
>
>
Received on Thu Jun 10 2021 - 05:54:16 CEST
