This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Should glibc provide a builtin C.UTF-8 locale?
- From: Rich Felker <dalias at libc dot org>
- To: keld at keldix dot com
- Cc: Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 12 Feb 2015 15:32:36 -0500
- Subject: Re: Should glibc provide a builtin C.UTF-8 locale?
- References: <54DB8243 dot 3050903 at redhat dot com> <20150211235304 dot GA20330 at www5 dot open-std dot org> <20150212023923 dot GP23507 at brightrain dot aerifal dot cx> <20150212063839 dot GA10787 at www5 dot open-std dot org> <20150212151509 dot GQ23507 at brightrain dot aerifal dot cx> <20150212192544 dot GA19724 at www5 dot open-std dot org>
On Thu, Feb 12, 2015 at 08:25:44PM +0100, keld@keldix.com wrote:
> > > > > Also a pet idea of mine is to have compressed locales - that could significantly reduce
> > > > > the disk footprint of a more complete locale database. Also good for message catalogues.
> > > >
> > > > This sounds like a bad tradeoff unless you can use the compressed data
> > > > efficiently in-place. Disk space is cheap; requiring a decompressed
> > > > copy in memory per-process rather than using a shared mapping is
> > > > expensive.
> > >
> > > Hmm, are you referring to a statically linked version in glibc when you talk about
> > > a shared mapping?
> > >
> > > I do not see the big difference between loading an uncompressed locale and loading
> > > a compressed locale into memory, it may even be faster to read the compressed data
> > > and uncompress it. Or what?
> > >
> > > Message catalogues may be huge, especially if you want to carry them all.
> >
> > The difference with the uncompressed locale archive is that it's NOT
> > loaded into memory, it's mmapped, just like executables and shared
> > libraries are. This means that only the used parts are ever resident
> > in memory at all, they're discardable (subject to reloading later on
> > the next access) just like anything else in the filesystem cache, and
> > shared by all processes using glibc.
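A minimal POSIX sketch of the distinction above. It assumes nothing about glibc internals: it does not reproduce how glibc actually opens its locale archive, and the helper names `map_readonly` and `unmap_readonly` are hypothetical. A file mapped with `PROT_READ` and `MAP_SHARED` is paged in only when touched. Its pages stay clean and file-backed, so the kernel can drop them and reread them later, and every process mapping the same file shares the same page-cache pages. A decompressed copy would instead live in private, anonymous memory in each process.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and shared. Returns NULL on failure,
   including for empty files, which mmap cannot map. */
const unsigned char *map_readonly(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* No data is read here; pages are faulted in on first access. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping created by map_readonly. */
void unmap_readonly(const unsigned char *p, size_t len)
{
    munmap((void *)p, len);
}
```

Only the pages a program actually reads ever become resident. That is why an uncompressed archive on disk can cost very little memory per process.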
>
> I see. Are message catalogues also mmapped?
Yes, generally. There's a newish, mostly undocumented, and IMO highly
misdesigned "sysdep strings" feature that requires allocating new
strings (usually a small number) each time the catalogue is opened,
but the vast majority of message strings are mmapped.
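The following sketch shows why ordinary message strings need no allocation. In the GNU `.mo` format (as described in the GNU gettext manual), the header holds the magic number, a revision, the string count N, and the offsets of two tables of (length, offset) pairs, one for original strings and one for translations. The originals are sorted, so a binary search can return a pointer straight into the mapped bytes. This is an illustrative sketch, not glibc's code, and `mo_lookup` is a hypothetical name. It assumes native byte order and ignores the optional hash table, plural forms, and the system-dependent ("sysdep") string segments mentioned above, which are the part that needs freshly allocated strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0x950412deu /* native-endian .mo magic number */

/* Read a 32-bit word at byte offset off; fail if it lies outside the buffer. */
static int mo_u32(const unsigned char *base, size_t len, size_t off, uint32_t *out)
{
    if (off > len || len - off < 4)
        return -1;
    memcpy(out, base + off, 4); /* memcpy avoids unaligned access */
    return 0;
}

/* Return entry i of a (length, offset) table as a pointer into base,
   or NULL if the string or its terminating NUL is out of bounds. */
static const char *mo_string(const unsigned char *base, size_t len,
                             uint32_t table, uint32_t i)
{
    uint32_t slen, soff;
    size_t d = (size_t)table + (size_t)i * 8;
    if (mo_u32(base, len, d, &slen) || mo_u32(base, len, d + 4, &soff))
        return NULL;
    if (soff > len || len - soff <= slen || base[soff + slen] != '\0')
        return NULL;
    return (const char *)(base + soff);
}

/* Binary-search the sorted originals; on a match, return the translation
   as a pointer into the (typically mmapped) buffer, without copying. */
const char *mo_lookup(const unsigned char *base, size_t len, const char *msgid)
{
    uint32_t magic, n, orig, trans;
    if (mo_u32(base, len, 0, &magic) || magic != MO_MAGIC ||
        mo_u32(base, len, 8, &n) || mo_u32(base, len, 12, &orig) ||
        mo_u32(base, len, 16, &trans))
        return NULL;

    uint32_t lo = 0, hi = n;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *s = mo_string(base, len, orig, mid);
        if (s == NULL)
            return NULL; /* corrupt or truncated catalogue */
        int c = strcmp(msgid, s);
        if (c == 0)
            return mo_string(base, len, trans, mid);
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

Given a read-only mapping of a catalogue, each lookup touches only the table entries and strings it compares, so a process never pages in most of a large catalogue.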
Rich