This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: support for ISO C 99 format string directive macros in gettext
Paul Eggert writes:
> I'd like to avoid malloc in gettext, because I don't want error
> messages to come out in English when programs are reporting malloc
> failures. However, I guess that if the mallocs are all done when the
> message libraries are loaded
Yes, that's the case. If a malloc fails while loading a message
catalog, the catalog is simply ignored, and the messages come out
untranslated - this indeed saves memory when memory pressure is
critical.
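
As a rough illustration of that fallback (a toy sketch, not the actual
gettext loader; toy_catalog, toy_load, toy_gettext and failing_alloc are
names invented for this example, and the allocator is passed in only so
that a failure can be simulated):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry catalog; a real catalog holds a hash table. */
struct toy_catalog {
    const char *msgid;
    const char *msgstr;
};

typedef void *(*alloc_fn)(size_t);

/* Simulates an out-of-memory condition. */
void *failing_alloc(size_t size)
{
    (void) size;
    return NULL;
}

/* All allocation happens here, at load time.  On failure the catalog
   is simply not used. */
struct toy_catalog *toy_load(alloc_fn alloc)
{
    struct toy_catalog *cat = alloc(sizeof *cat);
    if (cat == NULL)
        return NULL;
    cat->msgid = "memory exhausted";
    cat->msgstr = "memoria agotada";
    return cat;
}

/* Lookup never allocates; without a catalog the msgid comes back
   untranslated. */
const char *toy_gettext(const struct toy_catalog *cat, const char *msgid)
{
    if (cat != NULL && strcmp(cat->msgid, msgid) == 0)
        return cat->msgstr;
    return msgid;
}

/* Hypothetical demo of the failure path. */
void demo_ignored_catalog(void)
{
    struct toy_catalog *cat = toy_load(failing_alloc);
    assert(cat == NULL);
    assert(strcmp(toy_gettext(cat, "memory exhausted"),
                  "memory exhausted") == 0);
}
```

The point of the sketch is only that the lookup path contains no
allocation, so reporting a malloc failure cannot itself fail on malloc.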
> > (Note that you have to have a writable copy of the hash table in
> > either case - it doesn't make a difference whether the PROT_WRITE or
> > the malloc approach is taken.)
>
> Can't we make the hash table read-only by omitting the
> system-dependent part of the strings from the input to hashing
> function?
How could the hash function tell where the system-dependent part
begins? The argument to gettext() is a plain string like
"The file has %08llx bytes."
Nothing in it marks the "llx" portion as the system-dependent one.
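
A minimal sketch of why this is so, assuming the msgid was written with
PRIx64 from <inttypes.h> (the helper names are made up for this
example): the compiler concatenates adjacent string literals, so the
string that reaches gettext() contains the expanded directive and no
trace of the macro name.

```c
#include <assert.h>
#include <inttypes.h>
#include <string.h>

/* Hypothetical helper: the msgid exactly as the compiler passes it on.
   Adjacent string literals are concatenated during translation, so
   PRIx64 has already been replaced by whatever the platform defines. */
const char *file_size_msgid(void)
{
    return "The file has %08" PRIx64 " bytes.";
}

/* Hypothetical helper: returns nonzero if the macro name survived. */
int has_macro_marker(const char *msgid)
{
    return strstr(msgid, "PRIx64") != NULL;
}

/* Hypothetical demo: the run-time string carries no marker at all. */
void demo_no_marker(void)
{
    const char *msgid = file_size_msgid();
    assert(!has_macro_marker(msgid));
    assert(strstr(msgid, "%08") != NULL);  /* the directive itself is there */
}
```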
> Another possibility would be to use mmap on systems like glibc
> (and the vast majority of other systems) where strlen (PRI...) <= 3,
> and to fall back on malloc on weird systems with longer
> PRI... strings.
It still has the drawback of allocating 4 KB of memory when only a
single small system-dependent string is needed. malloc is cheaper.
> Admittedly this would complicate the code, but I think it might
> improve performance on GNU systems by putting less stress on malloc.
It's more stress for the system to do copy-on-write of 1 page than to
malloc 100 bytes.
> > > I'd rather write this:
> > >
> > > printf (_("total = %jd bytes"), total);
> > >
> > > than this:
> > >
> > > printf (_("total = %" PRIdMAX " bytes"), total);
> > >
> > > Can this be arranged? It'd be nice.
> >
> > Yes it would be nice. But that's not how the standards (ISO C 99 +
> > POSIX 2001) specify it.
>
> Sorry, I don't understand this comment.
"%jd" == PRIdMAX is an exception. If you suggest to reserve new
lowercase letters for each of _32, _64, _LEAST32, _LEAST64, _FAST32,
_FAST64, _PTR etc., the standards makers would run out of lowercase
letters.
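
To make the exception concrete, a small sketch (function names are
invented for this example): for intmax_t the "j" length modifier and
PRIdMAX produce the same output, whereas int64_t has no length modifier
of its own and has to go through PRId64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the same value once with "%jd" and once
   with PRIdMAX, and reports whether the two results agree. */
int jd_matches_pridmax(intmax_t total)
{
    char with_j[64];
    char with_macro[64];

    snprintf(with_j, sizeof with_j, "total = %jd bytes", total);
    snprintf(with_macro, sizeof with_macro,
             "total = %" PRIdMAX " bytes", total);
    return strcmp(with_j, with_macro) == 0;
}

/* Hypothetical helper: int64_t has no dedicated length modifier, so
   the macro is the only portable spelling. */
int format_total64(char *buf, size_t size, int64_t total)
{
    return snprintf(buf, size, "total = %" PRId64 " bytes", total);
}

/* Hypothetical demo. */
void demo_jd(void)
{
    char buf[64];

    assert(jd_matches_pridmax(1234567));
    format_total64(buf, sizeof buf, 42);
    assert(strcmp(buf, "total = 42 bytes") == 0);
}
```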
> Under this proposal, _ would not be a noop when NLS is disabled.
This would hinder the acceptance of gettext in some areas.
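
For reference, the arrangement commonly used today, in which _ costs
nothing without NLS, looks roughly like this. ENABLE_NLS is the
conventional configure-time macro name and is an assumption here, as is
the open_error_label helper.

```c
#include <assert.h>
#include <string.h>

#if ENABLE_NLS
# include <libintl.h>
# define _(msgid) gettext (msgid)
#else
# define _(msgid) (msgid)   /* no-op: the string literal is used as is */
#endif

/* Hypothetical helper that marks one message for translation. */
const char *open_error_label(void)
{
    return _("cannot open file");
}

/* Hypothetical demo: with NLS disabled (or with no catalog installed)
   the msgid comes back unchanged. */
void demo_noop(void)
{
    assert(strcmp(open_error_label(), "cannot open file") == 0);
}
```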
> I do have one other question. What will xgettext etc. do when
> message-IDs collide after replacement of system-dependent strings?
> For example, suppose the program does this:
>
> printf (_("total = %" PRIdMAX " bytes"), total_max);
> printf (_("total = %" PRId64 " bytes"), total_64);
The translator will have provided two translations, one containing "%"
PRIdMAX and the other one containing "%" PRId64. (msgfmt complains if
the translator provides wrong format directives.) On platforms with
PRIdMAX == PRId64 both msgids are identical; gettext will choose
one of these translations. It's an arbitrary choice.
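
Whether the two msgids collide on a given platform can be checked with
a few lines of C. This is only a sketch with invented helper names; the
answer it gives depends on how the platform defines the two macros.

```c
#include <assert.h>
#include <inttypes.h>
#include <string.h>

/* The two msgids from the example, after macro expansion. */
const char *msgid_for_max(void)
{
    return "total = %" PRIdMAX " bytes";
}

const char *msgid_for_64(void)
{
    return "total = %" PRId64 " bytes";
}

/* Nonzero if both calls would look up the same msgid at run time. */
int msgids_collide(void)
{
    return strcmp(msgid_for_max(), msgid_for_64()) == 0;
}

/* Hypothetical demo: the msgids collide exactly when the platform
   expands both macros to the same directive. */
void demo_collision(void)
{
    int same_directive = strcmp(PRIdMAX, PRId64) == 0;
    assert(msgids_collide() == same_directive);
}
```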
> Will xgettext warn about potential collisions like this?
It could be msgfmt's task to warn if the translator has provided
significantly different translations for potential collisions.
Bruno