This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: support for ISO C 99 format string directive macros in gettext


Paul Eggert writes:
> I'd like to avoid malloc in gettext, because I don't want error
> messages to come out in English when programs are reporting malloc
> failures.  However, I guess that if the mallocs are all done when the
> message libraries are loaded

Yes, that's the case. If a malloc fails while loading a message
catalog, the catalog is simply ignored and the messages come out
untranslated; skipping the catalog indeed saves memory at the moment
when memory pressure is critical.

> > (Note that you have to have a writable copy of the hash table in
> > either case - it doesn't make a difference whether the PROT_WRITE or
> > the malloc approach is taken.)
> 
> Can't we make the hash table read-only by omitting the
> system-dependent part of the strings from the input to hashing
> function?

How could the hash function tell where the system-dependent part
begins? The argument to gettext() is a plain string such as
  "The file has %08llx bytes."
There is no marker indicating that the "llx" portion is the
system-dependent one.

> Another possibility would be to use mmap on systems like glibc
> (and the vast majority of other systems) where strlen (PRI...) <= 3,
> and to fall back on malloc on weird systems with longer
> PRI... strings.

It still has the drawback of allocating 4 KB of memory (a whole page)
when only a single small system-dependent string is needed. malloc is
cheaper.

> Admittedly this would complicate the code, but I think it might
> improve performance on GNU systems by putting less stress on malloc.

Copy-on-write of one page puts more stress on the system than a
malloc of 100 bytes.

> > > I'd rather write this:
> > > 
> > >      printf (_("total = %jd bytes"), total);
> > > 
> > >   than this:
> > > 
> > >      printf (_("total = %" PRIdMAX " bytes"), total);
> > > 
> > >   Can this be arranged?  It'd be nice.
> > 
> > Yes it would be nice. But that's not how the standards (ISO C 99 +
> > POSIX 2001) specify it.
> 
> Sorry, I don't understand this comment.

"%jd" == PRIdMAX is an exception. If you suggest reserving new
lowercase length modifiers for each of _32, _64, _LEAST32, _LEAST64,
_FAST32, _FAST64, _PTR etc., the standards bodies would run out of
lowercase letters.

> Under this proposal, _ would not be a noop when NLS is disabled.

This would hinder the acceptance of gettext in some areas.

> I do have one other question.  What will xgettext etc. do when
> message-IDs collide after replacement of system-dependent strings?
> For example, suppose the program does this:
> 
> 	printf (_("total = %" PRIdMAX " bytes"), total_max);
> 	printf (_("total = %" PRId64  " bytes"), total_64);

The translator will have provided two translations, one containing "%"
PRIdMAX and the other containing "%" PRId64. (msgfmt complains if the
translator provides wrong format directives.) On platforms where
PRIdMAX == PRId64, both msgids are identical, and gettext will choose
either of the two translations. It's an arbitrary choice.

> Will xgettext warn about potential collisions like this?

It could be msgfmt's task to warn if the translator has provided
significantly different translations for potential collisions.

Bruno

