This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Should glibc provide a builtin C.UTF-8 locale?


On Wed, Feb 11, 2015 at 03:27:10PM -0500, Carlos O'Donell wrote:
> > If you do go with the "two projects" approach, perhaps you could aim
> > to have them both take place in the same release cycle, or else to
> > have "built-in C.UTF-8 locale" be an optional feature until it's
> > optimized in a subsequent release.
> 
> No, you raise a very good point. I'll make sure they go forward as one
> project with an analysis phase that requires looking at how to reduce
> the table sizes.

Some preliminary remarks on this:

For character classes, basic two-level tables indexed by x>>8 and
x&255 work wonders; the second-level tables can be shared between many
blocks. But in order for this sharing to work best, you need separate
1-bit-per-char tables for each property rather than having all
properties together.
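
To make the layout concrete, here is a minimal sketch of a two-level lookup for a single property, at 1 bit per character. Everything named demo_* is hypothetical and the data is a toy (only ASCII letters are marked); it is not glibc's actual table format. It only shows how x>>8 selects a block and x&255 selects a bit in it, and how every 256-character range with identical contents can point at the same block.

```c
#include <stdint.h>

/* Each second-level block covers 256 code points at 1 bit each. */
#define DEMO_BLOCK_BYTES 32

/* Toy data, for illustration only: a real table would be generated
   from the Unicode character database. */
static const uint8_t demo_alpha_blocks[][DEMO_BLOCK_BYTES] = {
    /* Block 0: all zero bits. Every 256-code-point range with no
       members shares this one block. */
    { 0 },
    /* Block 1: bits set for 'A'..'Z' (65..90) and 'a'..'z' (97..122). */
    { [8]  = 0xFE, [9]  = 0xFF, [10] = 0xFF, [11] = 0x07,
      [12] = 0xFE, [13] = 0xFF, [14] = 0xFF, [15] = 0x07 },
};

/* First level: one block number per 256 code points, covering
   U+0000..U+10FFFF (0x110000 >> 8 == 0x1100 entries). Entries not
   listed default to 0, i.e. the shared empty block. */
static const uint8_t demo_alpha_index[0x1100] = { [0] = 1 };

/* Returns 1 if x has the property, 0 otherwise. */
static int demo_isalpha(uint32_t x)
{
    if (x >= 0x110000)
        return 0;
    const uint8_t *block = demo_alpha_blocks[demo_alpha_index[x >> 8]];
    unsigned lo = x & 255;
    return (block[lo >> 3] >> (lo & 7)) & 1;
}
```

With this layout the first level costs 0x1100 bytes per property and each distinct block costs 32 bytes, so the total depends almost entirely on how many distinct blocks a property needs. Keeping one table per property means a block is shared whenever that single property repeats, rather than only when all properties repeat together.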

Some of the classes like digit/xdigit can be hard-coded not to need
tables at all; this should be a performance optimization too.
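
A sketch of what "hard-coded" could look like, assuming the digit class is exactly '0'..'9' and xdigit adds only 'a'..'f' and 'A'..'F'. The demo_* names are made up for this example.

```c
#include <stdint.h>

/* Unsigned subtraction wraps around, so a single comparison checks
   both ends of the range. */
static int demo_isdigit(uint32_t x)
{
    return x - '0' < 10;
}

/* OR-ing in bit 0x20 folds 'A'..'F' onto 'a'..'f'; no other code
   point lands in 'a'..'f' after that operation. */
static int demo_isxdigit(uint32_t x)
{
    return x - '0' < 10 || (x | 32) - 'a' < 6;
}
```

Neither function touches memory, which is where the performance benefit would come from.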

Case mapping could probably also be done with two-level tables. I
think it would be considerably larger than property bits, but probably
still under 64k for each direction.
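
For illustration, one way the same two-level scheme could carry case mappings is to store a signed delta per character instead of a bit. This is only a sketch with toy data (ASCII only) and hypothetical demo_* names; the int32_t entries are chosen for simplicity, and a real table would presumably use a more compact encoding to approach the size estimated above.

```c
#include <stdint.h>

/* Second level: 256 deltas per block. Delta 0 means "no mapping". */
static const int32_t demo_upper_delta[][256] = {
    /* Block 0: all zeros, shared by every range with no lowercase
       letters. */
    { 0 },
    /* Block 1: 'a'..'z' map to 'A'..'Z', i.e. delta -32 for the 26
       entries starting at index 0x61. */
    { [0x61] = -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32,
               -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32 },
};

/* First level: block number per 256 code points; unlisted entries
   default to the shared all-zero block. */
static const uint8_t demo_upper_index[0x1100] = { [0] = 1 };

/* Returns the uppercase mapping of x, or x itself if there is none. */
static uint32_t demo_toupper(uint32_t x)
{
    if (x >= 0x110000)
        return x;
    int32_t delta = demo_upper_delta[demo_upper_index[x >> 8]][x & 255];
    return x + (uint32_t)delta;
}
```

Storing deltas rather than target code points should help sharing: runs of letters with the same offset in different blocks (for instance at the same position within a block) produce identical second-level contents.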

My rough educated guess is that you could get the whole thing in ~160k
without compromising performance.

Also it's conceivable that locales that agree with C.UTF-8 on
character classification and case mappings could use the built-in data
rather than requiring data in the locale database. This could greatly
reduce the installed size of the latter and might improve performance.

Rich

