This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC][BZ #16009] Memory handling in strxfrm_l()



No, the function is not permitted to return an error; it's required by
ISO C to produce a result. Falsely reporting that it needs more space
for the result, and thereby causing the caller to keep allocating
larger and larger buffers until it runs out of memory itself, is not
valid; in particular, it could report different needed lengths for the
same string on different calls in the same program with the same
locale.
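To make the failure mode concrete, here is a minimal sketch of the usual caller pattern for strxfrm(). The caller makes one call, and if the returned length does not fit, it grows the buffer to the reported size plus one and tries again. The helper name make_sort_key is hypothetical; only strxfrm(), strlen(), malloc() and realloc() are standard. The loop terminates only because the second call is guaranteed to report the same length as the first.

```c
#include <stdlib.h>
#include <string.h>

/* Build a strxfrm() sort key for src in a malloc'ed buffer, growing the
   buffer until the key fits.  Returns NULL if allocation fails.
   The loop relies on strxfrm() reporting the same required length for
   the same string and locale on every call: if an implementation could
   answer "need more" arbitrarily, this loop would keep growing the
   buffer until memory ran out. */
static char *make_sort_key(const char *src)
{
    size_t cap = strlen(src) + 1;           /* first guess */
    char *buf = malloc(cap);

    while (buf != NULL) {
        size_t need = strxfrm(buf, src, cap);
        if (need < cap)
            return buf;                     /* key and its NUL fit */

        char *bigger = realloc(buf, need + 1);
        if (bigger == NULL)
            free(buf);                      /* give up; loop exits */
        buf = bigger;
        cap = need + 1;
    }
    return NULL;
}
```

In the default "C" locale the key is simply a copy of the string, so make_sort_key("abc") yields "abc". Under other LC_COLLATE settings the key is usually longer than the string, and with a consistent strxfrm() the buffer is grown at most once.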

If strcoll_l is using an algorithm that requires allocation, this
needs to be fixed -- there's no fundamental reason it needs to
allocate.


Ok. It is no big deal to add a non-allocating path, but the question
then is when to use it. We could stick to the current implementation
and just try to malloc() if the stack is not available, but
personally I would not want strxfrm to even try to allocate memory
beyond a certain amount. Considering that __MAX_ALLOCA_CUTOFF is
actually 64KB, so that strings up to 12.8KB could have a stack-based
index & rules cache, one could perhaps avoid malloc() altogether without
hurting most real-world use cases.
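The 12.8KB figure follows if each cached character costs 5 bytes: a 32-bit weight index plus one rule byte. That per-character cost is an assumption here (it is consistent with the 20kb / 4000 characters proposed later in the thread), and the helper names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-character cache cost: one int32_t weight index plus one
   rule byte, i.e. 5 bytes per character. */
#define CACHE_BYTES_PER_CHAR (sizeof (int32_t) + 1)

/* Value of __MAX_ALLOCA_CUTOFF quoted above (64KB). */
#define ALLOCA_CUTOFF 65536u

/* Longest string, in characters, whose cache fits in budget bytes. */
static size_t max_cached_chars(size_t budget)
{
    return budget / CACHE_BYTES_PER_CHAR;
}

/* Stack bytes needed to cache a string of len characters. */
static size_t cache_bytes_for(size_t len)
{
    return len * CACHE_BYTES_PER_CHAR;
}
```

With these assumptions max_cached_chars(ALLOCA_CUTOFF) is 13107 (65536 / 5, about 12.8K), and cache_bytes_for(4000) is 20000 bytes.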


You could also cache only the last 16k characters on the stack and, if the
function goes beyond that, recompute them / switch to the uncached version.


Thank you all for the feedback. There are two things I overlooked: strxfrm needs to process the whole src string because it has to return the needed dest length in any case, and the weight-indices cache is modified while traversing the string. So it's not possible to use a sliding-window approach or to restrict the cache size based on the dest length.
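The point about the return value can be seen with the standard API alone. ISO C allows strxfrm() to be called with a null destination and n of 0, and the result is still the full transformed length, so the implementation has to walk all of src regardless of how much space the caller provides. A minimal sketch (the wrapper name is hypothetical):

```c
#include <string.h>

/* Length of the transformed form of src, not counting the terminating
   NUL.  ISO C permits a null destination when n is 0; the result still
   covers the whole of src, so the implementation must traverse all of
   it no matter how small the destination is. */
static size_t sort_key_length(const char *src)
{
    return strxfrm(NULL, src, 0);
}
```

In the "C" locale the transformed string is a plain copy, so sort_key_length("hello") is 5, and a strxfrm() call into a 2-byte buffer also returns 5.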

I also agree that strxfrm is a function for pre-computing things that need to be fast somewhere else, so performance is not the highest priority. Still, the "faster" approach is already implemented, so why not reuse it?

My proposal now is the following:

* allocate a fixed-size cache array on the stack (e.g. 20KB supporting strings up to 4000 characters)
* fill it with values until either the end of the string is reached or the cache is full
* go with the cached version if end of string is reached
* go with the uncached version if not
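The steps above can be sketched roughly as follows. This is a toy model, not glibc code: find_idx(), weight(), NPASSES and the checksum that stands in for writing weights to dest are all hypothetical, and the real implementation does considerably more work per character. The limit parameter exists only so both paths can be exercised on short inputs; it is clamped to the fixed 4000-entry stack cache (20000 bytes at an assumed 5 bytes per character).

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_CHARS 4000 /* fixed stack cache: 4000 * (4 + 1) = 20000 bytes */
#define NPASSES 2        /* hypothetical number of collation passes */

enum xfrm_path { PATH_CACHED, PATH_UNCACHED };

/* Hypothetical stand-in for the expensive collation-table lookup that
   yields a weight index and a rule byte for one input byte. */
static int32_t find_idx(unsigned char c, unsigned char *rule)
{
    *rule = (unsigned char) (c & 1u);
    return (int32_t) c * 3;
}

/* Hypothetical weight for one pass, derived from index and rule only. */
static uint32_t weight(int32_t idx, unsigned char rule, int pass)
{
    return (uint32_t) idx + 7u * rule + (uint32_t) pass;
}

/* Toy transform following the proposal: fill a fixed-size stack cache
   while scanning src (no strlen(), no malloc()).  If the end of the
   string is reached, every pass reads the cache; if the cache fills
   up first, every pass recomputes the lookups.  Returns a checksum of
   all emitted weights and reports the chosen path through *path. */
static uint32_t toy_xfrm(const char *src, size_t limit, enum xfrm_path *path)
{
    int32_t idxarr[CACHE_CHARS];
    unsigned char rulearr[CACHE_CHARS];
    const unsigned char *s = (const unsigned char *) src;
    size_t cap = limit < CACHE_CHARS ? limit : CACHE_CHARS;
    size_t n = 0;
    uint32_t sum = 0;

    while (n < cap && s[n] != '\0') {
        idxarr[n] = find_idx(s[n], &rulearr[n]);
        ++n;
    }

    if (s[n] == '\0') {
        *path = PATH_CACHED;            /* whole string is in the cache */
        for (int pass = 0; pass < NPASSES; ++pass)
            for (size_t i = 0; i < n; ++i)
                sum = sum * 31u + weight(idxarr[i], rulearr[i], pass);
    } else {
        *path = PATH_UNCACHED;          /* cache full: recompute per pass */
        for (int pass = 0; pass < NPASSES; ++pass)
            for (size_t i = 0; s[i] != '\0'; ++i) {
                unsigned char rule;
                int32_t idx = find_idx(s[i], &rule);
                sum = sum * 31u + weight(idx, rule, pass);
            }
    }
    return sum;
}
```

Both paths emit the same weights in the same order, so toy_xfrm("hello", 10, &p) and toy_xfrm("hello", 2, &q) return the same checksum while taking the cached and uncached paths respectively. The uncached path trades repeated lookups for a constant stack footprint.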

This avoids strlen() + malloc() and is "fast" for common real-world cases like word sorting, while remaining solid for large strings.

Leonhard

