This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: strxfrm output stability


On 09/09/2015 09:54 AM, Zack Weinberg wrote:
https://lwn.net/Articles/653411/  indicates that PostgreSQL's use of
strxfrm is new in 9.5

Ah, I was looking at the latest stable PostgreSQL release (9.4.4), so it sounds like they've cleaned up the strxfrm-using code. Yet another reason for me to rethink my position that strxfrm is silly.

When I tried a decade ago to use strxfrm in GNU sort to speed it up, I ran into two problems. First, glibc strxfrm generates lengthy output, and since I wanted only the first 8 bytes or so I got relatively little information out of the prefix and overall it was a performance loss on my benchmarks. As I understand it the new PostgreSQL code works around this problem by doing a collation-preserving compression of the strxfrm output. It would be better, I expect, if GNU libc strxfrm did that compression itself rather than require every caller who cares about efficiency to do it.

The second problem was more of an irritation: I wanted the first (say) 8 bytes of strxfrm output, but there's no standard way of getting them without generating all the strxfrm output bytes and then discarding the unwanted trailing bytes, which is slow. (I hacked around this problem by asking for the first 8+DELTA bytes, where DELTA is adjusted so that it works in practice, but this is clearly an unportable-in-theory hack.) It would be nicer if glibc strxfrm would guarantee to fill up its output buffer even when truncated.

Perhaps if I stold the idea of collation-preserving compression from PostgreSQL and used that idea in GNU sort, there would be enough performance gain to make strxfrm worthwhile there. Or perhaps we could fix glibc strxfrm to perform better as described above -- though this would mean strxfrm's output wouldn't be stable from the current glibc release to the next one.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]