This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
On 03 Jul 2015 15:16, Craig Ringer wrote:
> The PostgreSQL database relies on the collation support of the
> underlying platform, which in GNU/Linux is glibc. This works very well
> for most purposes, but a problem arises when the collation rules are
> updated by the platform due to bug fixes or changes in accepted
> language rules.
>
> PostgreSQL builds persistent on-disk b-tree indexes by executing the
> system C library collation functions - strcoll or strcoll_l. Correct
> searching of these indexes requires that the C library collation
> function behaviour be pure and immutable, i.e. that any two calls over
> any time period will return the same result for any given input.
> Collation updates break that assumption, and indexes must be rebuilt
> (REINDEXed) to ensure correct queries.
>
> If PostgreSQL had a way to detect when the collation definition an
> index was built with differed from the current collation definition it
> would be very helpful, as we could then alert users to the situation,
> or even repair the index if we could tell *what* changed, not just
> that something changed.

i don't know about a portable answer, but perhaps extending nl_langinfo
would be more on the painless side of things ? adding a GNU-specific
keyword that'd return a hash of the collation data so you could easily
check. </naive>

> This isn't only an issue with collation updates on one machine. It
> also applies when a database is binary-replicated to another host with
> a different glibc version. Queries on the replica may produce
> incorrect results if the collations differ, and currently we have no
> way to detect this situation.

what about binary replications between OS's or different C libraries ?
or is that not supported ?

> The alternative to detecting and reporting issues with platform
> collation changes is dropping the use of operating system collation
> support in favour of a portable library like ICU. That's undesirable
> for a number of reasons: ICU uses UTF-16 internally while PostgreSQL
> uses UTF-8, so there'd be ugly conversion overheads, and that's just
> one of the issues. It'd also potentially cause PostgreSQL's collation
> results to differ from that of the platform it runs on. I'd rather
> avoid that, so I'm really interested in a way to find out when glibc
> collations change, or even better a portable way to do it and possibly
> even derive what changed.

how would ICU help you determine when collation data updates ? ICU too
sees updates to its collation database that you'd need to detect at
runtime.
-mike
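
Until something like the nl_langinfo hash suggested above exists, a crude application-side approximation might look like the sketch below. It is only an illustration, not anything glibc or PostgreSQL provides: the function name collation_fingerprint and the probe list are made up for this example. It hashes the strxfrm_l() sort keys of a fixed set of probe strings with FNV-1a. A changed fingerprint means the collation changed for at least one probe; an unchanged fingerprint says nothing about strings outside the probe set, so a real probe set would need far broader coverage.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe strings, chosen only for illustration. */
static const char *const collation_probes[] = {
    "a", "B", "ab", "a b", "a-b", "Z", "z", "9", "10"
};

/* FNV-1a, 64-bit: fold n bytes into the running hash h. */
static uint64_t fnv1a(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/*
 * Hash the sort keys that the named locale produces for the probes.
 * Returns 0 and stores the fingerprint in *out on success, -1 if the
 * locale cannot be loaded or memory runs out.
 */
int collation_fingerprint(const char *locale_name, uint64_t *out)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
    size_t count = sizeof collation_probes / sizeof collation_probes[0];

    for (size_t i = 0; i < count; i++) {
        /* With a size of 0, strxfrm_l only reports the key length. */
        size_t need = strxfrm_l(NULL, collation_probes[i], 0, loc);
        char *key = malloc(need + 1);
        if (key == NULL) {
            freelocale(loc);
            return -1;
        }
        strxfrm_l(key, collation_probes[i], need + 1, loc);
        /* Include the terminating NUL so adjacent keys cannot run together. */
        h = fnv1a(h, (const unsigned char *)key, need + 1);
        free(key);
    }

    freelocale(loc);
    *out = h;
    return 0;
}
```

An application could store the fingerprint next to each index when it is built and compare it again at startup or on a replica; a mismatch would be the cue to warn the user or rebuild. This detects that something changed for the probes, not what changed.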