This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: Identifying when collations change
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Craig Ringer <craig at 2ndquadrant dot com>, libc-help at sourceware dot org
- Date: Thu, 9 Jul 2015 10:15:27 +0200
- Subject: Re: Identifying when collations change
- Authentication-results: sourceware.org; auth=none
- References: <CAMsr+YFBNJrHRFd3BxUr7We1+iz2FP7hpGZWbuL8rucGYY29wg at mail dot gmail dot com> <20150708061040 dot GQ17734 at vapier>
On Wed, Jul 08, 2015 at 02:10:40AM -0400, Mike Frysinger wrote:
> On 03 Jul 2015 15:16, Craig Ringer wrote:
> > The PostgreSQL database relies on the collation support of the
> > underlying platform, which in GNU/Linux is glibc. This works very well
> > for most purposes, but a problem arises when the collation rules are
> > updated by the platform due to bug fixes or changes in accepted
> > language rules.
> >
> > PostgreSQL builds persistent on-disk b-tree indexes by executing the
> > system C library collation functions - strcoll or strcoll_l. Correct
> > searching of these indexes requires that the C library collation
> > function behaviour be pure and immutable, i.e. that any two calls over
> > any time period will return the same result for any given input.
> > Collation updates break that assumption, and indexes must be rebuilt
> > (REINDEXed) to ensure correct queries.
> >
> > If PostgreSQL had a way to detect when the collation definition an
> > index was built with differed from the current collation definition it
> > would be very helpful, as we could then alert users to the situation,
> > or even repair the index if we could tell *what* changed, not just
> > that something changed.
>
> i don't know about a portable answer, but perhaps extending nl_langinfo would
> be more on the painless side of things ? adding a GNU-specific keyword that'd
> return a hash of the collation data so you could easily check. </naive>
>
A simple solution would be to check the libc.so timestamp and reindex
whenever it changes. If the user updates regularly, that means
reindexing roughly once per year; would that matter?
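
To make the idea concrete, here is a rough sketch of such a check, not a
tested implementation. The function names file_mtime and library_changed
are made up for illustration, and the path to libc.so is assumed to be
supplied by the caller (it differs between distributions). The caller is
also assumed to have stored the modification time it saw when the index
was built.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: store the modification time of `path` in *out.
 * Returns 0 on success, -1 if the file cannot be stat()ed. */
int file_mtime(const char *path, time_t *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_mtime;
    return 0;
}

/* Hypothetical helper: compare the current modification time of `path`
 * with the one recorded when the index was built.
 * Returns 1 if it differs (reindex needed), 0 if unchanged,
 * -1 if the file cannot be examined. */
int library_changed(const char *path, time_t recorded)
{
    time_t current;

    if (file_mtime(path, &current) != 0)
        return -1;
    return current != recorded;
}
```

At index build time you would call file_mtime() on the library and save
the result next to the index; at startup, library_changed() with that
saved value tells you whether to reindex. A changed timestamp only says
the file was replaced, not that the collation rules differ, so this
errs on the side of reindexing too often.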