This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add STATIC_GETENV macro.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 14 Nov 2013 13:52:53 +0100
- Subject: Re: [PATCH] Add STATIC_GETENV macro.
- Authentication-results: sourceware.org; auth=none
- References: <20131109103822 dot GA9173 at domone> <1384356858 dot 3682 dot 13273 dot camel at triegel dot csb> <20131113213313 dot GB2747 at domone dot podge> <1384424038 dot 3682 dot 13505 dot camel at triegel dot csb>
On Thu, Nov 14, 2013 at 11:13:58AM +0100, Torvald Riegel wrote:
> On Wed, 2013-11-13 at 22:33 +0100, OndÅej BÃlka wrote:
> > On Wed, Nov 13, 2013 at 04:34:18PM +0100, Torvald Riegel wrote:
> > > I don't think a macro is wise here. You seem to want to make the macro
> > > safe for use in multi-threaded programs, so this is similar to a
> > > pthread_once except that there can be more than one call to get_env (ie,
> > > the equivalent to the initialization function), but exactly on
> > > assignment.
> > > Maybe a pthread_once_store(pointer, producer) would be better (*if* we
> > > actually want to add such a feature): pointer is like p in your code,
> > > producer is a function that is called when we actually need to
> > > initialize but can be safely called more than once; we exactly-once set
> > > pointer to the value returned by one of the producer calls.
> > >
> > Yes but this does not need forcing users to link with pthread.
> >
> >
> > > Implementing this properly requires more than just a simple CAS (see
> > > pthread_once and my comments below); thus, you'll need to use proper
> > > atomics or whatever the compiler has available, which requires you to
> > > take care of all of that in the context of the client's compilation
> > > environment.
> > >
> > > On Sat, 2013-11-09 at 11:38 +0100, OndÅej BÃlka wrote:
> > > > +#define STATIC_GETENV(c) \
> > > > +({ \
> > > > + static char *__p = (char *) &__p; \
> > > > + char *__new = __p; \
> > > > + if (__p == (char *) &__p) \
> > >
> > There should be
> > if (__new == (char *) &__p)
>
> Even with that change, the load of p needs to be atomic with
> acquire/consume memory order.
>
We should support older gcc that do not support C11 atomics.
Also when you are concerned about machines that do issue nonatomic reads
then relaxed model is appropriate, acquire/consume could yield unacceptable
performance overhead.
> > > This is a data race (at least in C11, which has a concept of threads).
> > > You need to use an atomic access, and it must have acquire memory order.
> > > Consume memory order could also be sufficient, I believe.
>
> Like it or not, this is a data race according to C11 and C++11. You
> offer this using a macro, so it becomes part of the client's code. Data
> race detectors will find this race condition.
>
This is valid comment, do race detect report also race for relaxed
loads?
When you use a race detector it would also uncover a race in fnmatch and
other libc races.
> > You do not need atomic access for correctness, worst that can happen is
> > performance degradation by repeately recalculating value.
>
> No. Please understand the C11/C++11 memory model. The compiler could
> choose to read p nonatomically, or read it a couple of times and assume
> that the value won't change, speculating that it is always like in the
> second of two accesses.
>
A default pthread_once implementation is following
int
__pthread_once (once_control, init_routine)
pthread_once_t *once_control;
void (*init_routine) (void);
{
/* XXX Depending on whether the LOCK_IN_ONCE_T is defined use a
global lock variable or one which is part of the pthread_once_t
object. */
if (*once_control == PTHREAD_ONCE_INIT)
{
lll_lock (once_lock, LLL_PRIVATE);
/* XXX This implementation is not complete. It doesn't take
cancelation and fork into account. */
if (*once_control == PTHREAD_ONCE_INIT)
{
init_routine ();
*once_control = !PTHREAD_ONCE_INIT;
}
lll_unlock (once_lock, LLL_PRIVATE);
}
return 0;
A type of pthread_once_t is int so compiler could read once_control
nonatomicaly. According to your argument this code is broken so file a
bug report and send a patch to fix this serious issue.
> Furthermore, read about weak memory models. We need acquire/consume
> memory order to ensure that accesses to *p happen after p* has been
> initialized (and likewise for release on the assignment to p).
>
I know them and
> If you can show that in this particular case, initialization of *p will
> happen before every thread can use STATIC_GETENV, then you can do
> without acquire/release -- but then you need to add a comment why this
> is the case and still use relaxed memory order atomic accesses (also on
> the CAS to avoid the barrier overhead, the __sync CAS has mostly seq-cst
> memory order).
>
Using a static initializer is a common pattern used in nptl and we do not
need comment for obvious cases.
> > Unless compiler decides to inline functions so that it accesses variable
> > in loop which when called first time would cause repeated
> > recomputations. A solution would be mark variable volatile.
>
> Volatile is not a sufficient replacement for atomic accesses because it
> tells the compiler that for such variables, accesses should happen
> exactly as if the abstract machine would run the program -- but this
> doesn't necessarily mean they should be atomic wrt. accesses in other
> threads. And you also cannot establish inter-thread happens-before
> relations with just volatiles (for which you need the memory-orders
> and/or the total orders the memory model defines for seq-cst actions).
>
This relationship is not necessary.
> > >
> > > > + { \
> > > > + __new = getenv (c); \
> > > > + char *__val = __sync_val_compare_and_swap (&__p, (char *) &__p, __new); \
> > >
> > > This needs acquire-release memory order. Seq-cst would be too much,
> > > anything less than acquire-release won't work on archs with weak memory
> > > models.
> > >
> > No need to complicate matters with custom order as this path will likely be
> > visited at most once per program lifetime.
>
> First, whether something is "unlikely" to go wrong isn't a good
> criterion for glibc code.
This is not about going wrong but about separating a hot code and cold.
Hot code should be optimized for speed but cold code for readability.
>From correctness perspecive this always returns correct value so only
performance is a concern. You evaluate this by risk analysis, in a worst
case we always recompute a getenv from scratch as unoptimized code
always computed getenv there is only additional overhead of atomic
instructions,
To estimate how many times we recompute there is risk that a thread will
be interupted when computing a function by other thread that computes a
function which we can bound by number of threads.
For synchronizing loads a number of different values can be bounded by
number of cores.
Compared to pthread_once it faces same issues but does locking instead.
If a function in question is reentrant and faster that context switch
overhead then using this primitive instead pthread_once is faster.
Also using pthread_once makes function in question not async-signal-safe
while we preserve async-signal safety.
> Second, synchronization/concurrency can indeed be complex and possibly a
> complicated matter depending on your level of experience with it. But
> that's just more reasons to carefully pay attention to the details, and
> document synchronizing code properly.
>
You need attention to details on what is performance impact of
concurency changes, so far your suggestion would only decrease
performance.
> > > You really should also document synchronization code thoroughly. That
> > > we need a CAS for exactly-once might be obvious to many; but I guess
> > > it's not so obvious which memory orders we need (otherwise, I guess you
> > > wouldn't have missed them :). Therefore, do document which accesses are
> > > supposed to synchronize with each other, and which acquire/release pairs
> > > you need to establish the happens-before relations required for
> > > correctness.
> > >
> > > > + __new = (__val == (char *) &__p) ? __new : __val; \
> > > > + } \
> > > > + __new; \
> > > > +})
> > > > +
> > >
> >
>
>