This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv3] PowerPC: Fix a race condition when eliding a lock


On Mon, 2015-08-24 at 15:11 -0300, Tulio Magno Quites Machado Filho
wrote:
> Changes since v2:
>  - Moved part of the source code comments to the commit message.
>  - Added O'Donell's suggestion to the source code comments.
> 
> Changes since v1:
>  - Improved commit message.
>  - Added new comments in the source code alerting to the concurrency details
>    intrinsic to this code.
>  - Removed compiler barriers from this patch, which will be treated in another
>    patch and will be synchronized with the GCC implementation.
> 
> 8<----------
> 
> The previous code used to evaluate the preprocessor token is_lock_free
> into a variable before starting a transaction.  This behavior can cause an
> error if another thread acquired the lock (without using a transaction)
> between the evaluation of the token and the beginning of the transaction.

I find the use of the preprocessor to capture an evaluation really ugly,
and it is error-prone, as the bug fixed by this patch shows.  I would
appreciate a follow-up patch that replaces is_lock_free with either a
function that is called (relying on the compiler to inline it if the
function pointer is constant, which it should be), or a pointer to a
value that is compared against some caller-provided constant (which would
limit the flexibility of the condition somewhat, but is cleaner).
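
For illustration only, here is a rough sketch of the first alternative
(the names __elide_lock_fn and lock_is_free are hypothetical and not part
of this patch; it reuses __get_new_count from the patch):

  /* Hypothetical sketch: the caller passes a predicate function instead
     of a preprocessor token; the lock state is still read only inside
     the transaction.  */
  static inline bool
  lock_is_free (void *lock)
  {
    return *(int *) lock == 0;
  }

  static inline bool
  __elide_lock_fn (uint8_t *adapt_count, bool (*is_free) (void *),
                   void *lock)
  {
    if (*adapt_count > 0)
      {
        (*adapt_count)--;
        return false;
      }
    for (int i = __elision_aconf.try_tbegin; i > 0; i--)
      if (__builtin_tbegin (0))
        {
          /* The compiler can inline IS_FREE when the function pointer
             is a compile-time constant at the call site.  */
          if (is_free (lock))
            return true;
          __builtin_tabort (_ABORT_LOCK_BUSY);
        }
      else if (!__get_new_count (adapt_count))
        break;
    return false;
  }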

> This bug can be triggered with the following order of events:
> 1. The lock accessed by is_lock_free is free.
> 2. Thread T1 evaluates is_lock_free and stores into register R1 that the
>    lock is free.
> 3. Thread T2 acquires the same lock used in is_lock_free.
> 4. T1 begins the transaction, creating a memory barrier where is_lock_free
>    is false, but R1 is true.
> 5. T1 reads R1 and doesn't abort the transaction.
> 6. T1 calls ELIDE_UNLOCK, which reads false from is_lock_free and decides
>    to unlock a lock acquired by T2, leading to undefined behavior.
> 
> This patch delays the evaluation of is_lock_free to inside a transaction
> by moving this part of the code to the macro ELIDE_LOCK.
> 
> 2015-08-24  Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>
> 
> 	[BZ #18743]
> 	* sysdeps/powerpc/nptl/elide.h (__elide_lock): Move most of this
> 	code to...
> 	(ELIDE_LOCK): ...here.
> 	(__get_new_count): New function with part of the code from
> 	__elide_lock that updates the value of adapt_count after a
> 	transaction abort.
> 	(__elide_trylock): Moved this code to...
> 	(ELIDE_TRYLOCK): ...here.
> ---
>  NEWS                         |   3 +-
>  sysdeps/powerpc/nptl/elide.h | 113 +++++++++++++++++++++++--------------------
>  2 files changed, 63 insertions(+), 53 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 6b36c08..76df3fd 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -11,7 +11,8 @@ Version 2.23
>  
>    14341, 16517, 16519, 16520, 16734, 16973, 17787, 17905, 18084, 18086,
>    18265, 18370, 18421, 18480, 18525, 18618, 18647, 18661, 18674, 18681,
> -  18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820, 18823, 18824.
> +  18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820, 18823, 18824,
> +  18743.
>  
>  * The obsolete header <regexp.h> has been removed.  Programs that require
>    this header must be updated to use <regex.h> instead.
> diff --git a/sysdeps/powerpc/nptl/elide.h b/sysdeps/powerpc/nptl/elide.h
> index 389f5a5..1cc9890 100644
> --- a/sysdeps/powerpc/nptl/elide.h
> +++ b/sysdeps/powerpc/nptl/elide.h
> @@ -23,67 +23,76 @@
>  # include <htm.h>
>  # include <elision-conf.h>
>  
> -/* Returns true if the lock defined by is_lock_free as elided.
> -   ADAPT_COUNT is a pointer to per-lock state variable. */
> -
> +/* Get the new value of adapt_count according to the elision
> +   configurations.  Returns true if the system should retry again or false
> +   otherwise.  */
>  static inline bool
> -__elide_lock (uint8_t *adapt_count, int is_lock_free)
> +__get_new_count (uint8_t *adapt_count)
>  {
> -  if (*adapt_count > 0)
> +  /* A persistent failure indicates that a retry will probably
> +     result in another failure.  Use normal locking now and
> +     for the next couple of calls.  */
> +  if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
>      {
> -      (*adapt_count)--;
> +      if (__elision_aconf.skip_lock_internal_abort > 0)
> +	*adapt_count = __elision_aconf.skip_lock_internal_abort;
>        return false;
>      }
> -
> -  for (int i = __elision_aconf.try_tbegin; i > 0; i--)
> -    {
> -      if (__builtin_tbegin (0))
> -	{
> -	  if (is_lock_free)
> -	    return true;
> -	  /* Lock was busy.  */
> -	  __builtin_tabort (_ABORT_LOCK_BUSY);
> -	}
> -      else
> -	{
> -	  /* A persistent failure indicates that a retry will probably
> -	     result in another failure.  Use normal locking now and
> -	     for the next couple of calls.  */
> -	  if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
> -	    {
> -	      if (__elision_aconf.skip_lock_internal_abort > 0)
> -		*adapt_count = __elision_aconf.skip_lock_internal_abort;
> -	      break;
> -	    }
> -	  /* Same logic as above, but for a number of temporary failures in a
> -	     a row.  */
> -	  else if (__elision_aconf.skip_lock_out_of_tbegin_retries > 0
> -		   && __elision_aconf.try_tbegin > 0)
> -	    *adapt_count = __elision_aconf.skip_lock_out_of_tbegin_retries;
> -	}
> -     }
> -
> -  return false;
> +  /* Same logic as above, but for a number of temporary failures in a
> +     a row.  */
> +  else if (__elision_aconf.skip_lock_out_of_tbegin_retries > 0
> +	   && __elision_aconf.try_tbegin > 0)
> +    *adapt_count = __elision_aconf.skip_lock_out_of_tbegin_retries;
> +  return true;
>  }
>  
> -# define ELIDE_LOCK(adapt_count, is_lock_free) \
> -  __elide_lock (&(adapt_count), is_lock_free)
> -
> -
> -static inline bool
> -__elide_trylock (uint8_t *adapt_count, int is_lock_free, int write)
> -{
> -  if (__elision_aconf.try_tbegin > 0)
> -    {
> -      if (write)
> -	__builtin_tabort (_ABORT_NESTED_TRYLOCK);
> -      return __elide_lock (adapt_count, is_lock_free);
> -    }
> -  return false;
> -}
> +/* CONCURRENCY NOTES:
> +
> +   The value of is_lock_free is read and possibly written to by multiple

As you write above, is_lock_free is not a value, it is an expression.
The expression reads from a memory location that is then modified by
other threads.  Please clarify that in the comment so this is clear to
readers (this misunderstanding seems to be related to the bug you're
fixing...).
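
To make this concrete: at a call site such as pthread_rwlock_rdlock, the
is_lock_free argument is an expression over the lock's fields, roughly
like the following (illustrative only; the exact expression and field
names in nptl may differ):

  /* Illustrative ELIDE_LOCK call site; the expression passed as
     is_lock_free reads the rwlock state from memory.  */
  if (ELIDE_LOCK (rwlock->__data.__rwelision,
                  rwlock->__data.__lock == 0
                  && rwlock->__data.__writer == 0
                  && rwlock->__data.__nr_readers == 0))
    return 0;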

> +   threads, either during lock acquisition, or elision.  In order to avoid
> +   this evaluation from becoming a data race the access of is_lock_free

It could be a data race because you're not using atomics there, but
that's not the whole point.  (We use the term "data race" specifically
to refer to the C11/C++11 memory model concept of the same name.)
You want to make the lock elision synchronization scheme work correctly,
and that is why you are moving the access inside the txn.

> +   is placed *inside* the transaction.  Within the transaction is_lock_free
> +   can be accessed with relaxed ordering.  We are assured by the transaction

Please use "memory order" to be specific, or abbreviate with MO as
mentioned on the Concurrency wiki page.

> +   that the value of is_lock_free is consistent with the state of the lock,
> +   otherwise the hardware will abort the transaction.  */

That's not really why this lock elision synchronization scheme
works :)  (At least you're not giving a sufficient reason why it does.)

Txn/txn synchronization is relatively simple: basically, txns that access
the same memory locations are atomic and ordered with respect to each
other.
Txnal/nontxnal synchronization is more complex.  I don't remember how
IBM has specified txn semantics in detail, but the way it's basically
supposed to work is:
* If the txn reads-from data written in a critical section, then it will
also observe the lock acquisition of this critical section or a later
write to the same lock memory locations by this critical section (e.g.,
a lock release).
* If the txn reads-from a lock release, it will observe all prior writes
in the respective critical section.
* If a txn reads-from a memory location, it picks a write to read-from.
It won't change this choice if it reads-from the same location again.
If it can't read-from the same write, it aborts.
(Here, "observe" means that if the txn reads-from those locations, then
the respective writes are visible side effects.)
The result of this (rough) set of rules is that, in all executions in
which it doesn't abort, a txn will not read intermediate state produced
by other critical sections, but is ordered wrt them and their writes to
application data.

It's hard to describe that in our current terminology because txns
aren't covered in the C11 memory model.

Also note that this implementation would not be correct:
  if (tbegin(0))
    {
       <application code>
       if (!is_lock_free) tabort();
       tend();
    }
This is because the application code could call tend on some path, for
example when reading an intermediate state of a concurrent non-elided
critical section.
I'm mentioning this because it shows that is_lock_free must be
"evaluated" early, and that the compiler must not reorder it to the end
of the critical section.  We could use an acquire-MO atomic load in
is_lock_free for that, but then we get an unnecessary HW acquire fence.
The reason why it should work without that is that the compiler
shouldn't execute side effects that the abstract machine wouldn't do if
is_lock_free evaluated to false, and application code that may call
tend() is such a side effect.  Thus, you can't in practice avoid the
is_lock_free check, and once you do it, the rules above take care of
correctness.
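
For contrast with the incorrect shape above, the shape this patch uses
evaluates the expression first thing inside the transaction; roughly (a
sketch of the control flow, not the exact macro text):

  if (__builtin_tbegin (0))
    {
      /* Evaluate the lock state before any application code runs inside
         the transaction.  */
      if (is_lock_free)
        ret = 1;   /* Elide; the txn stays open until ELIDE_UNLOCK.  */
      else
        __builtin_tabort (_ABORT_LOCK_BUSY);
    }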

Intel TSX and IBM's HTMs (IIRC) don't let any side effects, such as page
faults, escape from an aborted txn.  In contrast, AMD's ASF proposal did
let page faults escape, in which case the acquire MO would be required.

I hope this clarifies why I'm saying that mixed txnal/nontxnal
synchronization isn't all that simple.

> +/* Returns 0 if the lock defined by is_lock_free was elided.
> +   ADAPT_COUNT is a per-lock state variable.  */
> +# define ELIDE_LOCK(adapt_count, is_lock_free)				\
> +  ({									\
> +    int ret = 0;							\
> +    if (adapt_count > 0)						\
> +      (adapt_count)--;							\
> +    else								\
> +      for (int i = __elision_aconf.try_tbegin; i > 0; i--)		\
> +	{								\
> +	  if (__builtin_tbegin (0))					\
> +	    {								\
> +	      if (is_lock_free)						\

This evaluation should use relaxed MO atomic accesses, IMO.  But that
can be changed by a follow-up patch.
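
A sketch of what that could look like at the call site, assuming glibc's
atomic_load_relaxed macro and an illustrative rwlock-style lock word (not
part of this patch):

  /* Relaxed-MO load of the lock word; the transaction provides the
     required ordering, so no acquire fence is needed.  */
  ELIDE_LOCK (rwlock->__data.__rwelision,
              atomic_load_relaxed (&rwlock->__data.__lock) == 0)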

> +		{							\
> +		  ret = 1;						\
> +		  break;						\
> +		}							\
> +	      __builtin_tabort (_ABORT_LOCK_BUSY);			\
> +	    }								\
> +	  else								\
> +	    if (!__get_new_count(&adapt_count))				\
> +	      break;							\
> +	}								\
> +    ret;								\
> +  })
>  
>  # define ELIDE_TRYLOCK(adapt_count, is_lock_free, write)	\
> -  __elide_trylock (&(adapt_count), is_lock_free, write)
> +  ({								\
> +    int ret = 0;						\
> +    if (__elision_aconf.try_tbegin > 0)				\
> +      {								\

I'd prefer a comment here explaining why you abort if write is true, or
a reference to another source file / function where this is explained.
This can be in a follow-up patch.

> +	if (write)						\
> +	  __builtin_tabort (_ABORT_NESTED_TRYLOCK);		\
> +	ret = ELIDE_LOCK (adapt_count, is_lock_free);		\
> +      }								\
> +    ret;							\
> +  })
>  
> 
>  static inline bool



