This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCHv3] PowerPC: Fix a race condition when eliding a lock



On 01-09-2015 16:38, Peter Bergner wrote:
> On Mon, 2015-08-24 at 15:11 -0300, Tulio Magno Quites Machado Filho wrote:
>> +/* Returns 0 if the lock defined by is_lock_free was elided.
>> +   ADAPT_COUNT is a per-lock state variable.  */
>> +# define ELIDE_LOCK(adapt_count, is_lock_free)				\
>> +  ({									\
>> +    int ret = 0;							\
>> +    if (adapt_count > 0)						\
>> +      (adapt_count)--;							\
>> +    else								\
>> +      for (int i = __elision_aconf.try_tbegin; i > 0; i--)		\
>> +	{								\
>> +	  if (__builtin_tbegin (0))					\
>> +	    {								\
>> +	      if (is_lock_free)						\
>> +		{							\
>> +		  ret = 1;						\
>> +		  break;						\
>> +		}							\
>> +	      __builtin_tabort (_ABORT_LOCK_BUSY);			\
>> +	    }								\
>> +	  else								\
>> +	    if (!__get_new_count(&adapt_count))				\
>> +	      break;							\
>> +	}								\
>> +    ret;								\
>> +  })
> 
> Interesting.  I was going to suggest following your __builtin_tabort()
> call here with a call to __builtin_unreachable() so we can give the
> compiler a little more freedom to optimize things on that path.
> Then I remembered we had some conditionl tabort insns we might
> be able to use here, since the tabort above is conditional on
> is_lock_free == 0.
> 
> The following code seems to generate much better code, but at the
> loss of not passing _ABORT_LOCK_BUSY as a failure cause.
> 
> /* Returns 0 if the lock defined by is_lock_free was elided.
>    ADAPT_COUNT is a per-lock state variable.  */
> # define ELIDE_LOCK(adapt_count, is_lock_free)                         \
>   ({                                                                   \
>     int ret = 0;                                                       \
>     if (adapt_count > 0)                                               \
>       (adapt_count)--;                                                 \
>     else                                                               \
>       for (int i = __elision_aconf.try_tbegin; i > 0; i--)             \
>        {                                                               \
>          if (__builtin_tbegin (0))                                     \
>            {                                                           \
>              __builtin_tabortwci (0x4, is_lock_free, 0)                \
>              ret = 1;                                                  \
>              break;                                                    \
>            }                                                           \
>          else                                                          \
>            if (!__get_new_count(&adapt_count))                         \
>              break;                                                    \
>        }                                                               \
>     ret;                                                               \
>   })
> 
> 
> However, if I look at the code that consumes the failure code, I see
> that we're only looking for the persistent bit being set and not any
> specific failure cause we passed in:

The idea of passing a error code in transaction abort is to differentiate
it from other possible abort types, for instance newer kernels that
might abort in syscalls or a user program that defines it is own
transaction abort code.

> 
> static inline bool
> __get_new_count (uint8_t *adapt_count)
> {
>   /* A persistent failure indicates that a retry will probably
>      result in another failure.  Use normal locking now and
>      for the next couple of calls.  */
>   if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
>     {
>       if (__elision_aconf.skip_lock_internal_abort > 0)
>        *adapt_count = __elision_aconf.skip_lock_internal_abort;
>        return false;
>     }
> 
>   /* Same logic as above, but for a number of temporary failures in a
>      a row.  */
>   else if (__elision_aconf.skip_lock_out_of_tbegin_retries > 0
>           && __elision_aconf.try_tbegin > 0)
>     *adapt_count = __elision_aconf.skip_lock_out_of_tbegin_retries;
>   return true;
> }
> 
> ...although looking closer, I see have the definitions we use for
> passing in failure codes is:
> 
> /* Definitions used for TEXASR Failure code (bits 0:6), they need to be even
>    because tabort. always sets the first bit.  */
> #define _ABORT_LOCK_BUSY       0x3f   /* Lock already used.  */
> #define _ABORT_NESTED_TRYLOCK  0x3e   /* Write operation in trylock.  */
> #define _ABORT_SYSCALL         0x3d   /* Syscall issued.  */
> 
> I'm not sure I understand the "even" part above, since two of these
> values are clearly not even.  In fact, the two odd values above if
> passed to __builtin_tabort() will lead to the persistent bit being
> set, since tabort. sets the upper part of the TEXASR by taking the
> least significant 8 bits of the value you passed in and concatenating
> 0x000001 to the end of it, so essentially (using _ABORT_LOCK_BUSY as
> a failure cause example):
> 
>    TEXASR(0:31) = ((_ABORT_LOCK_BUSY & 0xff) << 24) | 0x1;
> 
> Given the persistent bit is bit 7, that means any odd value we pass
> into __builtin_tabort(XXX) will set the TEXASR persistent bit.
> Did we mean to do that?  Maybe we did, I'm just asking. :-)
> I can see that if we're about to do a syscall, that should be persistent.
> Are we also treating someone having a lock as persistent too?

Indeed the 'odd' comment does not make sense and we should just remove it
(I misread texasr definition).  My initial idea was define some codes
that set the persistent failures and some that do not.  I think I best
approach would be:

/* tabort will set TEXASR(0:31) = ((_ABORT_LOCK_BUSY & 0xff) << 24) | 0x1
   and the TEXASR persistent bit is bit 25 (32-7).  Only the syscall
   code means a persistent error that should trigger a default lock
   acquisition.  */
#define _ABORT_SYSCALL         0x1   /* Syscall issued.  */
#define _ABORT_LOCK_BUSY       0x2   /* Lock already used.  */
#define _ABORT_NESTED_TRYLOCK  0x4   /* Write operation in trylock.  */

> 
> If we did, then using __builtin_tabortwci() won't work, since we
> wouldn't be setting the persistent bit like we are now... unless
> we can pass in is_lock_free to __get_new_count and test that
> along with the persistent bit still.

I think we can use '__builtin_tabortwci' since the transactional aborts
here are not meant to be persistent (the syscall one are done in syscall
wrappers).

What do you think?

> 
> Peter
> 
> 
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]