This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Lock Elision / Transaction ABI discussion

From: Dominik Vogt <vogt at linux dot vnet dot ibm dot com>
To: libc-alpha at sourceware dot org
Cc: triegel at redhat dot com, andi at firstfloor dot org
Date: Mon, 16 Sep 2013 15:35:54 +0200
Subject: Lock Elision / Transaction ABI discussion
Authentication-results: sourceware.org; auth=none
Reply-to: libc-alpha at sourceware dot org
The ABI discussion regarding lock elision and abort codes is
spread across a number of separate threads.  I've tried to compile
these messages into this one to simplify further discussion.

On Wed, Sep 11, 2013 at 06:11:43PM +0200, Torvald Riegel wrote:
> On Mon, 2013-09-02 at 10:12 +0200, Dominik Vogt wrote:
> > Using explicit abort code definitions is problematic because they are thrown at
> > the creator of the outermost transaction who may misinterpret the user
> > specified abort code because he does not know where it comes from.
> >
> > The lock elision implementation should refrain from using explicit abort code
> > definitions and rather only interpret the flags of the abort code (as should
> > other pieces of software that have abort handlers).
>
> I think that's still under discussion.  Intel seems to be fine with it,
> and as I wrote in my other email in detail, I'd like to first see a more
> detailed discussion on this, starting with you showing why it needs to
> be this way.  We may very well end up with different policies for x86
> and z, but before we split this we should know exactly why we need to.
> And perhaps your case will show up issues for x86 too.


On Wed, Sep 11, 2013 at 06:14:00PM +0200, Torvald Riegel wrote:
> On Tue, 2013-09-03 at 10:09 +0200, Dominik Vogt wrote:
> > Yes, but that is not the whole story.  The abort codes are also
> > used for program flow control (at least the ..._BUSY code).  In my
> > eyes this is a no-go because there's no guarantee that that code
> > was really generated in glibc.  There could be a nested
> > transaction in a third party library that aborts one of its own
> > transactions with that code, and glibc, if it opened the outermost
> > transaction, would misinterpret this third party abort code as one
> > that was generated by itself and change current and future program
> > flow because of that.
>
> Which would be just a performance problem for glibc, though, as I argued
> in a reply to your other email.
>
> > I understand that profiling is an important issue.  In my eyes it
> > is all right to _set_ your own abort codes for debugging purposes,
> > but not to determine control flow depending on abort codes.
>
> To me, the critical distinction is whether misinterpretation of an abort
> code can affect correctness, or can significantly mess up performance
> (i.e., more than what we currently can have anyway due to the very simple
> adaptation mechanism).
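
To make that distinction concrete, here is a minimal sketch of an
abort handler that treats the user abort code purely as a
performance hint.  It is not the glibc implementation: the DEMO_*
names, struct demo_mutex and the skip_* parameters are invented for
illustration, and only the bit layout (explicit flag in bit 0, retry
flag in bit 1, code in bits 31:24) follows the Intel RTM abort
status.  Whatever the code says, the fallback is to take the real
lock, so a misinterpreted code costs some elision attempts but does
not change semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the _XABORT_* macros in <immintrin.h>;
   redefined with a DEMO_ prefix so this builds without -mrtm.  */
#define DEMO_XABORT_EXPLICIT (1u << 0)
#define DEMO_XABORT_RETRY    (1u << 1)
#define DEMO_XABORT_CODE(s)  (((s) >> 24) & 0xffu)
#define DEMO_ABORT_LOCK_BUSY 0xffu

/* Hypothetical per-mutex state: number of upcoming acquisitions
   that should skip elision.  */
struct demo_mutex
{
  int adapt_count;
};

enum demo_action
{
  DEMO_RETRY_ELISION,
  DEMO_TAKE_LOCK
};

/* Decide what to do after an abort.  The caller is expected to bound
   the number of retries.  */
enum demo_action
demo_on_abort (struct demo_mutex *m, uint32_t status,
               int skip_busy, int skip_other)
{
  assert (skip_busy >= 0 && skip_other >= 0);

  if ((status & DEMO_XABORT_EXPLICIT)
      && DEMO_XABORT_CODE (status) == DEMO_ABORT_LOCK_BUSY)
    {
      /* Hint only: the code may have been set by a nested third
         party transaction.  Worst case, elision is skipped a few
         times for no good reason.  */
      m->adapt_count = skip_busy;
      return DEMO_TAKE_LOCK;
    }

  /* Hardware flag, documented by the cpu: a retry might succeed.  */
  if (status & DEMO_XABORT_RETRY)
    return DEMO_RETRY_ELISION;

  /* Anything else: back off from elision for a while.  */
  m->adapt_count = skip_other;
  return DEMO_TAKE_LOCK;
}
```

Both non-retry paths end in DEMO_TAKE_LOCK, which is why a wrong
guess about the origin of the code stays a performance question in
this sketch.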


On Wed, Sep 11, 2013 at 05:55:23PM +0200, Torvald Riegel wrote:
> I'd first like to hear why you think that the misinterpretation problem
> is such a big problem, and that there cannot be an ABI for the codes.
> Second, if there cannot be an ABI, why you think that the
> misinterpretation would be so costly in terms of messing up the
> adaptation that it must definitely be avoided.


On Tue, Sep 03, 2013 at 04:00:36PM -0700, Andi Kleen wrote:
> Dominik Vogt <vogt@linux.vnet.ibm.com> writes:
> > I completely understand that the removed abort is a very useful
> > tool for testing and analysis for TSX, and if you have an idea how
> > to fix the abort code misinterpretation problem without "breaking"
> > profiling I'd like to hear it (because this discussion is also
> > relevant for the z port).
>
> We reserved some fixed values for abort codes in the optimization guide
> (you can see that as a kind of ABI)
>
> 0xff lock busy.
> 0xfe lock_is_locked (not used in pthread)
> 0xfd nested trylock
> 0xf0...0xfc reserved
>
> This also again has the advantage that it can be profiled.
>
> I've recently been considering reserving another code for "abort one
> time, do not adapt"; there are some use cases for this in glibc
> (e.g. the dynamic linker which always aborts once)
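
For reference, the reserved values listed above can be decoded from
an RTM abort status roughly as follows.  This is only a sketch: the
DEMO_* identifiers are made up, and the bit positions (explicit flag
in bit 0, code in bits 31:24) are copied from the _XABORT_EXPLICIT
and _XABORT_CODE definitions in <immintrin.h> so that the example
builds without -mrtm.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the _XABORT_* macros in <immintrin.h>.  */
#define DEMO_XABORT_EXPLICIT (1u << 0)
#define DEMO_XABORT_CODE(s)  (((s) >> 24) & 0xffu)

/* Meaning of the 8-bit code, following the list quoted above.
   The enumerator names are made up for this sketch.  */
enum demo_abort_kind
{
  DEMO_KIND_NOT_EXPLICIT,   /* hardware abort, code bits unused */
  DEMO_KIND_LOCK_BUSY,      /* 0xff */
  DEMO_KIND_LOCK_IS_LOCKED, /* 0xfe */
  DEMO_KIND_NESTED_TRYLOCK, /* 0xfd */
  DEMO_KIND_RESERVED,       /* 0xf0...0xfc */
  DEMO_KIND_UNASSIGNED      /* anything below 0xf0 */
};

/* Build the status word an XABORT with the given immediate would
   report (only the bits this sketch looks at).  */
uint32_t
demo_explicit_status (unsigned int code)
{
  assert (code <= 0xffu);   /* XABORT takes an 8-bit immediate */
  return DEMO_XABORT_EXPLICIT | ((uint32_t) code << 24);
}

enum demo_abort_kind
demo_classify (uint32_t status)
{
  unsigned int code;

  /* The code bits are only meaningful for explicit aborts.  */
  if (!(status & DEMO_XABORT_EXPLICIT))
    return DEMO_KIND_NOT_EXPLICIT;

  code = DEMO_XABORT_CODE (status);
  switch (code)
    {
    case 0xff: return DEMO_KIND_LOCK_BUSY;
    case 0xfe: return DEMO_KIND_LOCK_IS_LOCKED;
    case 0xfd: return DEMO_KIND_NESTED_TRYLOCK;
    default:
      return code >= 0xf0 ? DEMO_KIND_RESERVED : DEMO_KIND_UNASSIGNED;
    }
}
```

Note that the classification says nothing about who issued the
abort; it only maps the value to the convention.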


On Wed, Sep 11, 2013 at 04:04:17PM -0700, Andi Kleen wrote:
> Torvald Riegel <triegel@redhat.com> writes:
> > Is this an official "Intel-blessed" document?
>
> It's Intel authored.
>
> > Can you post the link?
>
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
> 12.4.5
>
> > Who does this apply to?
>
> Everyone who wants to follow it. At least all my code and pretty much
> all of the examples use this convention.


On Wed, Sep 11, 2013 at 02:03:22PM +0200, Torvald Riegel wrote:
> On Fri, 2013-08-23 at 10:49 +0200, Dominik Vogt wrote:
> > Summary
> > -------
> >
> > Different pieces of code that use transactions that are logically
> > independent share a single (nested) transaction in the cpu.  A
> > user (e.g. a library) of transactions must not assume that
> >
> >  * the outermost transaction is always created by the user,
> >  * the innermost transaction has been created by the user,
> >  * the abort code caught by its abort handler has been set by the
> >    user,
> >  * the abort code is passed to the user's abort handler, if the
> >    user aborts a transaction,
> >  * aborts are ever handled by the user's own abort handler.
> >
> > Failing to do so might break POSIX semantics of the elision
> > code(!)
>
> Right.  However, these rules should be clear for people familiar with
> the semantics of current HTMs.  Do you think that it would be helpful if
> we'd add them to the glibc HLE guidelines wiki page, for example?
>
> > Rules for transactional coding (draft)
> > --------------------------------------
> >
> > The following rules help to reduce potential problems if each
> > piece of software that uses transactions sticks to the rules:
> >
> > A) Abort handlers should not interpret abort codes beyond what is
> >    documented in the cpu specification (i.e. on Intel, it should
> >    ignore the user defined bits 31:24 of the abort status; on z,
> >    it should ignore the abort code completely and look only at the
> >    condition code).
>
> I think that's too strict.  You can interpret them, but only in
> situations where it's fine to misinterpret (e.g., if this can have a
> limited effect on performance, it should be fine to interpret; in
> contrast, you must not interpret if this might change semantics).
>
> > Abort codes are thus only useful for
> >    debugging and not as a means of controlling program flow.
>
> I'd just explain the limitations.
>
> >    If it is necessary to use abort codes to pass information from
> >    the transaction to the user, the abort codes _must_ be globally
> >    unique, at least if used in libraries.  It might be necessary
> >    to register them through ICANN or so.
>
> I don't think ICANN is the right place.  These are arch-specific, and
> thus should be part of the respective arch's+platform's ABI.
> Nonetheless, this is missing currently.
>
> > B) Because of (A), user defined abort codes should not be used to
> >    control program flow.  If they are, it is the responsibility of
> >    the programmer to make sure that all software components that
> >    deal with transactions agree on the interpretation of the abort
> >    codes and can deal with codes set by third party software.
> >
> > C) If a transaction body calls any functions, it cannot be assumed
> >    that a transaction is still open when an XEND or XABORT
> >    instruction is reached.  It is necessary to check whether a
> >    transaction is still open and skip the XABORT or XEND if not.
>
> I disagree strongly here.  You must never commit a transaction that you
> haven't started.  Doing so will break atomicity assumptions, at the very
> least.
>
> > D) Explicitly aborting transactions should be avoided except for
> >    debugging purposes.
>
> I don't see a reason for such a guideline.  The aborts have their
> limitations, but
>
> > E) The innermost transaction should only be closed (with XEND,
> >    TEND etc.) if it was created by the same user.
>
> That kind of follows once C) is removed / inverted.
>
> > F) Make sure that control of program flow works as expected even
> >    if your abort handler is never called when transactions abort
> >    (because it's not the outermost transaction).
>
> I'd move that into the abort vs. flat nesting discussion.
>
> > Applying the rules to the current elision code
> > ----------------------------------------------
> >
> > (A) and (B) are easy to implement.  Instead of aborting with
> > _ABORT_LOCK_BUSY we can use XEND here and handle the (logical)
> > transaction failure directly.
>
> I think the aborts are fine.  XEND could be better if the aborts were
> more costly in terms of performance than spinning and running into a
> normal conflict with another transaction.  But we don't have data for
> this, so I'd keep it as is as long as we have no proof/indication to the
> contrary.
>
> Also, Andi's argument re better visibility of aborts in profilers
> applies here.  Perhaps some performance counter or such could reveal the
> location of the conflict, and the profiler could figure out that there's
> a pthread mutex on the same cacheline, but that sounds rather
> complicated.
>
> > (C) can be implemented with "if (_xtest()) ..." where necessary.
>
> See above.
>
> > (D) can be implemented for lock(), but trylock() currently depends
> > on the external abort because of POSIX requirements.
>
> This shouldn't need action either IMO.
>
> > (E) would require writing down the current nesting depth each
> > time a transaction is started and only using XEND, TEND, etc. if
> > the nesting depth is still the same.  It's unclear how to behave
> > if the nesting depth has changed, though.  Furthermore,
> > while it is possible to query the current nesting depth on z, I
> > think there is no way to do that on Intel.
>
> In the HLE implementation, we already take care of this implicitly.  We
> don't know the exact nesting depth, but in a correct program, we'll
> always pair an elided lock acquisition with an elided lock release.  So
> no need for action either.
>
> Keeping track of the nesting depth would increase the HTM capacity
> footprint of elided critical sections by at least one cacheline (you
> need to do it in thread-local state to avoid conflicts), which is
> something I'd like to avoid.
>
> > I have no solution for (F) yet; if pthread mutexes are only used
> > from inside third party transactions, the adapt_count would never
> > be modified in the abort path, because the abort path is never
> > executed.  This completely breaks the adaptation logic.
>
> The robustness of the adaptation is indeed a problem.  In the worst case
> it is a forward progress problem (i.e., correctness).  A strict ABI for the
> semantics of certain abort codes could be a solution, or perhaps we can
> fake this effectively.  I'll think more about it.
>
> Could you update this draft along with our discussion about it?  IMO, it
> would make sense to add this to the HLE guidelines so that
> at least glibc's understanding of / assumptions about how to use
> transactions is documented.
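
The flat nesting behaviour that most of the points above depend on
can be illustrated with a small software model.  This is a toy, not
real HTM: struct demo_cpu and the demo_* functions are invented for
this sketch and only mimic three properties discussed in this
thread, namely that nested transactions share one hardware
transaction, that an abort is delivered to whoever opened the
outermost one, and that ending a transaction nobody started is an
error.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one cpu with flat nesting.  */
struct demo_cpu
{
  int depth;         /* nesting depth, 0 = no transaction open */
  int outer_owner;   /* id of the user that opened the outermost one */
  int handled_by;    /* id of the user whose abort handler ran, or -1 */
  unsigned int code; /* abort code that handler received */
};

/* Start a (possibly nested) transaction on behalf of USER.  */
void
demo_begin (struct demo_cpu *c, int user)
{
  if (c->depth == 0)
    c->outer_owner = user;
  c->depth++;
}

/* End one nesting level.  Returns true if this call commits the
   whole transaction.  */
bool
demo_end (struct demo_cpu *c)
{
  /* Never commit a transaction that was not started.  */
  assert (c->depth > 0);
  c->depth--;
  return c->depth == 0;
}

/* Explicit abort with an 8-bit code.  Outside a transaction this
   does nothing, like XABORT.  */
void
demo_abort (struct demo_cpu *c, unsigned int code)
{
  if (c->depth == 0)
    return;
  /* All nesting levels are rolled back at once and only the
     outermost user's handler gets to see the code.  */
  c->depth = 0;
  c->handled_by = c->outer_owner;
  c->code = code & 0xffu;
}
```

If user 1 opens a transaction, user 2 nests one inside it and then
aborts with 0xff, the model reports handled_by == 1: user 2's abort
path never runs, which is the situation described for (F), and
user 1 sees a code it did not set, which is the misinterpretation
case behind (A) and (B).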

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
Follow-Ups:
- Re: Lock Elision / Transaction ABI discussion
  - From: Dominik Vogt