This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 07/10] Add __pthread_set_abort_hook export


On Fri, 2013-01-25 at 19:41 +0100, Andi Kleen wrote:
> On Fri, Jan 25, 2013 at 06:45:18PM +0100, Torvald Riegel wrote:
> > > > (2) For explicitly transactional code (ie, code in which some programmer
> > > > explicitly used TSX), you want a facility to communicate some
> > > > information out of transactions without having to finish execution of
> > > > these transactions.
> > > 
> > > I want it for all transactional code, both implicit and explicit.
> > 
> > In that case, wouldn't it be better to have something that is robust
> > enough so that we can put it into assert()?  Otherwise, if you have
> > existing implicitly transactional code, you couldn't use the assertions
> > it might contain, and would have to rewrite them to use TXN_ASSERT
> > instead.
> 
> Hmm yes we could add this to __assert_fail
> 
> if (cpu_has_rtm) { 
> 	while (_xtest())
> 		_xend();
> }
> 
> It won't work for HLE though, but that's no different from the abort
> hook. The _xend() would abort in HLE, so you won't get an assert, but 
> at least the program won't crash and just reexecute.

Right, and this is what I would *like* to do.

However, the TSX specification states that _xend inside an HLE region
can raise a general protection fault.  In the terms used in the spec,
xtest returns RTM_ACTIVE || HLE_ACTIVE, and _xend signals the fault
iff !RTM_ACTIVE.  This of course doesn't mean that hardware that aborts
instead of signaling the fault -- like Haswell, as you said -- is
incorrect (there can always be spurious aborts...), but it means that
other implementations of this specification could signal the fault, in
which case putting the above into __assert_fail would be a bug.
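
To make the spec's wording concrete, here is a toy model in C.  It
executes no real TSX instructions; the struct and all model_* names are
made up for illustration.  It encodes only the two rules quoted above:
xtest reports RTM_ACTIVE || HLE_ACTIVE, and _xend faults iff
!RTM_ACTIVE.

```c
#include <stdbool.h>

/* Toy model of the TSX state from the spec; not real hardware access. */
struct tsx_state {
    int rtm_depth;   /* RTM nesting depth; RTM_ACTIVE iff > 0 */
    bool hle_active; /* true while inside an HLE region */
};

enum xend_result { XEND_OK, XEND_GP_FAULT };

/* xtest: true iff RTM_ACTIVE || HLE_ACTIVE. */
static bool model_xtest(const struct tsx_state *s)
{
    return s->rtm_depth > 0 || s->hle_active;
}

/* _xend: signals the fault iff !RTM_ACTIVE, else closes one RTM level. */
static enum xend_result model_xend(struct tsx_state *s)
{
    if (s->rtm_depth == 0)
        return XEND_GP_FAULT;
    s->rtm_depth--;
    return XEND_OK;
}

/* The early-commit loop proposed for __assert_fail, run on the model.
   Returns XEND_GP_FAULT as soon as an _xend would fault. */
static enum xend_result model_early_commit(struct tsx_state *s)
{
    while (model_xtest(s)) {
        if (model_xend(s) == XEND_GP_FAULT)
            return XEND_GP_FAULT;
    }
    return XEND_OK;
}
```

With only RTM nesting, the loop commits every level.  With only HLE
active, xtest is true but the first _xend is allowed to fault, which is
the case that worries me.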

Which brings me back to my robustness remark.  I don't think we want to
risk crashes when running on CPUs other than Haswell in the future.

If the abort is intended behavior, could we get some update to the spec
or something else that *officially* clarifies this point?

Or could we just enable the early commit on Haswell for now?  Then, if
other hardware later signals the fault, this wouldn't lead to crashes;
assertions would merely abort when executed transactionally.
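
A rough sketch of what such a gate could look like, assuming we key it
off the CPUID leaf 1 signature (EAX).  The function names are
hypothetical, and the list of Haswell model numbers is my assumption
from memory; it would need checking against Intel's documentation
before anything like this went in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Display family from a CPUID leaf 1 EAX value. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    if (family == 0xF)
        family += (eax >> 20) & 0xFF;   /* add extended family */
    return family;
}

/* Display model from a CPUID leaf 1 EAX value. */
static unsigned cpu_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    if (base_family == 0x6 || base_family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;   /* prepend extended model */
    return model;
}

/* Hypothetical policy: allow the early commit only on CPUs we believe
   abort (rather than fault) on _xend inside HLE.  Model list assumed. */
static bool early_commit_allowed(uint32_t eax)
{
    if (cpu_family(eax) != 6)
        return false;
    switch (cpu_model(eax)) {
    case 0x3C: case 0x3F: case 0x45: case 0x46:   /* assumed Haswell */
        return true;
    default:
        return false;
    }
}
```

Anything not on the list falls back to the conservative behavior, so an
unknown future CPU never runs the loop.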

> Sounds ok for me. With that I would be ok with dropping the hook.
> 
> There was one more use case with debugger break points on abort, but those
> could be likely solved differently too.

Discussing this early with the gdb folks could be helpful, if you
haven't done this yet.

> > > > in, or does TSX not complain about replacing xrelease with an RTM
> > > > commit?
> > > 
> > > RTM inside HLE aborts.
> > 
> > And as you said, HLE inside RTM txns aborts too, which means that
> > whenever we could get something out with an abort, we could also commit
> > the txn early (ie, with the simple loop I suggested).  Or not?
> 
> To commit HLE you need to know the lock address, lock size and lock value.
> I don't see any generic way to get that in an assert.
> 
> Maybe I misunderstood your questions?

I was just clarifying why I thought the "early commit" should work on
Haswell, and asking whether this would indeed be the case.

> However at least the glibc doesn't use HLE, so for this pthread
> implementation it's academic.

But assertions can be contained in user code that might use HLE, so it's
of practical relevance anyway.

> > > > 
> > > > If TSX complains, we get a fault, IIRC, so when this fault happened
> > > > within the code with the loop above, we'd still know that some assertion
> > > > fired.  If we inline this code, or add other hints regarding what called
> > > > it, I guess we could find out which assertion triggered the fault by
> > > > looking at the code around where the fault happened?  Thoughts?
> > > 
> > > Inline, the only way to know the code is to use XABORT and encode
> > > it in the abort code.
> > 
> > Do you mean that the fault will not reveal the real address but just
> > the xbegin instruction's addr?  Forgot about this one...
> 
> I'm not sure what you mean with "fault". abort? 

The general protection fault raised according to the TSX spec.

> All exceptions inside the transaction lead to an abort.

That's allowed according to the spec, but not required.

> The program doesn't know the abort address normally, the only way to get
> addresses is to use perf to look at the PEBS record and/or the LBRs.
> In theory you can do self monitoring with perf, but I suspect in most
> cases it will be only used with external profiling tools.

Interesting.  But I suppose this isn't guaranteed to work, in the sense
that you cannot ensure that you get the additional abort information
such as the addresses (e.g., because more stuff might be monitored,
leaving no space for the abort data)?
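
For comparison, the XABORT route mentioned earlier does not depend on
perf.  On an explicit abort, the status word delivered at the xbegin
fallback path has bit 0 set and carries the 8-bit XABORT argument in
bits 31:24.  Below is a small decoding sketch; the constant and
function names are local to this example, and using the argument as an
"assertion id" is purely a hypothetical convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the abort status: the abort was caused by XABORT. */
#define ABORT_EXPLICIT (1u << 0)

static bool abort_was_explicit(uint32_t status)
{
    return (status & ABORT_EXPLICIT) != 0;
}

/* Bits 31:24 hold the 8-bit XABORT argument. */
static unsigned abort_code(uint32_t status)
{
    return (status >> 24) & 0xFF;
}

/* Hypothetical convention: the XABORT argument identifies the failed
   assertion.  Returns -1 if the abort was not an explicit XABORT. */
static int assertion_id_from_status(uint32_t status)
{
    if (!abort_was_explicit(status))
        return -1;
    return (int)abort_code(status);
}
```

This only gives eight bits, so it can name an assertion but not carry
an address.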


Torvald


