This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 4/4] Use C11 atomics in pthread_once.
- From: Will Newton <will dot newton at linaro dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Tue, 4 Nov 2014 15:18:31 +0000
- Subject: Re: [PATCH 4/4] Use C11 atomics in pthread_once.
- References: <1414617613 dot 10085 dot 23 dot camel at triegel dot csb> <1414620650 dot 10085 dot 63 dot camel at triegel dot csb>
On 29 October 2014 22:10, Torvald Riegel <triegel@redhat.com> wrote:
Hi Torvald,
> This patch transforms pthread_once to use C11 atomics. It's meant as an
> illustration and early test.
>
> Please note that I've transformed *all* accesses to concurrently
> accessed memory locations to use atomic operations. This is the right
> thing to do to inform the compiler about concurrency and prevent it from
> making optimizations based on assumptions about data-race-freedom and
> sequential code (concurrent accesses are not sequential code...).
> You'll see that atomic_*_relaxed is used quite a bit, which restricts
> the compiler a little but does not add any barriers.
>
> Also, this makes it easy to see which loads and stores are actually
> concurrent code and thus need additional attention by the programmer.
>
> I've compared generated code on x86_64 on GCC 4.4.7. The only thing
> that changes between before/after the patch is that a "cmp %eax,%edx"
> becomes a "cmp %edx,%eax", but it's used to test equality of the values.
>
> I've also looked at the code generated by a pre-4.9 GCC build. The code
> generated for the pthread_once fast path is the same as with GCC 4.4.7
> before the patch. The slow path has somewhat different code with the
> more recent compiler, with fewer instructions.
I tried gcc 4.8.3 on x86_64 and the code is slightly longer in the new
version (including pushing a larger frame), but the performance
measured by the pthread_once benchtest is identical. With gcc 4.8.2 on
ARM the code is identical apart from some register allocation decisions.
It would be interesting to know what the compiler is doing here, but I
guess if the performance is the same it may not really matter.
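
For anyone following the thread without the patch at hand, here is a
minimal sketch of the idea described in the quoted text: every access to
the shared state goes through an explicit atomic operation, with relaxed
ordering where no barrier is needed. This is not the glibc code. The
names sketch_once_t, sketch_once, init_fn and demo_once are made up for
illustration, the state encoding is an assumption, and the waiting path
spins instead of blocking on a futex. It uses the standard C11
<stdatomic.h> interface rather than glibc's internal atomic_* macros.

```c
#include <stdatomic.h>

/* Hypothetical state encoding for this sketch only. */
enum { ONCE_INIT = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } sketch_once_t;

static void sketch_once(sketch_once_t *o, void (*fn)(void))
{
    /* Fast path: the acquire load pairs with the release store below,
       so a caller that observes ONCE_DONE also observes fn's effects. */
    if (atomic_load_explicit(&o->state, memory_order_acquire) == ONCE_DONE)
        return;

    int expected = ONCE_INIT;
    /* Try to claim the initialization.  On failure a relaxed load is
       enough, because the loop below re-reads with acquire ordering. */
    if (atomic_compare_exchange_strong_explicit(&o->state, &expected,
                                                ONCE_RUNNING,
                                                memory_order_acquire,
                                                memory_order_relaxed)) {
        fn();
        /* Publish completion together with fn's side effects. */
        atomic_store_explicit(&o->state, ONCE_DONE, memory_order_release);
        return;
    }

    /* Another thread claimed it: wait until it is done.  A real
       implementation would block here instead of spinning. */
    while (atomic_load_explicit(&o->state, memory_order_acquire) != ONCE_DONE)
        ;
}

/* Counter touched only with relaxed atomics: the compiler is told the
   access is concurrent, but no barrier is emitted. */
static atomic_int init_calls;

static void init_fn(void)
{
    atomic_fetch_add_explicit(&init_calls, 1, memory_order_relaxed);
}

/* Calls sketch_once twice and returns how often init_fn actually ran. */
static int demo_once(void)
{
    static sketch_once_t once = { ONCE_INIT };
    sketch_once(&once, init_fn);
    sketch_once(&once, init_fn);
    return atomic_load_explicit(&init_calls, memory_order_relaxed);
}
```

On x86_64 the acquire load in the fast path is expected to compile to a
plain load, which would be consistent with the fast path being unchanged
in the comparisons above, though that is an expectation for this sketch
and not something measured here.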
--
Will Newton
Toolchain Working Group, Linaro