This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix dwarf2 unwinding through futex functions


On 09/09/2013 07:43 PM, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When profiling programs with lock problems with perf record -g dwarf, libunwind
> can currently not backtrace through the futex and unlock functions in pthread.
> This is because they use out of line sections, and those are not correctly
> described in dwarf2 (I believe needs dwarf3 or 4).
> 
> This patch first removes the out of line sections. They only save a single
> jump, but cause a lot of pain. Then it converts the now inline lock
> code to use the now standard gas .cfi_* commands.
> 
> Then the very complicated and long manual dwarf2 code can be removed. Right
> now I just #if 0ed it to make review easier, but it can be just
> removed for the final commit.
> 
> With these changes libunwind/perf can backtrace through the futex functions
> now.
> 
> Longer term it would be likely better to just use C futex() functions
> on x86 like all the other architectures. This would clean the code up
> even more.

(1) Testsuite.

I will assume the testsuite including all of the unwinding tests passed
with no regressions.

(2) gdb.

Have you tried running the gdb testsuite against the modified glibc?
Often subtle perturbations in glibc can impact gdb.

(3) Review by others.

Roland's comments about performance have been answered by you and you have
tested again and found no difference. I wish we had some better benchmarks
in our framework for this, but that should not block this cleanup.

It has always been my opinion that any micro-optimization like this which
harms the ability to benchmark is a terrible tradeoff. There is so much
code churn in other parts of the library that if we don't "design for test"
then we will be unable to evolve the library fast enough. That is to say 
that to evolve the library quickly I expect to need automated tools to do 
performance regression testing and profiling is such a tool.

Andreas Schwab has stated that he would like to see the code removed.

Please post a final version and we'll sign off on that.

> 2013-09-09  Andi Kleen  <ak@linux.intel.com>
> 
> 	* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h (lll_lock, lll_robust_lock,
>           lll_cond_lock, lll_timedlock, lll_robust_cond_lock, lll_robust_timedlock,
> 	  lll_unlock, lll_robust_unlock): Remove out of line section. Use cfi
>           intrinsics.
>           (LLL_STUB_UNWIND_INFO*): Remove.
> 	* sysdeps/unix/sysv/linux/i386/lowlevellock.h (lll_lock, lll_robust_lock,
>           lll_cond_lock, lll_timedlock, lll_robust_cond_lock, lll_robust_timedlock,
> 	  lll_unlock, lll_robust_unlock): Remove out of line section. Use cfi
> 	  intrinsics.
>           (LLL_STUB_UNWIND_INFO*): Remove.
> ---
>  nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h   |  94 +++--------------
>  nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h | 114 ++++++---------------
>  2 files changed, 44 insertions(+), 164 deletions(-)
> 
> diff --git a/nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h b/nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h
> index cc3ec5b..9d9216a 100644
> --- a/nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h
> +++ b/nptl/sysdeps/unix/sysv/linux/i386/lowlevellock.h
> @@ -126,7 +126,7 @@
>  /* Delay in spinlock loop.  */
>  #define BUSY_WAIT_NOP	asm ("rep; nop")
>  
> -
> +#if 0
>  #define LLL_STUB_UNWIND_INFO_START \
>  	".section	.eh_frame,\"a\",@progbits\n"		\
>  "5:\t"	".long	7f-6f	# Length of Common Information Entry\n"	\
> @@ -201,7 +201,7 @@ LLL_STUB_UNWIND_INFO_START					\
>  	".sleb128 3b-0b\n"					\
>  "20:\t"	".byte	0x40 + (2b-0b) # DW_CFA_advance_loc\n\t"	\
>  LLL_STUB_UNWIND_INFO_END
> -
> +#endif

Remove it.

>  
>  #define lll_futex_wait(futex, val, private) \
>    lll_futex_timed_wait (futex, val, NULL, private)
> @@ -298,16 +298,9 @@ LLL_STUB_UNWIND_INFO_END
>      ({ int ignore1, ignore2;						      \
>         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)	      \
>  	 __asm __volatile (__lll_lock_asm_start				      \
> -			   "jnz _L_lock_%=\n\t"				      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_lock_%=,@function\n"		      \
> -			   "_L_lock_%=:\n"				      \
> +			   "jz 18f\n\t"				      \

OK, if we're done just jump to the end.

>  			   "1:\tleal %2, %%ecx\n"			      \
>  			   "2:\tcall __lll_lock_wait_private\n" 	      \
> -			   "3:\tjmp 18f\n"				      \
> -			   "4:\t.size _L_lock_%=, 4b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_3			      \

OK, no need for a jump becuase we're not using out-of-lin sections.

>  			   "18:"					      \
>  			   : "=a" (ignore1), "=c" (ignore2), "=m" (futex)     \
>  			   : "0" (0), "1" (1), "m" (futex),		      \
> @@ -317,17 +310,10 @@ LLL_STUB_UNWIND_INFO_END
>  	 {								      \
>  	   int ignore3;							      \
>  	   __asm __volatile (__lll_lock_asm_start			      \
> -			     "jnz _L_lock_%=\n\t"			      \
> -			     ".subsection 1\n\t"			      \
> -			     ".type _L_lock_%=,@function\n"		      \
> -			     "_L_lock_%=:\n"				      \
> +			     "jz 18f\n\t"			 	      \

OK.

>  			     "1:\tleal %2, %%edx\n"			      \
>  			     "0:\tmovl %8, %%ecx\n"			      \
>  			     "2:\tcall __lll_lock_wait\n"		      \
> -			     "3:\tjmp 18f\n"				      \
> -			     "4:\t.size _L_lock_%=, 4b-1b\n\t"		      \
> -			     ".previous\n"				      \
> -			     LLL_STUB_UNWIND_INFO_4			      \

OK.

>  			     "18:"					      \
>  			     : "=a" (ignore1), "=c" (ignore2),		      \
>  			       "=m" (futex), "=&d" (ignore3) 		      \
> @@ -341,17 +327,10 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_robust_lock(futex, id, private) \
>    ({ int result, ignore1, ignore2;					      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %2\n\t"			      \
> -		       "jnz _L_robust_lock_%=\n\t"			      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_robust_lock_%=,@function\n"		      \
> -		       "_L_robust_lock_%=:\n"				      \
> +		       "jz 18f\n\t"					      \

OK.

>  		       "1:\tleal %2, %%edx\n"				      \
>  		       "0:\tmovl %7, %%ecx\n"				      \
>  		       "2:\tcall __lll_robust_lock_wait\n"		      \
> -		       "3:\tjmp 18f\n"					      \
> -		       "4:\t.size _L_robust_lock_%=, 4b-1b\n\t"		      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_4				      \

OK.

>  		       "18:"						      \
>  		       : "=a" (result), "=c" (ignore1), "=m" (futex),	      \
>  			 "=&d" (ignore2)				      \
> @@ -366,17 +345,10 @@ LLL_STUB_UNWIND_INFO_END
>    (void)								      \
>      ({ int ignore1, ignore2, ignore3;					      \
>         __asm __volatile (LOCK_INSTR "cmpxchgl %1, %2\n\t"		      \
> -			 "jnz _L_cond_lock_%=\n\t"			      \
> -			 ".subsection 1\n\t"				      \
> -			 ".type _L_cond_lock_%=,@function\n"		      \
> -			 "_L_cond_lock_%=:\n"				      \
> +			 "jz 18f\n\t"					      \

OK.

>  			 "1:\tleal %2, %%edx\n"				      \
>  			 "0:\tmovl %7, %%ecx\n"				      \
>  			 "2:\tcall __lll_lock_wait\n"			      \
> -			 "3:\tjmp 18f\n"				      \
> -			 "4:\t.size _L_cond_lock_%=, 4b-1b\n\t"		      \
> -			 ".previous\n"					      \
> -			 LLL_STUB_UNWIND_INFO_4				      \

OK.

>  			 "18:"						      \
>  			 : "=a" (ignore1), "=c" (ignore2), "=m" (futex),      \
>  			   "=&d" (ignore3)				      \
> @@ -388,17 +360,10 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_robust_cond_lock(futex, id, private) \
>    ({ int result, ignore1, ignore2;					      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %2\n\t"			      \
> -		       "jnz _L_robust_cond_lock_%=\n\t"			      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_robust_cond_lock_%=,@function\n"	      \
> -		       "_L_robust_cond_lock_%=:\n"			      \
> +		       "jz 18f\n\t"					      \

OK.

>  		       "1:\tleal %2, %%edx\n"				      \
>  		       "0:\tmovl %7, %%ecx\n"				      \
>  		       "2:\tcall __lll_robust_lock_wait\n"		      \
> -		       "3:\tjmp 18f\n"					      \
> -		       "4:\t.size _L_robust_cond_lock_%=, 4b-1b\n\t"	      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_4				      \

OK.

>  		       "18:"						      \
>  		       : "=a" (result), "=c" (ignore1), "=m" (futex),	      \
>  			 "=&d" (ignore2)				      \
> @@ -411,17 +376,10 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_timedlock(futex, timeout, private) \
>    ({ int result, ignore1, ignore2, ignore3;				      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %3\n\t"			      \
> -		       "jnz _L_timedlock_%=\n\t"			      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_timedlock_%=,@function\n"		      \
> -		       "_L_timedlock_%=:\n"				      \
> +		       "jz 18f\n\t"					      \

OK.

>  		       "1:\tleal %3, %%ecx\n"				      \
>  		       "0:\tmovl %8, %%edx\n"				      \
>  		       "2:\tcall __lll_timedlock_wait\n"		      \
> -		       "3:\tjmp 18f\n"					      \
> -		       "4:\t.size _L_timedlock_%=, 4b-1b\n\t"		      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_4				      \

OK.

>  		       "18:"						      \
>  		       : "=a" (result), "=c" (ignore1), "=&d" (ignore2),      \
>  			 "=m" (futex), "=S" (ignore3)			      \
> @@ -440,17 +398,10 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>  #define lll_robust_timedlock(futex, timeout, id, private) \
>    ({ int result, ignore1, ignore2, ignore3;				      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %3\n\t"			      \
> -		       "jnz _L_robust_timedlock_%=\n\t"			      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_robust_timedlock_%=,@function\n"	      \
> -		       "_L_robust_timedlock_%=:\n"			      \
> +		       "jz 18f\n\t"			   		      \

OK.

>  		       "1:\tleal %3, %%ecx\n"				      \
>  		       "0:\tmovl %8, %%edx\n"				      \
>  		       "2:\tcall __lll_robust_timedlock_wait\n"		      \
> -		       "3:\tjmp 18f\n"					      \
> -		       "4:\t.size _L_robust_timedlock_%=, 4b-1b\n\t"	      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_4				      \

OK.

>  		       "18:"						      \
>  		       : "=a" (result), "=c" (ignore1), "=&d" (ignore2),      \
>  			 "=m" (futex), "=S" (ignore3)			      \
> @@ -473,16 +424,9 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>      ({ int ignore;							      \
>         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)	      \
>  	 __asm __volatile (__lll_unlock_asm				      \
> -			   "jne _L_unlock_%=\n\t"			      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_unlock_%=,@function\n"		      \
> -			   "_L_unlock_%=:\n"				      \
> +			   "je 18f\n\t"					      \

OK.

>  			   "1:\tleal %0, %%eax\n"			      \
>  			   "2:\tcall __lll_unlock_wake_private\n"	      \
> -			   "3:\tjmp 18f\n"				      \
> -			   "4:\t.size _L_unlock_%=, 4b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_3			      \

OK.

>  			   "18:"					      \
>  			   : "=m" (futex), "=&a" (ignore)		      \
>  			   : "m" (futex), "i" (MULTIPLE_THREADS_OFFSET)	      \
> @@ -491,17 +435,10 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>  	 {								      \
>  	   int ignore2;							      \
>  	   __asm __volatile (__lll_unlock_asm				      \
> -			     "jne _L_unlock_%=\n\t"			      \
> -			     ".subsection 1\n\t"			      \
> -			     ".type _L_unlock_%=,@function\n"		      \
> -			     "_L_unlock_%=:\n"				      \
> +			     "je 18f\n\t"				      \

OK.

>  			     "1:\tleal %0, %%eax\n"			      \
>  			     "0:\tmovl %5, %%ecx\n"			      \
>  			     "2:\tcall __lll_unlock_wake\n"		      \
> -			     "3:\tjmp 18f\n"				      \
> -			     "4:\t.size _L_unlock_%=, 4b-1b\n\t"	      \
> -			     ".previous\n"				      \
> -			     LLL_STUB_UNWIND_INFO_4			      \

OK.

>  			     "18:"					      \
>  			     : "=m" (futex), "=&a" (ignore), "=&c" (ignore2)  \
>  			     : "i" (MULTIPLE_THREADS_OFFSET), "m" (futex),    \
> @@ -514,17 +451,10 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>    (void)								      \
>      ({ int ignore, ignore2;						      \
>         __asm __volatile (LOCK_INSTR "andl %3, %0\n\t"			      \
> -			 "jne _L_robust_unlock_%=\n\t"			      \
> -			 ".subsection 1\n\t"				      \
> -			 ".type _L_robust_unlock_%=,@function\n"	      \
> -			 "_L_robust_unlock_%=:\n\t"			      \
> +			 "je 18f\n\t"					      \

OK.

>  			 "1:\tleal %0, %%eax\n"				      \
>  			 "0:\tmovl %5, %%ecx\n"				      \
>  			 "2:\tcall __lll_unlock_wake\n"			      \
> -			 "3:\tjmp 18f\n"				      \
> -			 "4:\t.size _L_robust_unlock_%=, 4b-1b\n\t"	      \
> -			 ".previous\n"					      \
> -			 LLL_STUB_UNWIND_INFO_4				      \

OK.

>  			 "18:"						      \
>  			 : "=m" (futex), "=&a" (ignore), "=&c" (ignore2)      \
>  			 : "i" (FUTEX_WAITERS), "m" (futex),		      \

OK, caught all 10 uses of out-of-line sections for i386.

> diff --git a/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h b/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
> index 35fb01a..38dce67 100644
> --- a/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
> +++ b/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
> @@ -106,7 +106,7 @@
>  /* Delay in spinlock loop.  */
>  #define BUSY_WAIT_NOP	  asm ("rep; nop")
>  
> -
> +#if 0
>  #define LLL_STUB_UNWIND_INFO_START \
>  	".section	.eh_frame,\"a\",@progbits\n" 		\
>  "7:\t"	".long	9f-8f	# Length of Common Information Entry\n" \
> @@ -203,7 +203,7 @@ LLL_STUB_UNWIND_INFO_START					\
>  	".sleb128 4b-0b\n" 					\
>  "26:\t"	".byte	0x40 + (2b-0b) # DW_CFA_advance_loc\n\t" 	\
>  LLL_STUB_UNWIND_INFO_END
> -
> +#endif

Remove it.

>  
>  #define lll_futex_wait(futex, val, private) \
>    lll_futex_timed_wait(futex, val, NULL, private)
> @@ -283,7 +283,7 @@ LLL_STUB_UNWIND_INFO_END
>  
>  #if defined NOT_IN_libc || defined UP
>  # define __lll_lock_asm_start LOCK_INSTR "cmpxchgl %4, %2\n\t"		      \
> -			      "jnz 1f\n\t"
> +			      "jz 24f\n\t"

OK.

>  #else
>  # define __lll_lock_asm_start "cmpl $0, __libc_multiple_threads(%%rip)\n\t"   \
>  			      "je 0f\n\t"				      \
> @@ -291,7 +291,7 @@ LLL_STUB_UNWIND_INFO_END
>  			      "jnz 1f\n\t"				      \
>  			      "jmp 24f\n"				      \
>  			      "0:\tcmpxchgl %4, %2\n\t"			      \
> -			      "jnz 1f\n\t"
> +			      "jz 24f\n\t"

OK.

>  #endif
>  
>  #define lll_lock(futex, private) \
> @@ -299,17 +299,12 @@ LLL_STUB_UNWIND_INFO_END
>      ({ int ignore1, ignore2, ignore3;					      \
>         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)	      \
>  	 __asm __volatile (__lll_lock_asm_start				      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_lock_%=, @function\n"		      \
> -			   "_L_lock_%=:\n"				      \
>  			   "1:\tlea %2, %%" RDI_LP "\n"			      \
>  			   "2:\tsub $128, %%" RSP_LP "\n"		      \

Any reason we can't factor more of this into __lll_lock_asm_start to simplify
the individual asm statements? If we're here deleting code we might as well delete
more and add it to __lll_lock_asm_start where possible?

OK.

> +		 	   ".cfi_adjust_cfa_offset 128\n"	              \
>  			   "3:\tcallq __lll_lock_wait_private\n"	      \
>  			   "4:\tadd $128, %%" RSP_LP "\n"		      \
> -			   "5:\tjmp 24f\n"				      \
> -			   "6:\t.size _L_lock_%=, 6b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_5			      \
> +		 	   ".cfi_adjust_cfa_offset -128\n"	              \

OK. Love to see .cfi* usage here :-)

>  			   "24:"					      \
>  			   : "=S" (ignore1), "=&D" (ignore2), "=m" (futex),   \
>  			     "=a" (ignore3)				      \
> @@ -317,17 +312,12 @@ LLL_STUB_UNWIND_INFO_END
>  			   : "cx", "r11", "cc", "memory");		      \
>         else								      \
>  	 __asm __volatile (__lll_lock_asm_start				      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_lock_%=, @function\n"		      \
> -			   "_L_lock_%=:\n"				      \
>  			   "1:\tlea %2, %%" RDI_LP "\n"			      \
>  			   "2:\tsub $128, %%" RSP_LP "\n"		      \
> +		 	   ".cfi_adjust_cfa_offset 128\n"	              \

OK.

>  			   "3:\tcallq __lll_lock_wait\n"		      \
>  			   "4:\tadd $128, %%" RSP_LP "\n"		      \
> -			   "5:\tjmp 24f\n"				      \
> -			   "6:\t.size _L_lock_%=, 6b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_5			      \
> +		 	   ".cfi_adjust_cfa_offset -128\n"	              \

OK.

>  			   "24:"					      \
>  			   : "=S" (ignore1), "=D" (ignore2), "=m" (futex),    \
>  			     "=a" (ignore3)				      \
> @@ -338,18 +328,13 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_robust_lock(futex, id, private) \
>    ({ int result, ignore1, ignore2;					      \
>      __asm __volatile (LOCK_INSTR "cmpxchgl %4, %2\n\t"			      \
> -		      "jnz 1f\n\t"					      \
> -		      ".subsection 1\n\t"				      \
> -		      ".type _L_robust_lock_%=, @function\n"		      \
> -		      "_L_robust_lock_%=:\n"				      \
> +		      "jz 24f\n"					      \

OK.

>  		      "1:\tlea %2, %%" RDI_LP "\n"			      \
>  		      "2:\tsub $128, %%" RSP_LP "\n"			      \
> +		      ".cfi_adjust_cfa_offset 128\n"	              	      \
>  		      "3:\tcallq __lll_robust_lock_wait\n"		      \
>  		      "4:\tadd $128, %%" RSP_LP "\n"			      \
> -		      "5:\tjmp 24f\n"					      \
> -		      "6:\t.size _L_robust_lock_%=, 6b-1b\n\t"		      \
> -		      ".previous\n"					      \
> -		      LLL_STUB_UNWIND_INFO_5				      \
> +		      ".cfi_adjust_cfa_offset -128\n"	              	      \


OK.

>  		      "24:"						      \
>  		      : "=S" (ignore1), "=D" (ignore2), "=m" (futex),	      \
>  			"=a" (result)					      \
> @@ -361,18 +346,13 @@ LLL_STUB_UNWIND_INFO_END
>    (void)								      \
>      ({ int ignore1, ignore2, ignore3;					      \
>         __asm __volatile (LOCK_INSTR "cmpxchgl %4, %2\n\t"		      \
> -			 "jnz 1f\n\t"					      \
> -			 ".subsection 1\n\t"				      \
> -			 ".type _L_cond_lock_%=, @function\n"		      \
> -			 "_L_cond_lock_%=:\n"				      \
> +			 "jz 24f\n"					      \

OK.

>  			 "1:\tlea %2, %%" RDI_LP "\n"			      \
>  			 "2:\tsub $128, %%" RSP_LP "\n"			      \
> +		         ".cfi_adjust_cfa_offset 128\n"	              	      \
>  			 "3:\tcallq __lll_lock_wait\n"			      \
>  			 "4:\tadd $128, %%" RSP_LP "\n"			      \
> -			 "5:\tjmp 24f\n"				      \
> -			 "6:\t.size _L_cond_lock_%=, 6b-1b\n\t"		      \
> -			 ".previous\n"					      \
> -			 LLL_STUB_UNWIND_INFO_5				      \
> +		         ".cfi_adjust_cfa_offset -128\n"	      	      \

OK.

>  			 "24:"						      \
>  			 : "=S" (ignore1), "=D" (ignore2), "=m" (futex),      \
>  			   "=a" (ignore3)				      \
> @@ -383,18 +363,13 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_robust_cond_lock(futex, id, private) \
>    ({ int result, ignore1, ignore2;					      \
>      __asm __volatile (LOCK_INSTR "cmpxchgl %4, %2\n\t"			      \
> -		      "jnz 1f\n\t"					      \
> -		      ".subsection 1\n\t"				      \
> -		      ".type _L_robust_cond_lock_%=, @function\n"	      \
> -		      "_L_robust_cond_lock_%=:\n"			      \
> +		      "jz 24f\n"					      \

OK.

>  		      "1:\tlea %2, %%" RDI_LP "\n"			      \
>  		      "2:\tsub $128, %%" RSP_LP "\n"			      \
> +		      ".cfi_adjust_cfa_offset 128\n"	              	      \
>  		      "3:\tcallq __lll_robust_lock_wait\n"		      \
>  		      "4:\tadd $128, %%" RSP_LP "\n"			      \
> -		      "5:\tjmp 24f\n"					      \
> -		      "6:\t.size _L_robust_cond_lock_%=, 6b-1b\n\t"	      \
> -		      ".previous\n"					      \
> -		      LLL_STUB_UNWIND_INFO_5				      \
> +		      ".cfi_adjust_cfa_offset -128\n"	              	      \

OK.

>  		      "24:"						      \
>  		      : "=S" (ignore1), "=D" (ignore2), "=m" (futex),	      \
>  			"=a" (result)					      \
> @@ -406,19 +381,14 @@ LLL_STUB_UNWIND_INFO_END
>  #define lll_timedlock(futex, timeout, private) \
>    ({ int result, ignore1, ignore2, ignore3;				      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %4\n\t"			      \
> -		       "jnz 1f\n\t"					      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_timedlock_%=, @function\n"		      \
> -		       "_L_timedlock_%=:\n"				      \
> +		       "jz 24f\n"					      \

OK.

>  		       "1:\tlea %4, %%" RDI_LP "\n"			      \
>  		       "0:\tmov %8, %%" RDX_LP "\n"			      \
>  		       "2:\tsub $128, %%" RSP_LP "\n"			      \
> +		       ".cfi_adjust_cfa_offset 128\n"	              	      \
>  		       "3:\tcallq __lll_timedlock_wait\n"		      \
>  		       "4:\tadd $128, %%" RSP_LP "\n"			      \
> -		       "5:\tjmp 24f\n"					      \
> -		       "6:\t.size _L_timedlock_%=, 6b-1b\n\t"		      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_6				      \
> +		       ".cfi_adjust_cfa_offset -128\n"	              	      \

OK.

>  		       "24:"						      \
>  		       : "=a" (result), "=D" (ignore1), "=S" (ignore2),	      \
>  			 "=&d" (ignore3), "=m" (futex)			      \
> @@ -437,19 +407,14 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>  #define lll_robust_timedlock(futex, timeout, id, private) \
>    ({ int result, ignore1, ignore2, ignore3;				      \
>       __asm __volatile (LOCK_INSTR "cmpxchgl %1, %4\n\t"			      \
> -		       "jnz 1f\n\t"					      \
> -		       ".subsection 1\n\t"				      \
> -		       ".type _L_robust_timedlock_%=, @function\n"	      \
> -		       "_L_robust_timedlock_%=:\n"			      \
> +		       "jz 24f\n\t"					      \

OK.

>  		       "1:\tlea %4, %%" RDI_LP "\n"			      \
>  		       "0:\tmov %8, %%" RDX_LP "\n"			      \
>  		       "2:\tsub $128, %%" RSP_LP "\n"			      \
> +		       ".cfi_adjust_cfa_offset 128\n"	              	      \
>  		       "3:\tcallq __lll_robust_timedlock_wait\n"	      \
>  		       "4:\tadd $128, %%" RSP_LP "\n"			      \
> -		       "5:\tjmp 24f\n"					      \
> -		       "6:\t.size _L_robust_timedlock_%=, 6b-1b\n\t"	      \
> -		       ".previous\n"					      \
> -		       LLL_STUB_UNWIND_INFO_6				      \
> +		       ".cfi_adjust_cfa_offset -128\n"	              	      \

OK.

>  		       "24:"						      \
>  		       : "=a" (result), "=D" (ignore1), "=S" (ignore2),       \
>  			 "=&d" (ignore3), "=m" (futex)			      \
> @@ -460,7 +425,7 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>  
>  #if defined NOT_IN_libc || defined UP
>  # define __lll_unlock_asm_start LOCK_INSTR "decl %0\n\t"		      \
> -				"jne 1f\n\t"
> +				"je 24f\n\t"

OK.

>  #else
>  # define __lll_unlock_asm_start "cmpl $0, __libc_multiple_threads(%%rip)\n\t" \
>  				"je 0f\n\t"				      \
> @@ -468,7 +433,7 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>  				"jne 1f\n\t"				      \
>  				"jmp 24f\n\t"				      \
>  				"0:\tdecl %0\n\t"			      \
> -				"jne 1f\n\t"
> +				"je 24f\n\t"

OK.

>  #endif
>  
>  #define lll_unlock(futex, private) \
> @@ -476,34 +441,24 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>      ({ int ignore;							      \
>         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)	      \
>  	 __asm __volatile (__lll_unlock_asm_start			      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_unlock_%=, @function\n"		      \
> -			   "_L_unlock_%=:\n"				      \
>  			   "1:\tlea %0, %%" RDI_LP "\n"			      \
>  			   "2:\tsub $128, %%" RSP_LP "\n"		      \

Same question regarding refactoring of __lll_unlock_asm_start as I had for
__lll_lock_asm_start.

> +		           ".cfi_adjust_cfa_offset 128\n"	       	      \
>  			   "3:\tcallq __lll_unlock_wake_private\n"	      \
>  			   "4:\tadd $128, %%" RSP_LP "\n"		      \
> -			   "5:\tjmp 24f\n"				      \
> -			   "6:\t.size _L_unlock_%=, 6b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_5			      \
> +		           ".cfi_adjust_cfa_offset -128\n"	       	      \

OK.

>  			   "24:"					      \
>  			   : "=m" (futex), "=&D" (ignore)		      \
>  			   : "m" (futex)				      \
>  			   : "ax", "cx", "r11", "cc", "memory");	      \
>         else								      \
>  	 __asm __volatile (__lll_unlock_asm_start			      \
> -			   ".subsection 1\n\t"				      \
> -			   ".type _L_unlock_%=, @function\n"		      \
> -			   "_L_unlock_%=:\n"				      \
>  			   "1:\tlea %0, %%" RDI_LP "\n"			      \
>  			   "2:\tsub $128, %%" RSP_LP "\n"		      \
> +		           ".cfi_adjust_cfa_offset 128\n"	       	      \

OK.

>  			   "3:\tcallq __lll_unlock_wake\n"		      \
>  			   "4:\tadd $128, %%" RSP_LP "\n"		      \
> -			   "5:\tjmp 24f\n"				      \
> -			   "6:\t.size _L_unlock_%=, 6b-1b\n\t"		      \
> -			   ".previous\n"				      \
> -			   LLL_STUB_UNWIND_INFO_5			      \
> +		           ".cfi_adjust_cfa_offset -128\n"	       	      \

OK.

>  			   "24:"					      \
>  			   : "=m" (futex), "=&D" (ignore)		      \
>  			   : "m" (futex), "S" (private)			      \
> @@ -515,18 +470,13 @@ extern int __lll_timedlock_elision (int *futex, short *adapt_count,
>      {									      \
>        int ignore;							      \
>        __asm __volatile (LOCK_INSTR "andl %2, %0\n\t"			      \
> -			"jne 1f\n\t"					      \
> -			".subsection 1\n\t"				      \
> -			".type _L_robust_unlock_%=, @function\n"	      \
> -			"_L_robust_unlock_%=:\n"			      \
> +			"je 24f\n\t"					      \

OK.

>  			"1:\tlea %0, %%" RDI_LP "\n"			      \
>  			"2:\tsub $128, %%" RSP_LP "\n"			      \
> +		        ".cfi_adjust_cfa_offset 128\n"	  	     	      \

OK.

>  			"3:\tcallq __lll_unlock_wake\n"			      \
>  			"4:\tadd $128, %%" RSP_LP "\n"			      \
> -			"5:\tjmp 24f\n"					      \
> -			"6:\t.size _L_robust_unlock_%=, 6b-1b\n\t"	      \
> -			".previous\n"					      \
> -			LLL_STUB_UNWIND_INFO_5				      \
> +		        ".cfi_adjust_cfa_offset -128\n"	  	     	      \

OK.

>  			"24:"						      \
>  			: "=m" (futex), "=&D" (ignore)			      \
>  			: "i" (FUTEX_WAITERS), "m" (futex),		      \
> 

OK, caught all 10 uses of the out-of-line sections for x86-64.

Post a final version with or without refactoring in x86-64 lowlevellock.h,
your choice.

Please make sure you respond to #1, #2, and #3 above.

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]