This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] S/390: Fix two issues with the IFUNC optimized mem* routines
On Wed, Aug 29, 2012 at 8:45 AM, Andreas Krebbel
<krebbel@linux.vnet.ibm.com> wrote:
> On 29/08/12 16:23, H.J. Lu wrote:
>> On Wed, Aug 29, 2012 at 4:44 AM, Andreas Krebbel
>> <krebbel@linux.vnet.ibm.com> wrote:
>>> On 29/08/12 13:05, Andreas Jaeger wrote:
>>>> On Wednesday, August 29, 2012 12:44:21 Andreas Krebbel wrote:
>>>>> Hi,
>>>>>
>>>>> the attached patch fixes two problems with the S/390 IFUNC
>>>>> optimization of the mem* functions:
>>>>>
>>>>> 1. In the current implementation the resolver functions reside in a
>>>>> different file than the CPU optimized versions. This requires an
>>>>> R_390_RELATIVE runtime relocation to be generated when the resolver
>>>>> returns the function pointers. This caused a bug with GCJ. libgcj
>>>>> calls memcpy via function pointer (R_390_GLOB_DAT). This relocation
>>>>> is resolved at load time of libgcj. The dynamic linker in that case
>>>>> called the memcpy resolver inside Glibc *before* glibc has been
>>>>> relocated causing the resolver to return a bogus value.
>>>>>
>>>>> This perhaps could also be fixed in the dynamic linker by calling the
>>>>> ifunc resolvers only in a second pass over all the relocations?!
>>>>
>>>> Could this also be an issue on other architectures like x86-64? I had a
>>>> few strange bug reports with LD_BIND_NOW=1 in KDE that were impossible to
>>>> debug but seemed to involve multiarch functions.
>>>
>>> Not for the Glibc functions, I think. The resolver functions for x86_64 use lea to load the
>>> address of the optimized functions. This works without generating runtime relocations.
>>> Another reason is that, according to H.J. Lu, Glibc on x86_64 is always forced to be loaded
>>> first, so it wouldn't even be a problem if the resolvers needed runtime relocations.
>>
>> That is not the issue. There are
>>
>> /* It doesn't make sense to send libc-internal memcpy calls through a PLT.
>> The speedup we get from using SSSE3 instruction is likely eaten away
>> by the indirect call in the PLT. */
>> # define libc_hidden_builtin_def(name) \
>> .globl __GI_memcpy; __GI_memcpy = __memcpy_sse2
>>
>> versioned_symbol (libc, __new_memcpy, memcpy, GLIBC_2_14);
>>
>>
>>> However, I think this is a general problem which might very well occur with other shared
>>> objects defining IFUNC optimized routines. Forcing IFUNC resolvers to never generate any
>>> runtime relocations to me appears like a rather non-obvious limitation.
>>
>> There are some limitations. But you can use relative relocations
>> with IFUNC symbols if you fix
>>
>> http://sourceware.org/bugzilla/show_bug.cgi?id=13302
>>
>>> Please see the following example on x86-64. The example works fine after making a1 static:
>>>
>>> a.c:
>>> #include <stdio.h>
>>>
>>> void a (int) __attribute__((ifunc ("resolve_a")));
>>>
>>> void a1 (int i)
>>> {
>>>   printf("%d\n", i + 1);
>>> }
>>>
>>> void (*resolve_a (void)) (int)
>>> {
>>>   return &a1;
>>> }
>>>
>>> b.c:
>>> extern void a (int);
>>>
>>> void (*ap) (int) = a;
>>>
>>> void
>>> b (int i)
>>> {
>>>   ap (i + 1);
>>> }
>>>
>>> main.c:
>>> extern void b (int);
>>>
>>> int
>>> main ()
>>> {
>>>   b (1);
>>> }
>>>
>>> gcc -shared -fpic a.c -o liba.so
>>> gcc -shared -fpic b.c -o libb.so
>>>
>>> gcc -o main main.c -L./ -lb -la
>>> export LD_LIBRARY_PATH=./
>>> $ ./main
>>> 3
>>>
>>> gcc -o main main.c -L./ -la -lb
>>> $ ./main
>>> Segmentation fault
>>>
>>
>> This is a bug in your testcase.
>>
>> ---
>> void a (int) __attribute__((ifunc ("resolve_a")));
>>
>> void a1 (int i)
>> {
>>   printf("%d\n", i + 1);
>> }
>>
>> void (*resolve_a (void)) (int)
>> {
>>   return &a1;
>> }
>> ----
>>
>> For all I know, "a" may wipe your data at run-time.
>
> Not sure what you mean. Could you please elaborate?
>
> Btw. the same happens if you make a1 resolve locally with a version script:
>
> $ cat linkmap
> {
> global:
> a;
> local:
> *;
> };
> $ gcc -shared -fpic a.c -o liba.so -Wl,--version-script,linkmap
> $ gcc -o main main.c -L./ -la -lb
> $ ./main
> Segmentation fault
>
> The point is that if it is not known at compile time that the symbol will resolve locally, the compiler generates a GOT
> access, which for a DSO cannot be completed at final link. So in that case resolve_a requires a runtime relocation:
>
> 00000000000005b6 <resolve_a>:
> 5b6: 55 push %rbp
> 5b7: 48 89 e5 mov %rsp,%rbp
> 5ba: 48 8b 05 b7 02 20 00 mov 0x2002b7(%rip),%rax # 200878 <_DYNAMIC+0x188>
> 5c1: 5d pop %rbp
> 5c2: c3 retq
> 5c3: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 5ca: 00 00 00
> 5cd: 0f 1f 00 nopl (%rax)
>
> Relocation section '.rela.dyn' at offset 0x328 contains 7 entries:
> Offset Info Type Sym. Value Sym. Name + Addend
> 000000200878 000000000008 R_X86_64_RELATIVE 590
>
> If, as in the example above, the relocs in b are processed before the relocs in a, the resolver will return an
> unrelocated function pointer.
>
> $ LD_DEBUG=reloc ./main
> 26400: relocation processing: /lib64/libc.so.6
> 26400: relocation processing: ./libb.so (lazy)
> 26400: relocation processing: ./liba.so (lazy)
> $ ./main
> Segmentation fault
>
> If it is done the other way around, so that the relocs in the ifunc resolver are resolved first, everything works fine.
>
> $ LD_DEBUG=reloc ./main
> 26417: relocation processing: /lib64/libc.so.6
> 26417: relocation processing: ./liba.so (lazy)
> 26417: relocation processing: ./libb.so (lazy)
> $ ./main
> 3
>
The IFUNC symbol is accessed via the GOT. An IFUNC function may have
relocations, but it shouldn't use the GOT to get the resolved symbol.
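
For illustration only, here is a minimal sketch of a resolver that
follows that rule. It is a variant of the a.c testcase quoted above,
not code from glibc or from the patch. It assumes GCC on an ELF target
with IFUNC support. a1 and resolve_a are given internal linkage, and a
returns int instead of printing; that last change is my own, made only
so the result is easy to check. Because a1 is static, the compiler
knows it resolves locally, and on x86-64 it can typically compute the
address PC-relatively (lea) instead of loading it from a GOT slot that
may not have been relocated yet.

```c
/* Hypothetical variant of a.c: the implementation chosen by the
   resolver has internal linkage, so its address does not need to be
   fetched through the GOT.  */
static int
a1 (int i)
{
  return i + 1;
}

/* The resolver may run while relocations are still being processed,
   so it should not depend on data that itself needs a runtime
   relocation.  It only returns the address of a local function.  */
static int (*resolve_a (void)) (int)
{
  return a1;
}

/* The exported IFUNC symbol; calls to a are bound to whatever
   resolve_a returns.  */
int a (int) __attribute__ ((ifunc ("resolve_a")));
```

With this variant a (1) should evaluate to 2 whichever of the two
libraries has its relocations processed first, since the resolver no
longer reads a GOT entry.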
--
H.J.