This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: LD_HWCAP_MASK failure with tst-env-setuid
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Siddhesh Poyarekar <siddhesh at gotplt dot org>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- Date: Mon, 22 May 2017 10:18:39 -0300
- Subject: Re: LD_HWCAP_MASK failure with tst-env-setuid
- References: <0ba0c258-1507-3ef4-d981-c034de61dc6f@gotplt.org>
On 19/05/2017 15:04, Siddhesh Poyarekar wrote:
> Adhemerval,
>
> I tried a bunch of things with the LD_HWCAP_MASK and tst-env-setuid and
> other programs and my current conclusion is that it may be due to a
> stale tst-env-setuid binary. I've attached a long form description of
> things I tried for you or others to poke holes into, but can you confirm
> if a clean build and test run also fails similarly for you?
Hi Siddhesh, unfortunately the test is still failing on my x86_64 system
with LD_HWCAP_MASK=0xffffffff. I used your latest tunables patchset [1]
applied on top of master (402bf0695218bbe290418b9486b1dd5fe284d903) and
configured with:
--host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-add-ons=libidn
--without-selinux --enable-stackguard-randomization --enable-obsolete-rpc
--enable-systemtap --enable-multi-arch --enable-lock-elision --enable-tunables
However, I am only seeing this issue on x86_64; aarch64 does not bail out
with 'cannot create capability list: Cannot allocate memory'.
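
To illustrate why a wide mask can end in that allocation failure, here is a
rough sketch of the arithmetic. It assumes, from my reading of
_dl_important_hwcaps, that the loader creates one search entry per combination
of the capability bits that survive the mask (plus the platform string, when
there is one), so the list grows as a power of two. The function names and the
sample hwcap values are hypothetical and only illustrate the growth; this is
not the loader's actual code.

```python
def important_hwcap_count(hwcap: int, mask: int, has_platform: bool = False) -> int:
    """Count the capability bits left after masking.

    Assumption: the platform string, when present, takes one extra slot.
    """
    surviving_bits = bin(hwcap & mask).count("1")
    return surviving_bits + (1 if has_platform else 0)


def capability_list_entries(hwcap: int, mask: int, has_platform: bool = False) -> int:
    """Assumed size of the capability list: one entry per subset of the bits."""
    return 1 << important_hwcap_count(hwcap, mask, has_platform)


# Hypothetical worst case: every hwcap bit is set, so the mask alone decides.
ALL_BITS_SET = 0xFFFFFFFF

narrow = capability_list_entries(ALL_BITS_SET, 0x6)         # 2 bits survive
wide = capability_list_entries(ALL_BITS_SET, 0xFFFFFFFF)    # 32 bits survive
print(f"narrow mask: {narrow} entries, wide mask: {wide} entries")
```

Under that assumption, a narrow mask keeps the list tiny, while a mask of
0xffffffff applied to a hwcap word with many bits set asks for billions of
entries, which would be consistent with either a long delay or ENOMEM.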
Following your analysis, I installed the built glibc into a sysroot, and
neither 'bin/true' nor 'tst-env-setuid' failed with
LD_HWCAP_MASK=0xffffffff. So I think BZ#21391 is indeed fixed, and your
suggestion about the installed glibc interfering with the testing still
worries me. I think what might actually be happening is that statically
linked binaries still rely on ld.so.cache for some internal calculation,
which I do not think is the intended behaviour. I will try to spend some
time figuring out why this still fails on my system.
[1] https://sourceware.org/ml/libc-alpha/2017-05/msg00570.html
>
> Here's what I did:
>
> 1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:
>
> LD_HWCAP_MASK=0xffffffff /bin/true
>
> and sure enough, on one of my boxes it failed with the ENOMEM and on
> another it took a good 5-6 seconds before finishing. This confirmed
> that the issue has been long-standing but was never really noticed. At
> this point I was going with the assumption that this was a generic bug
> and did not bother testing aarch64.
>
> 2. Now I tried running elf/ld.so under a debugger and was able to see
> the delay, but I was simply unable to break at the point of the delay or
> failure. I could not understand at that point what was going on, so I
> moved on to something else.
>
> 3. Now I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
> and could see the delay. I tried attaching to elf/ld.so during that
> delay and once again it seemed to be in arbitrary places and I could not
> figure out what was going on.
>
> 4. I ran perf and found the place in _dl_important_hwcaps where the
> program spent the most time. I put a bunch of _dl_debug_printf's all
> over the place and oddly the printfs near the hotspot never even got
> invoked, the function was returning much before that.
>
> 5. And then my Alexander Graham Bell moment happened, where I
> accidentally ran elf/ld.so directly instead of from within testrun.sh
> and the program succeeded immediately, no more delay. Likewise on the
> other box, running the built elf/ld.so directly no longer showed the
> ENOMEM failure.
>
> 6. Then I formed the hypothesis that using the old glibc from the system
> was to blame and that trunk glibc was working fine. This fit in with
> all of the failures perfectly because all of them involved execution of
> a shell or another intermediary program using the system dynamic linker
> and that is what was failing, not the test. gdb could not break at that
> point because the delay was in the shell it had invoked to start the
> program; the program had not even started.
>
> I decided to test this by doing a git bisect.
>
> 7. The bisect led to the fix for pr#21391 that HJ Lu pushed, which
> seemed to have stopped the delays and ENOMEMs in their tracks. This led
> me to conclude that the issue is specific to x86 and does not affect
> aarch64. I tested that hypothesis using my mustang aarch64 machine and
> sure enough, it passed all of the tests that x86 failed.
>
> So to conclude, the only way that tst-env-setuid would have failed for
> you in this case was if it was stale, i.e. failed to rebuild somehow.
> Hence my request to test again with a clean build.
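
The manual checks in the steps quoted above can be scripted roughly as
follows. This is only a sketch: run_with_hwcap_mask and report are
hypothetical helper names, the 60 second timeout is an arbitrary choice, and
the only loader message it looks for is the 'cannot create capability list'
text mentioned earlier in this mail. It starts the target directly, without a
shell in between, so that a delay or failure in an intermediary program using
the system dynamic linker (the situation described in step 6) is not mistaken
for one in the program under test.

```python
import os
import subprocess
import sys
import time


def run_with_hwcap_mask(argv, mask="0xffffffff", timeout=60.0):
    """Run argv with LD_HWCAP_MASK set; return (exit status, seconds, stderr).

    The variable is set only in the child's environment and no shell is
    involved, so the only dynamic loader exercised is the child's own.
    """
    env = dict(os.environ)
    env["LD_HWCAP_MASK"] = mask
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
        timeout=timeout,
    )
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, proc.stderr


def report(returncode, elapsed, stderr_text, out=sys.stdout):
    """Print a one-line summary and flag the capability-list failure if seen."""
    print(f"exit={returncode} elapsed={elapsed:.2f}s", file=out)
    if "cannot create capability list" in stderr_text:
        print("loader could not build its capability list", file=out)


# Example use (not executed here):
#   report(*run_with_hwcap_mask(["/bin/true"]))
```

Passing the freshly built elf/ld.so as the first element of argv, followed by
the program to load, would correspond to the direct invocation in step 5;
comparing its timing against a run through the system loader should show
whether the installed glibc or the build tree is the one misbehaving.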