This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: LD_HWCAP_MASK failure with tst-env-setuid


On 19/05/2017 15:04, Siddhesh Poyarekar wrote:
> Adhemerval,
> 
> I tried a bunch of things with the LD_HWCAP_MASK and tst-env-setuid and
> other programs and my current conclusion is that it may be due to a
> stale tst-env-setuid binary.  I've attached a long form description of
> things I tried for you or others to poke holes into, but can you confirm
> if a clean build and test run also fails similarly for you?

Hi Siddhesh, unfortunately the test is still failing on my x86_64 system
with LD_HWCAP_MASK=0xffffffff.  I used your latest tunables patchset [1]
applied on top of master (402bf0695218bbe290418b9486b1dd5fe284d903) and
configured with:

--host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-add-ons=libidn
--without-selinux --enable-stackguard-randomization --enable-obsolete-rpc
--enable-systemtap --enable-multi-arch --enable-lock-elision --enable-tunables

However I am only seeing this issue on x86_64; aarch64 does not bail out
with 'cannot create capability list: Cannot allocate memory'.
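
For context on that ENOMEM, here is an illustrative sketch (an assumption
about the mechanism, not glibc code): if the loader enumerates every subset
of the n enabled hwcap names as a candidate library search subdirectory, the
capability list grows as 2^n, so unmasking many bits via
LD_HWCAP_MASK=0xffffffff could plausibly make the allocation fail:

```shell
# Illustrative only: show how fast 2^n grows with the number of
# unmasked hwcap bits.  The real list size depends on how many hwcap
# names the loader actually considers "important".
for n in 4 8 16 24 32; do
  printf '%2d hwcap bits -> %d combinations\n' "$n" "$((1 << n))"
done
```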

Following your analysis, I tried installing the built glibc in a sysroot,
and neither 'bin/true' nor 'tst-env-setuid' failed with
LD_HWCAP_MASK=0xffffffff.  So I think BZ#21391 is indeed fixed, and your
suggestion that the installed glibc is interfering with the testing still
worries me.  What might in fact be happening is that statically linked
binaries still rely on ld.so.cache for some internal calculation, which I
think is not the intended behaviour.  I will try to spend some time
figuring out why this still fails on my system.

[1] https://sourceware.org/ml/libc-alpha/2017-05/msg00570.html 

> 
> Here's what I did:
> 
> 1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:
> 
> LD_HWCAP_MASK=0xffffffff /bin/true
> 
> and sure enough, on one of my boxes it failed with the ENOMEM and on
> another it took a good 5-6 seconds before finishing.  This confirmed
> that the issue has been long-standing but was never really noticed.  At
> this point I was going with the assumption that this was a generic bug
> and did not bother testing aarch64.
> 
> 2. Now I tried running elf/ld.so under a debugger and was able to see
> the delay, but I was simply unable to break at the point of the delay or
> failure.  I could not understand at that point what was going on, so I
> moved on to something else.
> 
> 3. Now I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
> and could see the delay.  I tried attaching to elf/ld.so during that
> delay and once again it seemed to be in arbitrary places and I could not
> figure out what was going on.
> 
> 4. I ran perf and found the place in _dl_important_hwcaps where the
> program spent the most time.  I put a bunch of _dl_debug_printf's all
> over the place and oddly the printfs near the hotspot never even got
> invoked, the function was returning much before that.
> 
> 5. And then my Alexander Graham Bell moment happened, where I
> accidentally ran elf/ld.so directly instead of from within testrun.sh
> and the program succeeded immediately, no more delay.  Likewise on the
> other box, running the built elf/ld.so directly no longer showed the
> ENOMEM failure.
> 
> 6. Then I formed the hypothesis that using the old glibc from the system
> was to blame and that trunk glibc was working fine.  This fit in with
> all of the failures perfectly because all of them involved execution of
> a shell or another intermediary program using the system dynamic linker
> and that is what was failing, not the test.  gdb could not break at that
> point because the delay was in the shell it had invoked to start the
> program; the program had not even started.
> 
> I decided to test this by doing a git bisect.
> 
> 7. The bisect led to the fix for pr#21391 that HJ Lu pushed, which
> seemed to have stopped the delays and ENOMEMs in their tracks.  This led
> me to conclude that the issue is specific to x86 and does not affect
> aarch64.  I tested that hypothesis using my mustang aarch64 machine and
> sure enough, it succeeded all of the tests that x86 failed.
> 
> So to conclude, the only way that tst-env-setuid would have failed for
> you in this case was if it was stale, i.e. it had failed to rebuild
> somehow.
> Hence my request to test again with a clean build.

