This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: LD_HWCAP_MASK failure with tst-env-setuid


On 19/05/2017 15:04, Siddhesh Poyarekar wrote:
> Adhemerval,
> 
> I tried a bunch of things with the LD_HWCAP_MASK and tst-env-setuid and
> other programs and my current conclusion is that it may be due to a
> stale tst-env-setuid binary.  I've attached a long form description of
> things I tried for you or others to poke holes into, but can you confirm
> if a clean build and test run also fails similarly for you?

Hi Siddhesh, unfortunately the test is still failing on my x86_64 system
with LD_HWCAP_MASK=0xffffffff.  I used your latest tunables patchset [1]
applied on top of master (402bf0695218bbe290418b9486b1dd5fe284d903) and
configured with:

--host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-add-ons=libidn
--without-selinux --enable-stackguard-randomization --enable-obsolete-rpc
--enable-systemtap --enable-multi-arch --enable-lock-elision --enable-tunables

However I am only seeing this issue on x86_64; aarch64 does not bail out
with 'cannot create capability list: Cannot allocate memory'.
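
For context on that ENOMEM, here is an illustrative sketch (an assumption
about the mechanism, not glibc code): if the loader enumerates every subset
of the n enabled hwcap names as a candidate library search subdirectory, the
capability list grows as 2^n, so unmasking many bits via
LD_HWCAP_MASK=0xffffffff could plausibly make the allocation fail:

```shell
# Illustrative only: show how fast 2^n grows with the number of
# unmasked hwcap bits.  The real list size depends on how many hwcap
# names the loader actually considers "important".
for n in 4 8 16 24 32; do
  printf '%2d hwcap bits -> %d combinations\n' "$n" "$((1 << n))"
done
```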

Following your analysis, I tried installing the built glibc in a sysroot,
and neither 'bin/true' nor 'tst-env-setuid' failed with
LD_HWCAP_MASK=0xffffffff.  So I think BZ#21391 is indeed fixed, and your
suggestion that the installed glibc is interfering with the testing still
worries me.  What might in fact be happening is that statically linked
binaries still rely on ld.so.cache for some internal calculation, which I
think is not the intended behaviour.  I will try to spend some time
figuring out why this still fails on my system.

[1] https://sourceware.org/ml/libc-alpha/2017-05/msg00570.html 

> 
> Here's what I did:
> 
> 1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:
> 
> LD_HWCAP_MASK=0xffffffff /bin/true
> 
> and sure enough, on one of my boxes it failed with the ENOMEM and on
> another it took a good 5-6 seconds before finishing.  This confirmed
> that the issue has been long-standing but was never really noticed.  At
> this point I was going with the assumption that this was a generic bug
> and did not bother testing aarch64.
> 
> 2. Now I tried running elf/ld.so under a debugger and was able to see
> the delay, but I was simply unable to break at the point of the delay or
> failure.  I could not understand at that point what was going on, so I
> moved on to something else.
> 
> 3. Now I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
> and could see the delay.  I tried attaching to elf/ld.so during that
> delay and once again it seemed to be in arbitrary places and I could not
> figure out what was going on.
> 
> 4. I ran perf and found the place in _dl_important_hwcaps where the
> program spent the most time.  I put a bunch of _dl_debug_printf's all
> over the place and oddly the printfs near the hotspot never even got
> invoked, the function was returning much before that.
> 
> 5. And then my Alexander Graham Bell moment happened, where I
> accidentally ran elf/ld.so directly instead of from within testrun.sh
> and the program succeeded immediately, no more delay.  Likewise on the
> other box, running the built elf/ld.so directly no longer showed the
> ENOMEM failure.
> 
> 6. Then I formed the hypothesis that using the old glibc from the system
> was to blame and that trunk glibc was working fine.  This fit in with
> all of the failures perfectly because all of them involved execution of
> a shell or another intermediary program using the system dynamic linker
> and that is what was failing, not the test.  gdb could not break at that
> point because the delay was in the shell it had invoked to start the
> program; the program had not even started.
> 
> I decided to test this by doing a git bisect.
> 
> 7. The bisect led to the fix for pr#21391 that HJ Lu pushed, which
> seemed to have stopped the delays and ENOMEMs in their tracks.  This led
> me to conclude that the issue is specific to x86 and does not affect
> aarch64.  I tested that hypothesis using my mustang aarch64 machine and
> sure enough, it succeeded all of the tests that x86 failed.
> 
> So to conclude, the only way that tst-env-setuid would have failed for
> you in this case was if it was stale, i.e. it had failed to rebuild
> somehow.
> Hence my request to test again with a clean build.

