This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 1/25] Remove nested functions: crypt/md5-crypt.c


On Tue, May 20, 2014 at 7:27 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, May 20, 2014 at 05:20:31PM +0200, OndÅej BÃlka wrote:
>> Yes I wanted to write that benchmarking this change is probably
>> pointless as gcc should optimize them in same way (if there is
>> consistently difference in one way we could report that as gcc regression.)
>>
>> Also perf data show that its a cold function so you should improve
>> readability not performance (measuring difference is possible but you
>> need to run that for few days and wait until standard deviation drops
>> sufficiently to get statistically significant result).
>>
>> About only way this could make difference is if when it gets inlined in
>> one way and does not in other.
>> That is out of scope of microbenchmarks, instruction cache misses easily
>> could make a function slower in wild but you would measure that inlined
>> function is faster when you would call it in tight loop.
>
> Fair enough.  I'll push the patch if Konstantin shows that the
> generated code is not drastically different.

Just a data point; I am not sure if it proves anything.
If I add __attribute__ ((noinline)) to the definition of b64_from_24bit,
the two binaries become more similar.

The non-nested variant is a bit longer as it needs 6 instructions to
pass the parameters:

    15a9:       0f b6 8d 87 fe ff ff    movzbl -0x179(%rbp),%ecx
      15b0:       0f b6 95 81 fe ff ff    movzbl -0x17f(%rbp),%edx
      15b7:       41 b9 04 00 00 00       mov    $0x4,%r9d
      15bd:       44 0f b6 85 8d fe ff    movzbl -0x173(%rbp),%r8d
      15c4:       ff
      15c5:       48 89 de                mov    %rbx,%rsi
      15c8:       4c 89 f7                mov    %r14,%rdi
      15cb:       e8 40 fb ff ff          callq  1110 <b64_from_24bit>

The nested variant needs 5:
     15bf:       0f b6 95 7e fe ff ff    movzbl -0x182(%rbp),%edx
      15c6:       0f b6 b5 78 fe ff ff    movzbl -0x188(%rbp),%esi
      15cd:       49 89 da                mov    %rbx,%r10
      15d0:       0f b6 bd 72 fe ff ff    movzbl -0x18e(%rbp),%edi
      15d7:       b9 04 00 00 00          mov    $0x4,%ecx
      15dc:       e8 2f fb ff ff          callq  1110
<b64_from_24bit.7858>

With inlining (that happens for me only in the non-nested variant)
this is irrelevant.

>
> Siddhesh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]