This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH v2] Add __pure2 to __locale_ctype_ptr(_l)


Corinna Vinschen wrote: 
> Wilco Dijkstra wrote:

> > And it works with -O2 if you split off the p++ in the increment part of the for.
>
> No, it doesn't.  I retried with your style of for loop, but there's
> simply no difference for me.  -O2, -O3, pure/ not-pure, with f++ split
> off or not, it's always taking the same time on average.

That's odd - maybe __pure2 isn't defined correctly in your environment. With your
unchanged benchmark at -O3 I get the following, where the call is clearly hoisted
out of the loop:

        ldrb    w19, [x20]
        add     x20, x20, 1
        cbz     w19, .L3
        stp     x22, x23, [sp, 40]
        bl      __locale_ctype_ptr
        adrp    x23, .LC0
        mov     x22, x0
        add     x23, x23, :lo12:.LC0
        .p2align 3
.L4:
        add     x19, x22, x19, uxtb
        ldrb    w0, [x19, 1]
        tbnz    x0, 4, .L20
        ldrb    w19, [x20], 1
        cbnz    w19, .L4

What is the disassembly of your version?
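
For reference, the kind of loop under discussion can be sketched roughly as
below. This is only an illustration, not the newlib source or the benchmark
from this thread: DEMO_PURE2, DEMO_SPACE, demo_table, demo_ctype_ptr and
demo_count_spaces are all made-up names, and DEMO_PURE2 is assumed to expand
to GCC's const attribute. In this single-file sketch the compiler can simply
inline the accessor; the attribute matters when the function is defined in
another translation unit, as a library function would be. Whether the call is
actually hoisted still depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical stand-in for __pure2; assumed to map to GCC's const attribute. */
#if defined(__GNUC__)
#define DEMO_PURE2 __attribute__((__const__))
#else
#define DEMO_PURE2
#endif

/* Hypothetical class bit, not newlib's real ctype flag value. */
#define DEMO_SPACE 0x08

/* Class table with one leading slot, so lookups use (table + 1)[c]. */
static const unsigned char demo_table[1 + 256] = {
    [1 + ' ']  = DEMO_SPACE,
    [1 + '\t'] = DEMO_SPACE,
    [1 + '\n'] = DEMO_SPACE,
};

/* The attribute tells the compiler the result depends only on the
   arguments (none here), so repeated calls can be merged into one. */
DEMO_PURE2 const unsigned char *demo_ctype_ptr(void);

const unsigned char *demo_ctype_ptr(void)
{
    return demo_table;
}

/* Counts whitespace characters. The accessor is called on every
   iteration in the source; with the attribute the compiler is allowed
   to call it once before the loop and keep the pointer in a register. */
size_t demo_count_spaces(const char *s)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((demo_ctype_ptr() + 1)[*p] & DEMO_SPACE)
            n++;
    }
    return n;
}
```

Calling demo_count_spaces("a b\tc\n") returns 3. Comparing the output of
gcc -O2 -S with and without the attribute (with the accessor moved to a
separate file) is one way to check whether the call stays inside the loop.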

> > No, this is certainly not architecture dependent. The ctype implementation used to
> > be fast, but it is slow now - changes made to ctype last year caused it.
>
> I was talking about the above observation.  The changes to the locale
> stuff were necessary to support POSIX.1-2008 locale objects.  If you
> think the implementation has flaws, please provide patches.

The ctype implementation can certainly be improved further. However, adding
__pure2 fixes the major slowdown and brings performance back in line with GLIBC,
so that's the most important fix for now.

Wilco
