This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Re: [PATCH v2] Add __pure2 to __locale_ctype_ptr(_l)
Corinna Vinschen wrote:
> Wilco Dijkstra wrote:
> > And it works with -O2 if you split off the p++ in the increment part of the for.
>
> No, it doesn't. I retried with your style of for loop, but there's
> simply no difference for me. -O2, -O3, pure/ not-pure, with f++ split
> off or not, it's always taking the same time on average.
That's odd. Maybe __pure2 is not being defined correctly in your environment. With
your unchanged benchmark built with -O3, I get the following, where the call to
__locale_ctype_ptr is clearly hoisted out of the loop:
        ldrb    w19, [x20]
        add     x20, x20, 1
        cbz     w19, .L3
        stp     x22, x23, [sp, 40]
        bl      __locale_ctype_ptr
        adrp    x23, .LC0
        mov     x22, x0
        add     x23, x23, :lo12:.LC0
        .p2align 3
.L4:
        add     x19, x22, x19, uxtb
        ldrb    w0, [x19, 1]
        tbnz    x0, 4, .L20
        ldrb    w19, [x20], 1
        cbnz    w19, .L4
What is the disassembly of your version?
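
For reference, here is a minimal C sketch of the pattern under discussion. It is not the newlib source: sketch_ctype_ptr, sketch_table, sketch_init, count_flagged and SKETCH_FLAG are hypothetical names, and the table layout (257 entries, indexed with a +1 offset) is only modelled on the load in the listing above. It assumes a GCC-compatible compiler and uses __attribute__((__const__)), which is what the BSD-style __pure2 macro expands to. The attribute promises that the function's result depends only on its arguments, which is what allows the compiler to call it once before the loop. In a single translation unit the compiler may simply inline the function, so the hoisting is only visible when the definition lives in a separate file, as it does in a C library.

```c
#include <stddef.h>

/* Hypothetical class bit; any single bit would do for this sketch. */
#define SKETCH_FLAG 0x10

/* Hypothetical classification table: slot 0 is reserved, slots 1..256
 * hold the flags for character values 0..255. */
static unsigned char sketch_table[257];

/* The const attribute tells the compiler that the result depends only on
 * the arguments (here: none) and that the function has no side effects.
 * Returning the address of the table satisfies that, because the function
 * never reads the table contents itself. */
__attribute__((__const__))
const unsigned char *sketch_ctype_ptr(void)
{
    return sketch_table;
}

/* Mark two characters so the loop below has something to count. */
void sketch_init(void)
{
    sketch_table[1 + (unsigned char)'!'] |= SKETCH_FLAG;
    sketch_table[1 + (unsigned char)'?'] |= SKETCH_FLAG;
}

/* Count the characters of p whose table entry has SKETCH_FLAG set.
 * The call sits inside the loop in the source; with the const attribute
 * the compiler is allowed to evaluate it once and keep the pointer in a
 * register across iterations. */
size_t count_flagged(const char *p)
{
    size_t n = 0;

    for (; *p != '\0'; p++) {
        const unsigned char *t = sketch_ctype_ptr();
        if (t[1 + (unsigned char)*p] & SKETCH_FLAG)
            n++;
    }
    return n;
}
```

To compare code generation, one could move sketch_ctype_ptr into its own file, build with -O2 once with and once without the attribute on its declaration, and look at whether the bl to it ends up inside or outside the loop.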
>> No this is certainly not architecture dependent. The ctype implementation used to
>> be fast, but it is slow now - changes made to ctype last year caused it.
>
> I was talking about the above observation. The changes to the locale
> stuff were necessary to support POSIX.1-2008 locale objects. If you
> think the implementation has flaws, please provide patches.
The ctype implementation can certainly be improved further. However, adding
__pure2 fixes the major slowdown and brings performance back in line with GLIBC,
so that's the most important fix for now.
Wilco