This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: 'Roland McGrath' <roland at hack dot frob dot com>, libc-alpha at sourceware dot org
- Date: Wed, 11 Feb 2015 14:06:56 +0100
- Subject: Re: bzero/bcopy/bcmp/mempcpy (was: Improve strncpy performance further)
On Wed, Feb 04, 2015 at 04:30:43PM -0000, Wilco Dijkstra wrote:
> > > the return value at the start of memcpy so that mempcpy can jump past it.
> > > This means 1 extra instruction in every memcpy invocation plus an extra
> > > branch for mempcpy.
> >
> > That is false. You need to copy starting memcpy fragment until you set
> > return value and then jump which gives no overhead to memcpy.
>
> That's not how memcpy implementations work. You never set the return value
> explicitly, you either don't change the destination register (which on most ABIs
> also is the return value) or save/restore it on targets with few registers.
> Additionally for small/medium copies you use the destination (and return value)
> unchanged, so to support a different return value you need an extra instruction
> to make a copy of the destination ...
>
No, my description is quite explicit. Take the memcpy implementation and
find the first instruction after which nothing reads or writes the return
register (or the memory slot holding the return value).
For mempcpy, use memcpy as a template and clone it up to the instruction
corresponding to the one just described.
At that point, set the return value and jump to the corresponding
instruction in memcpy.
This obviously adds no extra instruction to memcpy, because memcpy itself
is not changed.
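
A rough C model of this scheme, as a sketch only. C cannot jump into the
middle of another function, so the shared tail of memcpy is modelled here
as a helper that never touches the return value. The names sketch_memcpy,
sketch_mempcpy, copy_body and sketch_check are hypothetical and are not
glibc code; a real implementation would do this in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared body: stands in for the part of memcpy that comes
   after the last instruction touching the return register.  It copies n
   bytes and has no knowledge of what the caller will return. */
static void copy_body(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* memcpy stays exactly as it was: the return value is simply dst. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    copy_body(dst, src, n);
    return dst;
}

/* mempcpy clones only the short prefix, fixes its own return value
   (dst + n), and then reuses the same body. */
void *sketch_mempcpy(void *dst, const void *src, size_t n)
{
    void *ret = (unsigned char *)dst + n;
    copy_body(dst, src, n);
    return ret;
}

/* Small self-check: returns 1 if both entry points copy the data and
   return the expected pointers. */
int sketch_check(void)
{
    const char src[4] = "abc";
    char a[4] = {0}, b[4] = {0};
    void *ra = sketch_memcpy(a, src, sizeof src);
    void *rb = sketch_mempcpy(b, src, sizeof src);
    assert(memcmp(a, src, sizeof src) == 0);
    assert(memcmp(b, src, sizeof src) == 0);
    return ra == (void *)a && rb == (void *)(b + sizeof src);
}
```

In this model sketch_memcpy is identical with or without sketch_mempcpy
present; only the mempcpy entry point carries the extra work of computing
dst + n.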