This is the mail archive of the libc-alpha mailing list for the glibc project.
Re: [PATCH 00/27] S390: Optimize string, wcsmbs and memory functions.
- From: Stefan Liebler <stli at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Fri, 21 Aug 2015 10:17:13 +0200
- Subject: Re: [PATCH 00/27] S390: Optimize string, wcsmbs and memory functions.
- References: <1435930721-27922-1-git-send-email-stli at linux dot vnet dot ibm dot com> <mpsp8u$e1j$2 at ger dot gmane dot org> <20150811111634 dot GB21448 at domone> <mqibb9$sie$1 at ger dot gmane dot org> <20150813161454 dot GA29304 at domone>
On 08/13/2015 06:14 PM, Ondřej Bílka wrote:
On Thu, Aug 13, 2015 at 04:58:48PM +0200, Stefan Liebler wrote:
On 08/11/2015 01:16 PM, Ondřej Bílka wrote:
On Wed, Aug 05, 2015 at 12:41:34PM +0200, Stefan Liebler wrote:
ping after release 2.22.
ok to commit?
Regarding the wide-string enhancements for tests/benchtests:
From the performance side I don't have more comments; it looks like a nice
instruction set. However, I couldn't review this for correctness, so
who could check the assembly for that?
Thanks for your review.
Andreas Krebbel will do this.
Is the enhancement of the tests/benchtests okay?
As for benchmarks, I will shortly post a link to the next iteration of the
dryrun framework on this list. It lets you collect call traces and replay
functions with the same arguments. While it is not as accurate as directly
measuring the timing of the new implementations, it gives a good approximation
of the improvement you will actually get. So could you also run it after I
post it?
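
The record-and-replay idea described above can be sketched roughly as follows. This is not code from the dryrun framework; the names `trace_record`, `trace_replay`, `struct trace` and `MAX_RECORDS` are hypothetical, and the sketch stores only pointers to the arguments, whereas a real recorder would have to copy the string contents (and their alignment) to a file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory trace; a real recorder would write to disk.  */
#define MAX_RECORDS 1024

/* One recorded call: the two arguments strcmp was called with.  */
struct strcmp_record
{
  const char *s1;
  const char *s2;
};

struct trace
{
  struct strcmp_record rec[MAX_RECORDS];
  size_t count;
};

typedef int (*strcmp_fn) (const char *, const char *);

/* Record phase: remember the arguments of one call.
   Returns 1 if the call was stored, 0 if the trace is already full.  */
static int
trace_record (struct trace *t, const char *s1, const char *s2)
{
  assert (t != NULL);
  if (t->count == MAX_RECORDS)
    return 0;
  t->rec[t->count].s1 = s1;
  t->rec[t->count].s2 = s2;
  t->count++;
  return 1;
}

/* Replay phase: call IMPL on every recorded argument pair and return
   the elapsed wall-clock time in nanoseconds.  The number of pairs that
   compared unequal is stored in *MISMATCHES so that the calls cannot be
   optimized away.  */
static double
trace_replay (const struct trace *t, strcmp_fn impl, long *mismatches)
{
  struct timespec start, end;
  long acc = 0;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (size_t i = 0; i < t->count; i++)
    acc += impl (t->rec[i].s1, t->rec[i].s2) != 0;
  clock_gettime (CLOCK_MONOTONIC, &end);

  *mismatches = acc;
  return (end.tv_sec - start.tv_sec) * 1e9
         + (end.tv_nsec - start.tv_nsec);
}
```

Replaying the same trace once with the old implementation and once with the new one as `impl` gives two timings over identical arguments, which is the kind of comparison the approach is meant to provide.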
Yes, of course. I'll give it a try, but I'm not allowed to post the
results. Are there any traces of workload available?
And could you at least tell me what speedup you got, or not?
I haven't published workloads for all functions yet, only for some special
cases. I need to write utilities to limit the record size; right now it
generates gigabyte-sized files for strcmp after a short while, because strcmp
is called that often. I also haven't added conversion between little and big
endian, and alignments could be different.
I would prefer it if you did the recording, for example while compiling glibc
(or anything else that interests you); you only need to write the following:
# end shell
I recorded builds of binutils and glibc, which produced several GB of record
data. Afterwards I ran bench_<function> via testrun.sh from the no-vector and
vector glibc builds, as described in your separate dryrun post.
The comparison showed a speedup, and no function was slower with the