This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] Improve strcpy: Faster unaligned loads.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 11 Sep 2013 18:35:51 +0200
- Subject: Re: [PATCH 2/2] Improve strcpy: Faster unaligned loads.
- Authentication-results: sourceware.org; auth=none
- References: <20130909153051 dot GA23047 at domone dot kolej dot mff dot cuni dot cz> <20130909161112 dot GB23047 at domone dot kolej dot mff dot cuni dot cz> <522E36A9 dot 8040100 at redhat dot com>
On Mon, Sep 09, 2013 at 04:59:21PM -0400, Carlos O'Donell wrote:
> On 09/09/2013 12:11 PM, Ondřej Bílka wrote:
> > This is the actual implementation. We use an optimized header that
> > makes calls around 50 cycles faster on Nehalem and Ivy Bridge.
> >
> > Currently this improves strcpy, stpcpy, and strcat; I keep the old
> > implementation of strncpy/strncat.
> >
> > The header that I use improves speed by 10% on most processors for the
> > gcc workload. Separate loops that use ssse3/shifts are needed, as this
> > implementation is slower on large sizes for processors without fast
> > unaligned loads.
> >
> > Results were obtained with the following benchmark:
> >
> > http://kam.mff.cuni.cz/~ondra/benchmark_string/strcpy_profile.html
> > http://kam.mff.cuni.cz/~ondra/benchmark_string/strcpy_profile90913.tar.bz2
>
> The benchmark numbers are great. I appreciate you running the various
> tests against the old, new, and sse3 implementations.
>
> Does the glibc microbenchmark show a performance increase also or are
> we still lacking the requisite framework to measure these changes?
>
Several areas are lacking. One is that the function should not be called
in a tight loop, so that the effect of branch prediction is taken into
account. Numbers from benchtests tend to be off by a large amount due to
the lack of randomization.
For example, strcpy-ssse3.S handles the first 16 bytes with the following code:
cmpb $0, (%rcx)
jz L(Exit1)
cmpb $0, 1(%rcx)
jz L(Exit2)
cmpb $0, 2(%rcx)
jz L(Exit3)
cmpb $0, 3(%rcx)
jz L(Exit4)
cmpb $0, 4(%rcx)
jz L(Exit5)
cmpb $0, 5(%rcx)
jz L(Exit6)
cmpb $0, 6(%rcx)
jz L(Exit7)
cmpb $0, 7(%rcx)
jz L(Exit8)
...
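When the size varies, this code degrades performance because the exit
branches become hard to predict, but the benchmarks do not catch this
since each measurement repeats a single size.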
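
To illustrate the difference between a fixed-size and a randomized-size measurement, here is a minimal sketch in C. It is not glibc benchtests code: the names `fill_lengths`, `time_strcpy`, and `MAX_LEN`, the xorshift generator, and the use of `clock()` are all assumptions made for illustration. The idea is only that the same timing loop is fed either one repeated length or a pseudo-random sequence of lengths, so the two timings can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch, not glibc benchtests code. */
enum { MAX_LEN = 64 };

/* Tiny xorshift32 generator so a run is reproducible from its seed. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill lens[0..n) with one fixed length, or with pseudo-random
   lengths in [0, MAX_LEN) when randomize is nonzero. */
static void fill_lengths(size_t *lens, size_t n, int randomize,
                         size_t fixed_len, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++)
        lens[i] = randomize ? next_rand(&state) % MAX_LEN : fixed_len;
}

/* Run n strcpy calls whose source lengths come from lens[].
   Returns elapsed CPU time in seconds; *copied receives the total
   number of characters copied, which also keeps the copies live. */
static double time_strcpy(const size_t *lens, size_t n, size_t *copied)
{
    static char src[MAX_LEN + 1], dst[MAX_LEN + 1];
    size_t total = 0;

    memset(src, 'x', MAX_LEN);
    src[MAX_LEN] = '\0';

    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        size_t len = lens[i];
        assert(len < MAX_LEN);
        src[len] = '\0';        /* cut the source to this call's length */
        strcpy(dst, src);
        total += strlen(dst);   /* read the result back (adds overhead) */
        src[len] = 'x';         /* restore for the next iteration */
    }
    clock_t end = clock();

    *copied = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `fill_lengths` once with `randomize` set to 0 and once set to 1, then timing both arrays with `time_strcpy`, gives two numbers for the same implementation. A real harness would need many more iterations, a finer timer, and care that the compiler does not inline `strcpy` (for example GCC's `-fno-builtin`).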
There is another problem: how to compare old and new implementations.
You have two tables of results, before and after, and now you want to
compare them.
This is a design problem. The main use case of benchmarks is comparing
implementations, and that should be easy to do without tedious tasks
like adding functions to the makefile and ifunc-impl-list, renaming
them, recompiling libc, and running the benchmarks.
old
simple_strcpy __strcpy_ssse3 __strcpy_sse2_unaligned __strcpy_sse2
Length 0, alignments in bytes 0/ 0: 10.9062 11.1562 16.6719 15.5312
Length 0, alignments in bytes 0/ 0: 10.0156 10.2969 17.7031 14.7812
Length 0, alignments in bytes 0/ 0: 9.8125 10.3906 16.5781 14.5938
Length 0, alignments in bytes 0/ 0: 9.67188 9.25 20.4531 14.7812
Length 1, alignments in bytes 0/ 0: 11.375 11.0469 15.625 19.7812
Length 1, alignments in bytes 0/ 0: 13.8281 10.3906 15.875 15.2969
Length 1, alignments in bytes 0/ 1: 12.7969 9.96875 15.875 15.7812
Length 1, alignments in bytes 1/ 0: 17.2812 11.0938 15.9219 16.625
Length 2, alignments in bytes 0/ 0: 14.6406 13.9219 16.9062 20.6875
Length 2, alignments in bytes 0/ 0: 15.7812 13.7812 16.5312 19.5469
Length 2, alignments in bytes 0/ 2: 14.6875 13.375 16 19.3125
Length 2, alignments in bytes 2/ 0: 18.1875 16.4219 15.5781 20.2031
Length 3, alignments in bytes 0/ 0: 17.9375 13.4062 16 22.4844
Length 3, alignments in bytes 0/ 0: 18.9844 12.9375 16.375 20.7812
Length 3, alignments in bytes 0/ 3: 16.625 12.375 16.7188 20.6719
Length 3, alignments in bytes 3/ 0: 18.2812 12.125 16.2344 23.1406
Length 4, alignments in bytes 0/ 0: 18.8906 16.8125 17.5312 25.5469
Length 4, alignments in bytes 0/ 0: 20.0625 16.4375 17.4219 23.7969
Length 4, alignments in bytes 0/ 4: 22.4375 16.1875 17.2812 24.125
Length 4, alignments in bytes 4/ 0: 21.4844 15.9219 16.1406 30.7969
Length 5, alignments in bytes 0/ 0: 20.3906 19.7812 18.3281 27.7656
Length 5, alignments in bytes 0/ 0: 22.8125 17.4688 20.7188 26.125
Length 5, alignments in bytes 0/ 5: 23.3281 17.7969 20.9219 25.6406
Length 5, alignments in bytes 5/ 0: 22.6094 18.0781 16.3906 31.1094
Length 6, alignments in bytes 0/ 0: 23.4688 19.7969 20.9688 32.1094
Length 6, alignments in bytes 0/ 0: 24.3281 18.7031 20.9688 28.7031
Length 6, alignments in bytes 0/ 6: 24.8906 22.2344 18 29.2812
Length 6, alignments in bytes 6/ 0: 24.6406 19.3594 20.7812 33.5312
Length 7, alignments in bytes 0/ 0: 25.3594 22.3438 16.5312 30.6406
Length 7, alignments in bytes 0/ 0: 27.1562 19.9219 19.7344 28.7188
Length 7, alignments in bytes 0/ 7: 26.6719 19.7344 16.1406 29.0469
Length 7, alignments in bytes 7/ 0: 27.1094 19.3125 15.625 34.3281
Length 8, alignments in bytes 0/ 0: 26.2031 23.5156 16.6719 21.0156
Length 8, alignments in bytes 0/ 0: 27.6719 23.1406 16.2969 19.6406
Length 8, alignments in bytes 0/ 0: 28.2031 21.9531 17.8438 19.2656
Length 8, alignments in bytes 0/ 0: 28 21.9531 19.4062 19.2188
Length 9, alignments in bytes 0/ 0: 28.25 23.9375 17.8906 21.3906
Length 9, alignments in bytes 0/ 0: 30.2656 24.0781 20.5938 21.3906
Length 9, alignments in bytes 0/ 1: 30.125 23.8906 18.7031 20.2969
Length 9, alignments in bytes 1/ 0: 29.375 23.8906 17.7969 44.5312
Length 10, alignments in bytes 0/ 0: 31.3125 25.5938 17.8438 33
Length 10, alignments in bytes 0/ 0: 30.4062 25.5 18.4219 24.8438
Length 10, alignments in bytes 0/ 2: 31.6406 25.7344 17.9375 23.4219
Length 10, alignments in bytes 2/ 0: 30.3125 25.0312 18.7031 46.8906
Length 11, alignments in bytes 0/ 0: 33.7656 27.5312 18.375 27.1562
Length 11, alignments in bytes 0/ 0: 33.3438 27.1562 18.8438 25.9844
Length 11, alignments in bytes 0/ 3: 34.0938 27.2031 19.75 32.5938
Length 11, alignments in bytes 3/ 0: 34.0938 26.2969 17.375 48.1094
Length 12, alignments in bytes 0/ 0: 35.3281 30.1719 16.6719 32.9219
Length 12, alignments in bytes 0/ 0: 35.125 29.9844 17.4688 29.1875
Length 12, alignments in bytes 0/ 4: 58.6562 31.1719 17.625 28.6719
Length 12, alignments in bytes 4/ 0: 34.4219 30.8906 17.5312 34.1406
Length 13, alignments in bytes 0/ 0: 37.8281 30.8438 17.6094 31.5
Length 13, alignments in bytes 0/ 0: 36.4531 31.125 17.9531 30.3125
Length 13, alignments in bytes 0/ 5: 37.3594 30.8281 17.8438 30.2188
Length 13, alignments in bytes 5/ 0: 36.3125 30.7969 18.4062 37.1719
Length 14, alignments in bytes 0/ 0: 40.3281 34.6562 17.2344 38.0625
Length 14, alignments in bytes 0/ 0: 38.7188 33.7656 18.3281 33.8125
Length 14, alignments in bytes 0/ 6: 37.7344 34.3906 17.4688 33.9531
Length 14, alignments in bytes 6/ 0: 38.4375 33.9531 18.4219 47.2188
Length 15, alignments in bytes 0/ 0: 72.0625 38.9531 16.4844 35.9375
Length 15, alignments in bytes 0/ 0: 41.7031 34.1875 16.1406 34.1406
Length 15, alignments in bytes 0/ 7: 40.9375 34.1875 16.625 33.375
Length 15, alignments in bytes 7/ 0: 41.4219 34.3281 16.5781 40.375
Length 16, alignments in bytes 0/ 0: 42.8281 43.9688 18.7031 25.6406
Length 16, alignments in bytes 7/ 2: 41.8906 51.5625 18 40.3281
Length 32, alignments in bytes 0/ 0: 111.453 61.625 24.4219 33.7188
Length 32, alignments in bytes 6/ 4: 109.797 65.7812 30.6406 50.5312
Length 64, alignments in bytes 0/ 0: 168.453 54.6875 28.4844 77.25
Length 64, alignments in bytes 5/ 6: 188.125 114.562 28.75 64.7969
Length 128, alignments in bytes 0/ 0: 327.062 73.0938 50.0938 79.5156
Length 128, alignments in bytes 4/ 0: 326.406 103.078 48.3125 101.391
Length 256, alignments in bytes 0/ 0: 642.516 88.6875 65.5469 240.547
Length 256, alignments in bytes 3/ 2: 642.469 118.109 81.7031 163.25
Length 512, alignments in bytes 0/ 0: 1271.03 121.078 105.812 271.234
Length 512, alignments in bytes 2/ 4: 1307.97 172.875 105.969 298.734
Length 1024, alignments in bytes 0/ 0: 2530.12 184.406 167.969 528.172
Length 1024, alignments in bytes 1/ 6: 2684.86 254.672 170.094 593.922
Length 16, alignments in bytes 1/ 2: 45.1406 50.4375 18.6094 46.2812
Length 16, alignments in bytes 2/ 1: 42.2188 47.375 17.5625 47.0312
Length 16, alignments in bytes 1/ 1: 42.7812 51.6094 16.625 45.9062
Length 16, alignments in bytes 1/ 1: 43.2969 42.9219 16.5781 45.1875
Length 32, alignments in bytes 2/ 4: 125 72.7656 28.6094 54.2188
Length 32, alignments in bytes 4/ 2: 110.547 63.9375 27.1562 50.7656
Length 32, alignments in bytes 2/ 2: 109.359 52.6562 28.0938 52.1719
Length 32, alignments in bytes 2/ 2: 108.984 51.9062 27.5312 51.4688
Length 64, alignments in bytes 3/ 6: 182.375 81.75 30.0312 69.5625
Length 64, alignments in bytes 6/ 3: 263.078 79.2812 29.75 66.6719
Length 64, alignments in bytes 3/ 3: 169.625 59.9219 28.0469 67.4375
Length 64, alignments in bytes 3/ 3: 263.312 57.9375 28.3438 66.5781
Length 128, alignments in bytes 4/ 0: 324.844 101.672 48.2188 97.3125
Length 128, alignments in bytes 0/ 4: 345.375 97.6094 65.6875 82.0156
Length 128, alignments in bytes 4/ 4: 538.766 80.1406 48.4062 100.109
Length 128, alignments in bytes 4/ 4: 327.578 78.2969 64.7969 98.0312
Length 256, alignments in bytes 5/ 2: 1040.11 126.641 66.3438 163.812
Length 256, alignments in bytes 2/ 5: 667.625 125 71.6719 169.203
Length 256, alignments in bytes 5/ 5: 641.609 99.3594 81.5156 260.047
Length 256, alignments in bytes 5/ 5: 640.562 96.4219 70.4531 160.031
Length 512, alignments in bytes 6/ 4: 1270.94 169.25 107.094 292.078
Length 512, alignments in bytes 4/ 6: 1308.86 174.625 120.422 296.234
Length 512, alignments in bytes 6/ 6: 1270.33 134.203 97.1875 289.094
Length 512, alignments in bytes 6/ 6: 1270.38 129.953 102.328 472.172
Length 1024, alignments in bytes 7/ 6: 2530.78 254.859 168.594 543.109
Length 1024, alignments in bytes 6/ 7: 2709.52 267.703 166.172 964.609
Length 1024, alignments in bytes 7/ 7: 2529.83 194.406 169.719 883.391
Length 1024, alignments in bytes 7/ 7: 2529.64 192.812 159.281 539.609
new
simple_strcpy __strcpy_ssse3 __strcpy_sse2_unaligned __strcpy_sse2
Length 0, alignments in bytes 0/ 0: 10.8125 22.1406 21.8125 19.2656
Length 0, alignments in bytes 0/ 0: 9.96875 21.0156 21.0156 17.5156
Length 0, alignments in bytes 0/ 0: 10.4844 20.9688 25.0781 14.5938
Length 0, alignments in bytes 0/ 0: 9.96875 20.6406 25.125 15.4531
Length 1, alignments in bytes 0/ 0: 17.6094 27.3438 27.0156 20.4062
Length 1, alignments in bytes 0/ 0: 13.5156 25.2656 24.5938 16.2969
Length 1, alignments in bytes 0/ 1: 16.9062 24.5 25.4062 19.6875
Length 1, alignments in bytes 1/ 0: 14.8281 24.5625 25.125 20.5469
Length 2, alignments in bytes 0/ 0: 14.9219 24.3594 24.3594 26.7812
Length 2, alignments in bytes 0/ 0: 15.7656 24.8906 24.4219 25.4062
Length 2, alignments in bytes 0/ 2: 14.9688 24.7031 25.5469 19.125
Length 2, alignments in bytes 2/ 0: 20.9219 24.7031 25.0156 28.2344
Length 3, alignments in bytes 0/ 0: 17.1406 26.4844 26.1562 30.7969
Length 3, alignments in bytes 0/ 0: 16.7656 25.0312 25.4062 20.6875
Length 3, alignments in bytes 0/ 3: 17.5625 25.2188 25.9688 27.6719
Length 3, alignments in bytes 3/ 0: 16.7656 25.0312 25.4062 31.5469
Length 4, alignments in bytes 0/ 0: 18.4688 24.8906 24.3125 24.9375
Length 4, alignments in bytes 0/ 0: 20.4062 24.5156 24.75 32.8281
Length 4, alignments in bytes 0/ 4: 17.7188 24.5625 24.7812 24.4531
Length 4, alignments in bytes 4/ 0: 21.4375 25.3594 24.3125 38.8125
Length 5, alignments in bytes 0/ 0: 21.2031 25.125 24.7344 27.1094
Length 5, alignments in bytes 0/ 0: 23.0469 24.5625 25.0312 32.5938
Length 5, alignments in bytes 0/ 5: 32.4844 24.8438 24.7969 25.7344
Length 5, alignments in bytes 5/ 0: 22.3906 24.6562 25.2188 30.5938
Length 6, alignments in bytes 0/ 0: 23.8906 25.6406 24.5938 29.9844
Length 6, alignments in bytes 0/ 0: 37.875 25.1562 24.9375 29.375
Length 6, alignments in bytes 0/ 6: 25.0312 24.8438 24.7969 29.7031
Length 6, alignments in bytes 6/ 0: 37.4375 25.2656 25.2656 33.9062
Length 7, alignments in bytes 0/ 0: 25.3125 27.5781 27.2031 38.1562
Length 7, alignments in bytes 0/ 0: 26.25 25.2188 25.4531 28.9844
Length 7, alignments in bytes 0/ 7: 42.7812 25.0312 25.2656 28.75
Length 7, alignments in bytes 7/ 0: 26.6719 25.0781 24.7812 35.5156
Length 8, alignments in bytes 0/ 0: 28.1406 24.8906 25.125 27.0625
Length 8, alignments in bytes 0/ 0: 28.3906 24.75 25.2656 25.4062
Length 8, alignments in bytes 0/ 0: 28.4844 24.7344 24.6562 19.6406
Length 8, alignments in bytes 0/ 0: 26.5 24.8906 24.8438 20.0312
Length 9, alignments in bytes 0/ 0: 51.5156 24.6875 24.8906 33.0625
Length 9, alignments in bytes 0/ 0: 28.9844 24.75 25.2188 20.875
Length 9, alignments in bytes 0/ 1: 29.9844 24.8438 24.9375 20.8281
Length 9, alignments in bytes 1/ 0: 28.9062 25.0781 25.0625 60.8281
Length 10, alignments in bytes 0/ 0: 30.7969 25.4062 24.5469 32.3594
Length 10, alignments in bytes 0/ 0: 31.9688 24.9375 24.5 23.75
Length 10, alignments in bytes 0/ 2: 32.5469 24.9375 25.3125 23.5625
Length 10, alignments in bytes 2/ 0: 31.3438 24.8906 25.3594 61.8125
Length 11, alignments in bytes 0/ 0: 32.5938 24.7969 24.8906 28.2344
Length 11, alignments in bytes 0/ 0: 34.2344 24.9375 25.0781 25.7812
Length 11, alignments in bytes 0/ 3: 34.2344 24.8438 24.5938 25.8281
Length 11, alignments in bytes 3/ 0: 33.4375 24.9844 25.5 61.4844
Length 12, alignments in bytes 0/ 0: 35.8438 24.7031 24.5938 36.7344
Length 12, alignments in bytes 0/ 0: 33.6719 25.2656 24.9844 36.8906
Length 12, alignments in bytes 0/ 4: 34.6562 27.3438 26.625 34.0938
Length 12, alignments in bytes 4/ 0: 34.7969 26.0625 25.9688 44.3438
Length 13, alignments in bytes 0/ 0: 40 24.9844 24.9375 40.1875
Length 13, alignments in bytes 0/ 0: 37.4531 24.9375 25.4531 30.5
Length 13, alignments in bytes 0/ 5: 36.9844 27.4375 26.5312 31.875
Length 13, alignments in bytes 5/ 0: 62 25.5938 25.9688 37.5469
Length 14, alignments in bytes 0/ 0: 40.9531 24.75 25.2188 35.8438
Length 14, alignments in bytes 0/ 0: 38.3438 24.3594 25.0312 33.5156
Length 14, alignments in bytes 0/ 6: 38.8125 27.4375 26.6406 34.6562
Length 14, alignments in bytes 6/ 0: 39.625 25.9688 25.4531 38.5781
Length 15, alignments in bytes 0/ 0: 42.4062 26.4531 26.1094 35.2188
Length 15, alignments in bytes 0/ 0: 39.7031 25.8906 24.7812 33.7656
Length 15, alignments in bytes 0/ 7: 41.1719 28.2344 27.25 33.1406
Length 15, alignments in bytes 7/ 0: 39.1406 25.2656 24.9375 40.3281
Length 16, alignments in bytes 0/ 0: 43.6406 27.5781 28.1094 26.1562
Length 16, alignments in bytes 7/ 2: 42.2188 25.9219 25.3125 39.625
Length 32, alignments in bytes 0/ 0: 113.094 29.7969 27.7656 47.4062
Length 32, alignments in bytes 6/ 4: 111.25 26.3906 26.6875 50.2031
Length 64, alignments in bytes 0/ 0: 168.969 56.0469 42.2656 49.1562
Length 64, alignments in bytes 5/ 6: 262.359 43.7344 44.0156 96.4688
Length 128, alignments in bytes 0/ 0: 539.891 57.9844 65.7344 80.2344
Length 128, alignments in bytes 4/ 0: 325.312 56.2969 50.1875 160.422
Length 256, alignments in bytes 0/ 0: 641.75 73.7031 65.4531 145.016
Length 256, alignments in bytes 3/ 2: 639.953 81.0312 63.7969 268.312
Length 512, alignments in bytes 0/ 0: 1271.36 106.719 94.7656 442.656
Length 512, alignments in bytes 2/ 4: 2048.22 175.562 128.969 301.703
Length 1024, alignments in bytes 0/ 0: 2528.8 169.469 189.734 528.031
Length 1024, alignments in bytes 1/ 6: 4054.08 207.219 234.984 991.812
Length 16, alignments in bytes 1/ 2: 45.6719 29.1875 28.7656 45.0469
Length 16, alignments in bytes 2/ 1: 41.1719 25.9219 25.2188 47.0781
Length 16, alignments in bytes 1/ 1: 42.6875 27.9062 26.6719 46
Length 16, alignments in bytes 1/ 1: 43.3594 26.8281 27.3438 46
Length 32, alignments in bytes 2/ 4: 170.047 30.9219 30.75 74.9531
Length 32, alignments in bytes 4/ 2: 92.3594 27.1406 25.6875 51.6562
Length 32, alignments in bytes 2/ 2: 93.5 28.6719 27.625 51.75
Length 32, alignments in bytes 2/ 2: 160.938 27.9531 27.7188 74.2344
Length 64, alignments in bytes 3/ 6: 181.391 42.8281 54.4062 67.5312
Length 64, alignments in bytes 6/ 3: 169.016 54.9688 43.0625 97.75
Length 64, alignments in bytes 3/ 3: 169.531 41.7812 42.5938 68.5625
Length 64, alignments in bytes 3/ 3: 268.641 53.2656 41.5 69.4688
Length 128, alignments in bytes 4/ 0: 325.5 56.625 60.4375 99.3125
Length 128, alignments in bytes 0/ 4: 346.656 67.8594 59.4062 81.2188
Length 128, alignments in bytes 4/ 4: 325.969 81.2188 49.875 99.8281
Length 128, alignments in bytes 4/ 4: 325.641 78.6719 48.7812 97.8906
Length 256, alignments in bytes 5/ 2: 640.562 113.766 65.125 161.547
Length 256, alignments in bytes 2/ 5: 663.469 90.9531 83.8125 272.234
Length 256, alignments in bytes 5/ 5: 641.562 76.4062 63.9844 161.453
Length 256, alignments in bytes 5/ 5: 640.469 75.375 63.1875 256.703
Length 512, alignments in bytes 6/ 4: 1270.47 118.625 97.6562 290.422
Length 512, alignments in bytes 4/ 6: 1307.3 132.281 129.672 296.125
Length 512, alignments in bytes 6/ 6: 1270.56 142.844 96.3438 472.641
Length 512, alignments in bytes 6/ 6: 1270.17 142.938 96.0469 288.047
Length 1024, alignments in bytes 7/ 6: 2529.41 194.453 154.984 544.938
Length 1024, alignments in bytes 6/ 7: 2715.66 205.328 235.453 584.797
Length 1024, alignments in bytes 7/ 7: 4055.83 170.328 149.453 542.391
Length 1024, alignments in bytes 7/ 7: 2529.73 221.656 150.031 881.031
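
As a rough illustration of the comparison step, the following C sketch parses one row from each table and computes a new/old ratio for a chosen column. The names `struct row`, `parse_row`, and `ratio` are hypothetical, and the row format is assumed to be exactly the one printed above with four timing columns. Because the same length/alignment pair appears in several rows, rows would have to be matched by position in the table rather than by key.

```c
#include <stdio.h>

/* Hypothetical sketch: four timing columns, as in the tables above. */
enum { NIMPL = 4 };

/* One parsed row of benchmark output. */
struct row {
    int len, align1, align2;
    double t[NIMPL];
};

/* Parse "Length N, alignments in bytes A/ B: t0 t1 t2 t3".
   Returns 1 on success, 0 if the line is not a data row. */
static int parse_row(const char *line, struct row *r)
{
    int got = sscanf(line,
                     "Length %d, alignments in bytes %d/ %d: %lf %lf %lf %lf",
                     &r->len, &r->align1, &r->align2,
                     &r->t[0], &r->t[1], &r->t[2], &r->t[3]);
    return got == 3 + NIMPL;
}

/* Ratio new/old for column idx; a value below 1.0 means the
   "new" table reports a smaller number for that column. */
static double ratio(const struct row *old_row, const struct row *new_row,
                    int idx)
{
    return new_row->t[idx] / old_row->t[idx];
}
```

Applied to the "Length 16, alignments in bytes 7/ 2" rows of the two tables, `ratio` for the second column divides 25.9219 by 51.5625. A full tool would read both tables line by line, skip the header lines that `parse_row` rejects, and print one ratio per row and column.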