This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memset for Corei7 with avx2 instruction


We have never found prefetcht1 to be a good instruction for prefetching
data on Core 2, Nehalem, Sandy Bridge, or Haswell. Our experiments show
that prefetchw is the best choice in your cases.
In your code, memset handles only 256 bytes. At that size we don't need
software prefetch, because the hardware prefetcher is sufficient. The
test can still tell us whether prefetch hurts performance, so we ran it.
The results below indicate that prefetchw is harmless on Haswell, even
though it is redundant code in memset there.

[root@localhost memset_cache]# ./test
size: 32000
0.10    0.10
0.10    0.10
0.10    0.10
0.10    0.10
0.10    0.10
0.10    0.11
0.11    0.11
0.10    0.10
0.10    0.11
0.10    0.10
size: 256000
0.21    0.22
0.22    0.21
0.21    0.21
0.21    0.21
0.22    0.21
0.21    0.21
0.21    0.21
0.22    0.22
0.21    0.21
0.22    0.20
size: 1024000
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
0.38    0.38
size: 204800
0.20    0.21
0.20    0.20
0.19    0.19
0.20    0.20
0.20    0.19
0.19    0.19
0.19    0.19
0.19    0.20
0.20    0.20
0.20    0.21
size: 4048000
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
size: 8096000
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.45    0.44
0.44    0.44
0.44    0.44
0.44    0.44
0.44    0.44
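
For readers who want to try this outside the assembly, here is a
minimal C sketch of a fill loop with a write-prefetch hint. It is not
the AVX2 code from the patch. The name memset_prefetch_sketch and the
values of CACHE_LINE and PREFETCH_AHEAD are assumptions for
illustration only. __builtin_prefetch is a GCC/Clang builtin; whether
a write hint is actually emitted as prefetchw depends on the target
options (for GCC on x86, -mprfchw as far as we know), so check the
generated assembly before drawing conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE     64   /* assumed cache-line size in bytes */
#define PREFETCH_AHEAD 256  /* hypothetical prefetch distance in bytes */

/* Hypothetical helper: fill n bytes at dst with c, hinting an upcoming
   write to a line PREFETCH_AHEAD bytes ahead of the current position. */
static void *memset_prefetch_sketch(void *dst, int c, size_t n)
{
    assert(dst != NULL || n == 0);

    unsigned char *p = dst;
    size_t i = 0;

    /* Fill one cache line per iteration. */
    for (; i + CACHE_LINE <= n; i += CACHE_LINE) {
        /* rw = 1 asks for a prefetch with intent to write; only hint
           addresses that are still inside the buffer. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(p + i + PREFETCH_AHEAD, 1, 3);
        memset(p + i, c, CACHE_LINE);
    }

    /* Fill the remaining tail, which is shorter than one cache line. */
    memset(p + i, c, n - i);
    return dst;
}
```

For a 256-byte fill the condition i + PREFETCH_AHEAD < n is never
true, so this sketch issues no hint at all at that size, which matches
the point that small sizes can rely on the hardware prefetcher.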


Then we modified memset2 in test.c to handle 4096 bytes, as below
...
char ary[SIZE+4096];
...
memset2(ary+(512*((unsigned)rand_r(&seed)))%SIZE,0,4096);
and ran your code on Haswell as below. The results show that prefetchw
performs better and is harmless.

[root@localhost memset_cache]# ./test
size: 32000
1.01    0.91
0.98    0.90
0.98    0.91
0.98    0.91
0.98    0.91
0.97    0.91
0.98    0.91
0.97    0.91
1.00    0.91
0.97    0.91
size: 256000
1.34    1.36
1.34    1.33
1.35    1.35
1.37    1.35
1.35    1.34
1.36    1.34
1.36    1.34
1.37    1.36
1.38    1.35
1.36    1.35
size: 1024000
1.81    1.81
1.81    1.81
1.82    1.81
1.81    1.81
1.81    1.81
1.82    1.81
1.81    1.81
1.81    1.81
1.81    1.81
1.82    1.81
size: 204800
1.29    1.27
1.30    1.30
1.32    1.33
1.34    1.31
1.31    1.27
1.30    1.31
1.35    1.32
1.32    1.33
1.36    1.33
1.34    1.31
size: 4048000
1.95    1.94
1.95    1.95
1.95    1.95
1.95    1.94
1.95    1.95
1.94    1.95
1.95    1.95
1.95    1.95
1.95    1.94
1.95    1.95
size: 8096000
2.14    2.14
2.15    2.16
2.15    2.15
2.15    2.16
2.16    2.17
2.17    2.17
2.17    2.18
2.16    2.19
2.16    2.17
2.18    2.17
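
The modified call quoted above (4096-byte fills at offsets that are
multiples of 512, reduced modulo SIZE) can be reproduced with a small
harness along these lines. This is only a sketch under stated
assumptions, not the original test.c: time_fills, next_rand, and the
fill_fn type are hypothetical names, a small LCG stands in for rand_r
so that only standard C is needed, and clock() measures CPU time
rather than whatever the original test measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define FILL_LEN 4096  /* bytes written per call, as in the modified test */
#define STEP     512   /* offset multiplier, as in the quoted line */

/* Any function with memset's signature can be timed. */
typedef void *(*fill_fn)(void *, int, size_t);

/* Small LCG standing in for rand_r (an assumption of this sketch). */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Run `iters` fills of FILL_LEN bytes at pseudo-random offsets below
   `span`. buf must hold at least span + FILL_LEN bytes.
   Returns the elapsed CPU time in seconds. */
static double time_fills(fill_fn fn, char *buf, size_t span,
                         unsigned long iters, unsigned seed)
{
    assert(buf != NULL && span > 0);

    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        size_t off = ((size_t)STEP * next_rand(&seed)) % span;
        fn(buf + off, 0, FILL_LEN);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fills once with each memset variant on the same buffer,
span, and seed gives one pair of timings that can be compared the way
the two columns above are.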

We will test prefetchw in our code with gcc.403 and, depending on the
data, make the corresponding change in the next version.

Thanks
Ling

