[PATCH RFC] Improve 64bit memset performance for Haswell CPU with AVX2 instruction

Ling Ma ling.ma.program@gmail.com
Tue Jun 10 13:52:00 GMT 2014


In this patch (sent as a gzipped attachment), we take advantage of
Haswell (HSW) memory bandwidth, reduce branch mispredictions by avoiding
branch instructions, and force the destination to be aligned when
storing with AVX and AVX2 instructions.
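
To illustrate the idea for readers without the attachment, here is a
minimal C sketch using intrinsics. It is not the code from the patch
(which is x86-64 assembly); the function names memset_avx2_sketch and
memset_sketch are hypothetical, and the overlapping head/tail store
trick is an assumption about one common way to avoid per-size branches
while keeping the bulk stores 32-byte aligned. It assumes GCC or Clang
on x86-64.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the patch itself: fill n bytes with AVX2 stores. */
__attribute__((target("avx2")))
static void *memset_avx2_sketch(void *dst, int c, size_t n)
{
    assert(dst != NULL || n == 0);

    /* Small sizes are deferred to libc to keep the sketch short. */
    if (n < 32)
        return memset(dst, c, n);

    unsigned char *p = dst;
    unsigned char *end = p + n;
    __m256i v = _mm256_set1_epi8((char)c);

    /* Head: one unaligned store covers everything before the first
       32-byte boundary, so no branch on the misalignment is needed. */
    _mm256_storeu_si256((__m256i *)p, v);

    /* Tail: one unaligned store covers the last 32 bytes; it may
       overlap bytes already written, which is harmless. */
    _mm256_storeu_si256((__m256i *)(end - 32), v);

    /* Body: aligned stores starting at the first 32-byte boundary
       strictly after p. */
    unsigned char *q = (unsigned char *)(((uintptr_t)p + 32) & ~(uintptr_t)31);
    for (; q + 32 <= end; q += 32)
        _mm256_store_si256((__m256i *)q, v);

    return dst;
}

/* Hypothetical dispatcher: fall back to libc when AVX2 is unavailable. */
static void *memset_sketch(void *dst, int c, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return memset_avx2_sketch(dst, c, n);
    return memset(dst, c, n);
}
```

The only data-dependent branches left are the size check and the loop
condition; the head and tail are handled by unconditional stores.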

The CPU2006 403.gcc benchmark indicates this patch improves performance
by 26% to 59%.

This version incorporates Ondra's comments and avoids letting branch
instructions cross 16-byte-aligned code boundaries.

Thanks
Ling
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memset-avx2.patch.tar.gz
Type: application/x-gzip
Size: 2895 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20140610/bd622ba7/attachment.bin>

