This is the mail archive of the
mailing list for the pthreas-win32 project.
Re: Re: pthreads-win32 2.8.0, stack alignment, and SSE code
Sébastien Kunz-Jacques wrote:
Ramiro Polla a écrit :
Sébastien Kunz-Jacques wrote:
Not at all ; I use a modified binutils that enforces 16-byte alignment
alignment of .bss sections (basically I reverted the part of the
binutils patch that is linked to in the gcc bug 37216 thread). The bug I
experienced shows up only in threaded code and comes from the fact that
a stack of a win32 thread in only 4-byte aligned. The crash occurs when
a data is read on the stack and not in a .bss segment.
I encountered problems with SSE code compiled with recent mingw GCC
(4.3.2, TDM release, http://www.tdragon.net/recentgcc/) and using
pthreads 2.8.0. After inverstigation, crashes occured because the
code was trying to read operands on the stack, assuming the stack was
16-byte aligned as is the case in the main thread (the main function
aligns the stack and alignment is maintained during each function
call). I solved the issue with a very simple patch that uses some
GCC wizardry to force stack realignment upon entry in a new thread:
--- ptw32_threadStart.c Sun May 15 17:28:27 2005
+++ ptw32_threadStart.c Mon Sep 29 21:28:16 2008
@@ -116,6 +116,9 @@
+#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 &&
#if ! defined (__MINGW32__) || (defined (__MSVCRT__) && ! defined
The attribute force_align_arg_pointer should be added to every
function that is called with a stack with insufficient alignment; as
far as I am concerned doing this for threadStart only solved my
problems. Maybe this small patch could be added to the pthread code?
I think that's related to:
So I think this patch shouldn't be applied to pthreads-win32, and
people should rather use another version of MinGW (or unset automatic
SSE code, if that makes any sense).
To give some contextual information, I tried to build a math library,
ATLAS, with mingw. First, for the non-threaded version, I encountered
bug 37216 that you mention ; to get rid of this I patched binutils
(adding -fno-common to gcc works also). But the threaded version was
still crashing, and indeed the symptoms looked much similiar to what
occurred in the non-threaded case. Then I found the solution evoked in
my first post.
Please note that adding the attribute force_align_arg_pointer to
threadStart has a negligible performance penalty (a few machine
instructions each time this function is entered/exited)
Hmmm... I understand what's going on now. We had this on FFmpeg some
time last year.
IIRC it all went down to something like:
- Win32 ABI only specifies 4-byte alignment.
- x86 ABI only specifies 4-byte alignment.
=> The thread code is correct when it only aligns to 4-byte.
- gcc aligns main() to 16-byte and maintains this alignment throughout
- gcc doesn't take into account that it is valid to start a thread with
only 4-byte alignment.
- SSE expects 16-byte alignment.
- gcc thinks that a function that needs SSE is already aligned to
16-byte (because of main()), but in fact it might be only 4-byte aligned
(and still be valid for Win32 and x86).
=> It is the function that uses SSE that should make sure it is aligned.
Actually it is enough to make only the thread entry functions aligned
(any function in the external API that at some point might use SSE).
Imagine if someone wants to use that ATLAS library but instead of
starting a new thread it wants to call directly the function that needs
SSE (no I haven't checked if it is possible in this case but it could
happen theoretically). And imagine that someone is using MSVC++ to call
that function. MSVC++ only aligns to 4-byte (and again it is valid).
That function would also crash, independent of your patch.
So in your specific case I think it is the ATLAS functions that should
be aligned (= it would also help to use the library with other compilers).
Your patch can also be seen as a way to always sufficiently align the
stack so that any thread started by pthreads-win32 is ok for SSE
instructions (the same way glibc does I think). In that case I don't
have a strong opinion about it. The overhead really is negligible.
Starting the thread takes much longer.