This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: Fixing a scalability issue in OpenSSL error reporting
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-help at sourceware dot org
- Date: Tue, 16 Jun 2015 16:44:12 -0300
- Subject: Re: Fixing a scalability issue in OpenSSL error reporting
- Authentication-results: sourceware.org; auth=none
- References: <55802316 dot 3020809 at redhat dot com>
Hi
On 16-06-2015 10:22, Florian Weimer wrote:
> OpenSSL has its own implementation of thread-local variables (using a
> few global locks and a hash table indexed by the address of errno), and
> the error state is needed in a few places even if no error occurs which
> is visible at the application level. This turns out to be a major
> scaling issue if you have more than a few hundred OpenSSL connections in
> one process (I don't know if it applies to servers as well).
>
> Unlike errno, the error state is not of fixed size, so ideally, a
> deallocation function would run, releasing the state if the error
> information isn't collected before the thread terminates.
>
> The OpenSSL implementation has a function to deallocate the error state
> of *another* thread, but that's obviously racy and would have to be
> turned into a NOP.
>
> I have come up with several potential approaches:
>
> (a) Use __thread (or C++11 thread_local with a POD) and do not add a
> deallocation function. The downside is a potential memory leak, as
> mentioned above. The existing code already has this problem, though.
> Advantage is good portability to older GNU toolchain versions.
>
> (b) Use pthread_setspecific and related functions. This should offer
> even better portability, and there is a destructor function which can
> deallocate memory. The downside is that it currently requires linking
> against libpthread, which is something I want OpenSSL cease to do. A
> fully portable solution with pthread_once may lack performance, and
> portable atomics more or less require a C++11 compiler outside of the
> GNU ecosystem.
>
> (c) Use C++11 thread_local. This requires linking against libstdc++. I
> don't know if this could have adverse consequences, comparable to
> linking against libpthread. Portability will increase over the time,
> something that seems unlikely for (a) and (b).
>
>
> Solutions involving C++11 might be a difficult sell for OpenSSL
> upstream, but I prefer it over reimplementing TLS destructors from
> scratch. The old OpenSSL TLS implementation would still stay around, so
> perhaps it's acceptable to compile just one file with a C++11 compiler.
> That's why I'm leaning towards (c), but I'm not sure about the impact
> of the libstdc++ dependency.
>
>
> I also noticed that pther_setspecific destructors do not run for the
> main thread. Is this a bug?
>
I would go for a solution to use a static limited buffer for each thread,
which unfortunately would require some work in OpenSSL internals. I would
avoid link or build using C++11 just because I do not see why adding C++11
linkage to a C-only project (besides the memory burden on lining against
C++ libraries).
>From what I could get on default 64-bits architectures 'ERR_STATE' size
is alone 600 bytes plus the dynamic plus the dynamic 'err_data' and
'err_file' fields.
The 'ERR_set_error_data' sets as default for 'err_data' a string of
size 80, but it might resize. The 'ERR_put_error' also uses a dynamically
field for 'err_file'.
So if you limit the 'err_data' fields for about 128 bytes and 'err_file'
for '256' you would have about a 600 + 16*(128) + 16*(256) -> 6744 bytes
per thread. It is somewhat high for some embedded environments and we can
add some tunables in configure to enable/disable this or tune the sizes.
It might give the errors codes some less flexibility, but it can be tuned
to balance memory usage and fields usage.