Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.


On 10/07/2014 02:15 AM, Alexandre Oliva wrote:
> On Oct  6, 2014, "Carlos O'Donell" <carlos@redhat.com> wrote:
> 
>> This code is a *heuristic*, it basically fails the load if there
>> are no DTV slots left, even though we can still do the following:
> 
>> (a) Grow the DTV dynamically as many times as we want, with the
>>     generation counter causing other threads to update.
> 
> or
> 
>   (a)' Stop wasting DTV entries with modules assigned to static TLS.
>        There's no reason whatsoever to do so.
> 
>        This optimization is even described in the GCC Summit article in
>        which I first proposed TLS Descriptors.  Unfortunately, I never
>        got around to implementing it.

I was not aware of this, but if it is possible it would be a great solution.

>> and
> 
>> (b) Allocate from the static TLS image surplus until it is exhausted.
> 
> 
>> - Remove the check above, allowing the code to grow the DTV as large
>>   as it wants for as many STATIC_TLS modules as it wants.
> 
> We don't really need to grow the DTV right away.  If we have static TLS,
> we could just leave the DTV alone.  No code will ever access the
> corresponding DTV entry.  If any code needs to update the DTV, because
> of some module assigned to dynamic TLS, then, and only then, should the
> DTV grow.

I had not considered this optimization, but I guess it would work.
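
To make the idea concrete, here is a minimal sketch of deferring DTV
growth; the types and names are illustrative stand-ins, not glibc's
real dtv_t or loader internals:

    #include <stdbool.h>
    #include <stdlib.h>

    /* Illustrative stand-ins, not glibc's actual data structures.  */
    struct dtv_slot { void *val; };
    struct dtv { size_t len; struct dtv_slot *slots; };

    /* Modules placed in static TLS never dereference their DTV entry,
       so the DTV is left alone for them; it grows only when a module
       assigned to dynamic TLS actually needs a slot.  */
    static bool
    ensure_dtv_slot (struct dtv *dtv, size_t modid, bool is_static_tls)
    {
      if (is_static_tls || modid < dtv->len)
        return true;
      struct dtv_slot *s = realloc (dtv->slots, (modid + 1) * sizeof *s);
      if (s == NULL)
        return false;
      for (size_t i = dtv->len; i <= modid; i++)
        s[i].val = NULL;
      dtv->slots = s;
      dtv->len = modid + 1;
      return true;
    }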

>> WARNING: On AArch64 or any architecture that uses the generic-ish
>> code for TLS descriptors, you will have further problems. There
>> the descriptors consume static TLS image greedily, which means
>> you may find that there is zero static TLS image space when you
>> go to dlopen an application.
> 
> That's perfectly valid behavior, that exposes the bug in libraries that
> are expected to be loadable after a program starts (say, by dlopen) when
> relocations indicate they had to be brought in by Initial Exec.

I did not argue that it was invalid behaviour. I only wished to warn
the reader that the present situation will result in broken applications.
We, the tools authors, allowed this situation to get out of hand, and now
we get to keep both pieces when it breaks; we must do our level best to
ensure things continue to work while providing a way out of the situation.
 
> That they worked was not by design; it was pretty much by accident,
> because glibc led by (bad) example instead of coming up with a real
> solution, and others followed suit, breaking glibc's own assumption that
> only a very small amount of static TLS space would ever be used after
> the program started, and that the consumer of that space would be glibc
> itself.

I agree.

>> We need to further subdivide the static TLS image space into "reserved
>> for general use" and "reserved for DSO load uses."  With the TLS
>> descriptors allocating from the general use space only.
> 
> ?!?
> 
> Static TLS space grows as much as needed to fit all IE DSOs.  Some
> excess is reserved (and this should be configurable), but if we don't
> use it for modules that could benefit from it, what should we use it
> for?

My apologies, let me clarify. The static TLS space that is allocated
is only for DSOs that are known a priori to the static linker; they
must have been specified on the command line. Unfortunately, in programs
written in interpreted languages like Python, everything is a dlopen'd
DSO. When you use dlopen with IE you run into the problem that we
see today with TLS descriptors. You have a desire to keep the
application working with the existing set of ~40 DSOs on the system
that use IE, and we have a desire to keep TLS descriptors optimal.
If we keep TLS descriptors optimal, they may consume all of the static
TLS image and cause an application crash if a dlopen'd DSO uses IE,
and I wish to avoid that crash.
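
For illustration, the failure mode from the application side looks
like this; the plugin name is hypothetical, and the quoted dlerror()
text is what glibc typically reports when the surplus is exhausted:

    #include <dlfcn.h>
    #include <stdio.h>

    int
    main (void)
    {
      /* A hypothetical dlopen'd DSO whose TLS was forced into IE.  */
      void *h = dlopen ("./libplugin-ie.so", RTLD_NOW);
      if (h == NULL)
        /* Typically: "cannot allocate memory in static TLS block".  */
        fprintf (stderr, "dlopen: %s\n", dlerror ());
      return h != NULL ? 0 : 1;
    }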

>> On Fedora for AArch64 this
>> caused no end of headaches attempting to load TLS IE using DSOs
>> only to find it was literally impossible because so much of the
>> implementation used TLS descriptors that the surplus static TLS
>> image space was gone, and while descriptors can be allocated 
>> dynamically, the DSOs can't.
> 
> Err...  I get a feeling I have no idea of what you refer to as DSO.
> From the description, it's not Dynamically-loaded Shared Object.  What
> is it, then?

My apologies again. Given that DSOs known to use IE at link time will
have static TLS image space allocated, I have stopped talking about
those, since we know they work correctly. When I speak about DSOs I
speak solely about those loaded via dlopen.
 
> I suppose you may be speaking of modules that assume IE is usable to
> access TLS of some module (itself, or any other), even though the
> assumption is not warranted.

Yes. We have libraries in the OS using GCC constructs to force IE for
certain __thread variables. We need to move them away from those uses,
but we need to ensure a good migration path, e.g. the same speed, and
continuing to work until we migrate all DSOs.
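
The construct is GCC's per-variable tls_model attribute; a library
doing this looks roughly like the following (variable and function
names are hypothetical):

    /* Forces the initial-exec model for this one variable, which in
       turn forces the containing DSO's TLS into the static TLS image
       at load time.  */
    static __thread int fast_counter
        __attribute__ ((tls_model ("initial-exec")));

    int
    bump_counter (void)
    {
      return ++fast_counter;
    }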

> So assume you load a module A defining a TLS section, and conservatively
> assign it to dynamic TLS, for whatever reason.  Then you load a module B
> that expects A to be in static TLS, because it uses IE to reference its
> TLS symbols.  Kaboom.  The “conservative” approach just broke what would
> have worked if you hadn't gratuitously taken it out of TLS.

I don't think this scenario is supported by the present tools.

The only use I have ever seen for IE in a DSO is optimized access to
its own thread-local variables.

If the static linker could see that B accesses A's TLS using IE (which
requires B to be listed as a dependency or on the link line), then both
A and B would have to use static TLS, and that forces both into the
static TLS image. It would then be wrong for the dynamic loader to load
A as dynamic TLS.

If you do think it can happen, please start a distinct thread and we
can talk about it and look into the source.

> Now, of course when you load A you don't know whether module B is going
> to be loaded, and whether it will require A to use static TLS or not, or
> whether module C would fail to load afterwards because there's not
> enough static TLS space for its own TLS section, and it uses IE even
> though it's NOT being loaded as a dependency of the IE.
> 
> So not saving static TLS space for later use may expose breakage in
> subsequently loaded modules, whereas saving it may equally expose
> breakage in subsequently loaded modules, but waste static TLS space and
> *significantly* impact performance of TLS Descriptor-using modules that
> could have got IE-like performance.  That sounds like a losing strategy
> to me.

The only valid sequences I know of are:

(a) Module uses static TLS and is known by the static linker and has
    static TLS image space allocated.

(b) Module uses static TLS and is not known to the static linker, accesses
    only its own variables with IE, and has no static TLS image space
    reserved for it.

The greedy, optimizing use of static TLS by TLS descriptors breaks (b).

> Greedy allocation doesn't guarantee optimal results, but it won't break
> anything that isn't already broken, and if and when such breakage is
> exposed, switching the broken modules to TLS Descriptors will get them
> nearly identical performance for TLS references that happen to land in
> static TLS, but that will NOT cause the library to fail to load
> otherwise: it will just get GD-like performance.

What if the module author can never tolerate GD-like performance and
would rather fail to load than load and run slowly, e.g. Mesa/OpenGL?

Remember to keep our users in mind; we do this for them, and some
of them have strict performance requirements. We should not lightly
tell them that what they want is wrong.

For example, our work on tunables to allow users to tweak the size
of the static TLS image surplus is one potential solution to this
problem.

Might it also be possible to make the static TLS image a single
mapping that we could grow with kernel help?

> So, in addition to stopping wasting DTV entries with static TLS
> segments, I suggest not papering over the problem in glibc, but rather
> proactively convert dlopenable libraries that use IE to access TLS
> symbols that are not guaranteed to have been loaded along with the IE to
> use TLS Descriptors in General Dynamic mode.

I agree that this is the correct solution, but *today* we have problems
loading user applications. I see no option but to follow a staggered
strategy:

(a) Immediately increase DTV surplus size.

	- Distribution patches are doing this already to keep applications working.

(b) Implement static TLS support without needing a DTV increase.

	- Reduces memory usage of DTV. Small optimization.

    and

    Remove faulty heuristics around not wanting to increase the DTV size.

(c) Approach upstream projects with patches to convert to TLS descriptors.

When we do (c), can it be done on a per-variable basis?

Can I convert one variable at a time to be a TLS descriptor, as is done
currently with the GCC attributes for the TLS model?
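
For reference, the tls_model attribute is already per-variable, so a
staged conversion might look like the sketch below (variable names
hypothetical); on targets where TLSDesc implements GD, the first
variable would be accessed through a descriptor:

    /* Converted: general-dynamic, i.e. TLS descriptors on TLSDesc
       targets.  */
    static __thread int converted_var
        __attribute__ ((tls_model ("global-dynamic")));

    /* Not yet converted: still initial-exec.  */
    static __thread int legacy_var
        __attribute__ ((tls_model ("initial-exec")));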

> In order to ease this sort of transition, I've been thinking of
> introducing an alias access model in GCC, that would map to GD if TLS
> Descriptors are enabled, or to the failure-prone IE with old-style TLS.
> Then those who incorrectly use IE today can switch to that; it will be a
> no-op on arches that don't have TLSDesc, or that don't use it by
> default, but it will make the fix automatic as each platform switches to
> the superior TLS design.

Oh. Right. If upstream can't use TLS descriptors everywhere, then it
may find itself failing to compile on certain targets that don't support
descriptors.

The alias access model in GCC would be something like:
`__attribute__((tls_model("go-fast")))`?

>> In Fedora we disallow greedy consumption of TLS descriptors on any
>> targets that have TLS descriptors on by default.
> 
> Oh, wow, this is such a great move that it makes TLS Descriptors's
> performance the *worst* of all existing access models.  If we want to
> artificially force them into their worst case, we might as well get rid
> of them altogether!

If it causes applications to stop working, I'll disable it,
and I did :-)

> Whom should I thank for making my work appear to suck?  :-(

Me. I didn't do it because I thought it was the right solution.
I did it because users need working applications to do the tasks
they chose Fedora for.

It is not sufficient for me to say: "Wait a few months while I
fix the fundamental flaws in the education of users and the usage
of our tools." :-}
 
> :-P :-)
> 
>> We need to turn on TLS descriptors by default on x86_64 such
>> that we can get the benefits there, and start moving DSOs away
>> from TLS IE.
> 
> Hallelujah! :-)

You know I know what the right answer is, but we have to get there
one step at a time with working applications the whole way.

In summary, it looks like we need:

(a) Immediately increase DTV surplus size.
(b) Implement static TLS support without needing a DTV increase.
(c) Remove faulty heuristics around not wanting to increase the DTV size.
(d) Add __attribute__((tls_model("go-fast"))) to GCC, which defaults to
    IE if TLS descriptors are not present.
(e) Approach upstream projects with patches to convert to TLS descriptors
    using go-fast model.

Does this plan make sense?

Cheers,
Carlos.

