This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.


On Oct  9, 2014, "Carlos O'Donell" <carlos@redhat.com> wrote:

> On 10/07/2014 02:15 AM, Alexandre Oliva wrote:
>> On Oct  6, 2014, "Carlos O'Donell" <carlos@redhat.com> wrote:
>> 
>>> This code is a *heuristic*, it basically fails the load if there
>>> are no DTV slots left, even though we can still do the following:
>> 
>>> (a) Grow the DTV dynamically as many times as we want, with the
>>> generation counter causing other threads to update.
>> 
>> or
>> 
>> (a)' Stop wasting DTV entries with modules assigned to static TLS.
>> There's no reason whatsoever to do so.
>> 
>> This optimization is even described in the GCC Summit article in
>> which I first proposed TLS Descriptors.  Unfortunately, I never
>> got around to implementing it.

> I was not aware of this, but if possible it is a great solution.

It might seem like a solution for the glibc bug that arbitrarily limits
the number of DTV entries for modules assigned to Static TLS, yes.  But
that's just glibc behaving silly.

If we didn't have such a blatant bug, it would be just an optimization.

>>> and
>> 
>>> (b) Allocate from the static TLS image surplus until it is exhausted.
>> 
>> 
>>> - Remove the check above, allowing the code to grow the DTV as large
>>> as it wants for as many STATIC_TLS modules as it wants.
>> 
>> We don't really need to grow the DTV right away.  If we have static TLS,
>> we could just leave the DTV alone.  No code will ever access the
>> corresponding DTV entry.  If any code needs to update the DTV, because
>> of some module assigned to dynamic TLS, then, and only then, should the
>> DTV grow.

> I had not considered this optimization, but I guess it would work.

That's another optimization that helps work around a bug.  I'd rather we
fixed the bug and stop limiting the number of DTV entries for Static
TLS.
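
To make the "leave the DTV alone" idea concrete, here is a minimal sketch of
a toy DTV that grows only when a module assigned to dynamic TLS shows up. It
is not glibc code: `toy_dtv`, `toy_tls_kind` and `toy_dtv_note_module` are
hypothetical names, and the model takes the claim quoted above at face value,
namely that nothing ever looks up the DTV entry of a static TLS module.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: none of these names exist in glibc. */
struct toy_dtv {
    size_t slots;   /* number of entries currently allocated */
    void **entry;   /* per-module TLS block pointers, indexed by module id */
};

enum toy_tls_kind { TOY_STATIC_TLS, TOY_DYNAMIC_TLS };

/* Record a newly loaded module with the given module id.
   Static TLS modules are reached at a fixed offset from the thread
   pointer, so the DTV is left untouched for them.  Only a dynamic TLS
   module forces the DTV to grow.  Returns 0 on success, -1 on OOM. */
int toy_dtv_note_module(struct toy_dtv *dtv, size_t modid,
                        enum toy_tls_kind kind)
{
    if (kind == TOY_STATIC_TLS)
        return 0;                       /* no DTV entry consumed */

    if (modid >= dtv->slots) {
        size_t n = modid + 1;
        void **p = realloc(dtv->entry, n * sizeof *p);
        if (p == NULL)
            return -1;
        for (size_t i = dtv->slots; i < n; i++)
            p[i] = NULL;                /* block allocated lazily later */
        dtv->entry = p;
        dtv->slots = n;
    }
    assert(modid < dtv->slots);
    return 0;
}
```

In this model any number of static TLS modules can be loaded without the DTV
ever being allocated; the first dynamic TLS module pays for the growth.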


>>> WARNING: On AArch64 or any architecture that uses the generic-ish
>>> code for TLS descriptors, you will have further problems. There
>>> the descriptors consume static TLS image greedily, which means
>>> you may find that there is zero static TLS image space when you
>>> go to dlopen an application.

>> That's perfectly valid behavior, that exposes the bug in libraries that
>> are expected to be loadable after a program starts (say, by dlopen) when
>> relocations indicate they had to be brought in by Initial Exec.

> I did not argue that it was invalid behaviour. I only wished to warn
> the reader that the situation at present will result in broken applications.

The DTV arbitrary limit bug, yes, it's a bug that needs fixing right
away.

As for the applications, if they dlopen libs that use IE TLS to access
variables in non-IE modules, they are broken already.  There's nothing
we can do in glibc to unbreak them.  Any limit we set on the static TLS
area can be exceeded if you load enough of these libraries.  Using
static TLS when it's not strictly necessary just makes their bug more
visible.  But it's their bug.  dlopen of IE is oxymoronic.

> We the tools authors allowed this situation to get out of hand, and now we
> have both pieces when it breaks, and must do our level best to ensure
> things continue to work while providing a way out of the situation.
 
So, we tell libs that abuse IE to switch to TLS Desc GD on platforms
where TLS Descriptors are implemented.  On other platforms, they might
get lucky with IE, but they're still broken and asking for trouble.

> When you use dlopen with IE you run into the problem that
> we see today with TLS descriptors.

Make it “problems”.  There's the arbitrary limit on DTV size, that's a
bug in glibc, and there's the unwarranted assumption that the Static TLS
area size is infinite, that's a bug in libs using IE to access dlopened
TLS.

> You have a desire to keep the application working with the existing
> set of ~40 DSOs on the system that use IE,

I don't, really.  They're buggy, and their fix is trivial: switching to
TLS Descriptors.

> and we have a desire to keep TLS descriptors optimal.

Once they fix their bug, that would follow naturally.

> If we keep TLS descriptors optimal, they may consume all static TLS
> image and result in an application crash if a dlopen'd DSO uses
> IE, and I wish to avoid that crash.

Sorry, you can't, unless you come up with a way to make the static TLS
area infinite.

Making it configurable would help work around the bug in the libs, but
if the libs can't fall back to dynamic TLS, like TLS Descriptors do, load
enough of them and you'll run into the error.  It's not fixable.
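
The argument can be illustrated with a toy bump allocator over a fixed-size
static TLS area. This is only a sketch under simplifying assumptions
(alignment is ignored, and `toy_static_tls`, `toy_place` and
`toy_place_module` are hypothetical names, not glibc interfaces): a module
that can fall back to dynamic TLS still loads once the area is full, whereas
a module that demands static TLS fails, whatever the size of the area.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a per-thread static TLS area with a fixed size. */
struct toy_static_tls {
    size_t size;    /* total bytes available (any finite limit) */
    size_t used;    /* bytes already handed out */
};

enum toy_place { TOY_PLACED_STATIC, TOY_PLACED_DYNAMIC, TOY_LOAD_FAILED };

/* Try to place a module's TLS segment of `need` bytes.
   requires_static != 0 models a module accessed with IE relocations;
   requires_static == 0 models one that can fall back to dynamic TLS,
   as the TLS Descriptor scheme can. */
enum toy_place toy_place_module(struct toy_static_tls *area, size_t need,
                                int requires_static)
{
    assert(area->used <= area->size);
    if (need <= area->size - area->used) {
        area->used += need;             /* greedy use of static TLS */
        return TOY_PLACED_STATIC;
    }
    return requires_static ? TOY_LOAD_FAILED : TOY_PLACED_DYNAMIC;
}
```

Raising `size` only moves the point at which `TOY_LOAD_FAILED` appears; it
does not remove it.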

> When I speak about DSOs I speak singularly about those loaded via
> dlopen.
 
Let's call them non-IE modules, dlopened modules, or late-loaded
modules?  LLDSO?  non-IE DSO?

> Yes. We have libraries in the OS using GCC constructs to force IE for
> certain __thread variables. We need to move them away from those uses,
> but we need to ensure a good migration path e.g. same speed, continues
> to work until we migrate all DSOs etc.

I don't understand the bit about migration path.  One can drop the
initial exec buggy annotation and use -mtls-dialect=gnu2 on x86 or
x86_64, and the problem of running out of static TLS space will go away.
I'm not sure this is enough to fix our bug of arbitrarily limiting the
DTV size used by static TLS from non-IE modules.
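
As a source-level illustration of that change, here is a minimal sketch. The
variable and function names are made up for the example; the `tls_model`
attribute and the `-mtls-dialect=gnu2` option are the GCC features mentioned
above.

```c
/* Before: the annotation forces initial-exec accesses, which only work
   if the defining module ends up in static TLS. */
__thread int old_counter __attribute__((tls_model("initial-exec")));

/* After: no annotation.  When built as part of a shared library with
   something like
       gcc -fPIC -mtls-dialect=gnu2 -shared ...
   on x86 or x86_64, the compiler is free to use the TLS Descriptor
   variants of the dynamic access models instead. */
__thread int new_counter;

int bump_old(void) { return ++old_counter; }
int bump_new(void) { return ++new_counter; }
```

Both functions behave the same from the caller's point of view; the
difference is only in which relocations the object file asks the dynamic
loader to satisfy.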

>> So assume you load a module A defining a TLS section, and conservatively
>> assign it to dynamic TLS, for whatever reason.  Then you load a module B
>> that expects A to be in static TLS, because it uses IE to reference its
TLS symbols.  Kaboom.  The “conservative” approach just broke what would
>> have worked if you hadn't gratuitously taken it out of TLS.

> I don't think this scenario is supported by the present tools.

Why not?

There's nothing that stops a libA from using the IE access model to
access symbols not defined in itself or any of its dependencies.

And there's nothing that stops another “unrelated” libB from defining
those symbols.

Without TLS Descriptors' greedy use of static TLS, you have to arrange for
libB to be initially-loaded for it to get static TLS, otherwise dlopen
will fail because the IE model can't be satisfied.  But TLS Descriptors'
greedy use of TLS made this more flexible, and it's been around for
almost a decade.  'cept now, if your patch made it, it's broken.  People
who switched to GNU2 TLS in order for this to work now get a failure.

> The only uses I have ever seen for IE in a DSO is optimal access of
> local thread variables.

I wouldn't be surprised if they exist anyway.  Consider a primary
library that defines the TLS variables, and a separately-loadable plugin
that accesses them.  Even if it were to list the primary library as a
direct dependency, if you load the primary library first, the loader
makes a decision of where to place its TLS segment at that point.  It
doesn't wait to see whether you load a subsequent plugin that demands IE
to access the primary lib's TLS vars.

> If you do think it can happen please start a distinct thread and we can talk
> about it and look into the source.

Uhh...  Why use a distinct thread for the same topic?  It's not like
this is a departure to a different topic, it's just proof that the
heuristics proposed to alleviate or fix the problem are broken.  It
might make some cases work, but at the expense of breaking others that
have worked for a long time.

> What if the module author can never tolerate GD-like performance and
> would rather it fail than load and run slowly e.g. MESA/OpenGL?

Then they use IE and make the library an IE dep.  If they don't, and it
fails because static TLS was exhausted, they get the failure they asked
for.

> For example our work on tunables to allow users to tweak up the size
> of static TLS image surplus is one potential solution to this problem.

It's a workaround, not a solution.  Unbounded static TLS would be a
solution, but that's not possible.

> It might also be possible to try to make the static TLS image size a single
> mapping that we might possibly be able to grow with kernel help?

We'd still have to reserve a limited amount of unmapped VM next to each
thread's static TLS area.  This would enable some growth without using
more memory pages, but it would still be limited in size, because we
can't move it: we could only grow it into the area reserved for its
growth.
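
The reserve-then-grow scheme can be sketched with plain `mmap` and
`mprotect`. This is a toy model assuming a Linux-like system; `toy_region`,
`toy_region_init` and `toy_region_grow` are hypothetical names, the sketch
ignores in which direction a real static TLS area would have to grow, and it
is not how glibc lays out thread stacks or TLS today.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

struct toy_region {
    char  *base;        /* start of the reserved address range */
    size_t committed;   /* bytes currently readable and writable */
    size_t reserved;    /* total address space set aside for growth */
};

/* Reserve `reserve` bytes of address space and make the first `initial`
   bytes usable.  Sizes are expected to be multiples of the page size. */
int toy_region_init(struct toy_region *r, size_t initial, size_t reserve)
{
    assert(initial <= reserve);
    void *p = mmap(NULL, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect(p, initial, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, reserve);
        return -1;
    }
    r->base = p;
    r->committed = initial;
    r->reserved = reserve;
    return 0;
}

/* Grow in place.  The region cannot move, so the reservation is a hard
   upper bound: anything larger fails. */
int toy_region_grow(struct toy_region *r, size_t new_size)
{
    if (new_size > r->reserved)
        return -1;
    if (new_size <= r->committed)
        return 0;
    if (mprotect(r->base, new_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    r->committed = new_size;
    return 0;
}
```

Pages in the reserved range cost no memory until they are made accessible
and touched, but `toy_region_grow` still fails once the request exceeds the
reservation.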

>> So, in addition to stopping wasting DTV entries with static TLS
>> segments, I suggest not papering over the problem in glibc, but rather
>> proactively convert dlopenable libraries that use IE to access TLS
>> symbols that are not guaranteed to have been loaded along with the IE to
>> use TLS Descriptors in General Dynamic mode.

> I agree that this is the correct solution, but *today* we have problems
> loading user applications.

But why can't the broken libraries be fixed right away?

> Can I convert one variable at a time to be a TLS descriptor?

Moo.  The question doesn't make sense.  The variable is unrelated to the
access model used to access it.  It could even be defined in a separate
module.

You could in theory specify the access model to use on a per-use basis.

Currently, however, because the annotation is placed on the variable
declaration, and there can only be one declaration of each variable per
translation unit, you can only choose the access model to use for a
variable on a per-translation-unit basis.

However, GNU2 is not an access model, it's an alternate set of access
models.  You can't currently specify “I want to use TLS Descriptor-based
Global Dynamic for this variable in this translation unit” unless you
switch to the GNU2 TLS dialect, and if you do, you don't have to specify
anything.

However, if you use stricter access models such as IE in one translation
unit, the linker will relax less-strict access models in other units to
the stricter one, as it links them into a single SO.

>> In order to ease this sort of transition, I've been thinking of
>> introducing an alias access model in GCC, that would map to GD if TLS
>> Descriptors are enabled, or to the failure-prone IE with old-style TLS.
>> Then those who incorrectly use IE today can switch to that; it will be a
>> no-op on arches that don't have TLSDesc, or that don't use it by
>> default, but it will make the fix automatic as each platform switches to
>> the superior TLS design.

> Oh. Right. If upstream can't use TLS descriptors everywhere, then it
> may find itself failing to compile on certain targets that don't support
> descriptors.

> The alias access model in GCC would be something like:
> `__attribute__((tls_model("go-fast")))`?

Yeah.  I meant to ask for suggestions on the spelling, but I forgot.

go-fast is not a good one, though; anyone familiar with TLS access
models would assume it means LE, since that's the fastest access model.
But then, unless both the variable and the access end up in the main
executable, the linker will error out.

Maybe "desc_or_initial"?
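
Until something like that exists in GCC, the effect can be approximated in
a project's own headers. The following is only a sketch under stated
assumptions: `HYPOTHETICAL_HAVE_TLSDESC` is a made-up macro that the
project's build system would have to define when it compiles with the GNU2
TLS dialect, and `TLS_DESC_OR_INITIAL`, `plugin_state` and the accessors are
illustrative names. Only the `tls_model` attribute and its "global-dynamic"
and "initial-exec" values are existing GCC features.

```c
/* Pick the access model once, per build configuration. */
#if defined(HYPOTHETICAL_HAVE_TLSDESC)
/* With TLS Descriptors, plain global-dynamic is the right request. */
# define TLS_DESC_OR_INITIAL __attribute__((tls_model("global-dynamic")))
#else
/* Old-style TLS: keep the failure-prone initial-exec behavior. */
# define TLS_DESC_OR_INITIAL __attribute__((tls_model("initial-exec")))
#endif

/* One annotated definition; every access in this translation unit
   uses the model selected above. */
__thread int plugin_state TLS_DESC_OR_INITIAL;

void plugin_state_set(int v) { plugin_state = v; }
int  plugin_state_get(void)  { return plugin_state; }
```

A built-in alias would have the advantage that the compiler, not each
project's build system, knows which dialect is in effect.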

>>> In Fedora we disallow greedy consumption of TLS descriptors on any
>>> targets that have TLS descriptors on by default.

>> Oh, wow, this is such a great move that it makes TLS Descriptors'
>> performance the *worst* of all existing access models.  If we want to
>> artificially force them into their worst case, we might as well get rid
>> of them altogether!

> If it doesn't work and causes applications to stop working
> I'll disable it, and I did :-)

Are you really speaking of the same thing?

I mean, there are two different related problems here: exhausting the
DTV, and exhausting the static TLS space.  AFAIK, all you did was work
around the former, by growing the DTV surplus.  Did you ALSO disable
TLSDesc's dynamic relaxation of GD to IE, by preventing greedy use of
the TLS area?  I recently saw patches proposed to that end, months ago,
but I didn't notice any approved patches to that end.

>> Whom should I thank for making my work appear to suck?  :-(

> Me. I didn't do it because I think it was the right solution.
> I did it because users need working applications to do the tasks
> they chose Fedora for.

If you did that, you broke long-existing glibc features so that they
didn't have to fix the bugs in their own libs.  You traded a failure
caused by an application bug for a failure in a bugless application.

It's not the right solution.  It's not even the wrong solution.  It's
not a solution at all.

> It is not sufficient for me to say: "Wait a few months while I
> fix the fundamental flaws in the education of users and the usage
> of our tools." :-}
 
How about “lib authors, use -mtls-dialect=gnu2 and drop the unwarranted
initial_exec tls model selection”?

> (a) Immediately increase DTV surplus size.
> (b) Implement static TLS support without needing a DTV increase.
> (c) Remove faulty heuristics around not wanting to increase DTV size.
> (d) Add __attribute__((tls_model("go-fast"))) to gcc that defaults to
>     IE if TLS Desc is not present.
> (e) Approach upstream projects with patches to convert to TLS descriptors
>     using go-fast model.

Heh.  I guess my suggestion is that we go backwards in your list.  IE
abuse is not our bug.

(b) requires little more than dropping some incorrect asserts.  And once
we get to that (remember, going backwards), (a) is completely
unnecessary: why waste per-thread memory for everyone if it's not
needed?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

