This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC][BZ #19329] high level description of pthread_create vs dlopen races


On 07/12/16 14:07, Torvald Riegel wrote:
> On Tue, 2016-12-06 at 17:22 +0000, Szabolcs Nagy wrote:
>> On 05/12/16 19:24, Torvald Riegel wrote:
>>> On Mon, 2016-11-28 at 19:03 +0000, Szabolcs Nagy wrote:
>>>> (5) GL(dl_tls_generation) is not checked for overflow consistently, and
>>>> slotinfo entries can get assigned an overflowed generation count; its type
>>>> is size_t and both dlopen and dlclose increment it, so overflow may be a
>>>> concern on 32bit targets.
>>>
>>> Can the size be extended on 32b targets?  If so, we can build a
>>> monotonically increasing, larger atomic counter on 32b targets too.
>>
>> the generation count up to which a dtv is set up in a
>> thread is stored in dtv[0] which is 32bit on a 32bit
>> system.
> 
> The other 32b half could be somewhere else.  It's more efficient if it's
> in the same cacheline of course, but it's not required to be adjacent.

i was wrong about this: a dtv entry already holds 2 pointers,
so it is 64bit on a 32bit target and a 64bit counter fits.
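A sketch of why a 64bit generation counter fits, assuming an entry layout similar in shape to glibc's `dtv_t` (a union of a counter and a two-pointer struct). The type and function names below are illustrative, not glibc's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dtv entry layout: two pointers per entry. */
struct dtv_pointer {
  void *val;      /* tls block of the module, or NULL if unallocated */
  void *to_free;  /* start of the malloc'd region to release later */
};

typedef union {
  uint64_t counter;           /* dtv[0]: generation the dtv is up to date with */
  struct dtv_pointer pointer; /* dtv[i], i > 0: tls of module i */
} dtv_entry;

/* The entry is two pointers wide, so even with 32-bit pointers a
   64-bit generation counter fits without growing the entry. */
static_assert(sizeof(uint64_t) <= sizeof(struct dtv_pointer),
              "64-bit counter must fit in a dtv entry");

static inline uint64_t dtv_generation(const dtv_entry *dtv)
{
  return dtv[0].counter;
}
```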

>>> Before focusing on taking the lock, I'd suggest to look at the bigger
>>> picture (ie, the abstract operations and atomicity and ordering
>>> requirements).
>>> It is possible that a clean redesign is actually simpler than trying
>>> to fix a system that is already complex and which we don't understand
>>> 100%.
>>
>> i think that's optimistic but i can write up what i think
>> the requirements are.
> 
> Thanks.  Better documentation of what the requirements are would help
> even if a rewrite is not what we decide to do.

i tried to collect the requirements; the current implementation
is far from meeting them: as-safety of tls access and ctor/dtor
handling without locks are difficult to fix. if as-safe tls access
is not required, then the concurrency fixes are much simpler.

simplified model:

m: module.
m.i: modid.
t: thread.
t.dtv: dtv pointer.
t.dtv[i]: dtv entry i (tls for t,m where m.i==i).
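The simplified model can be written down as plain C types. These structs and `tls_of` are hypothetical illustrations of the notation above, not glibc's internal link_map or dtv structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types mirroring the simplified model. */
struct module {
  size_t modid;   /* m.i: index into every thread's dtv */
};

struct thread {
  size_t dtv_len; /* number of usable slots in dtv */
  void **dtv;     /* t.dtv: dtv[i] is the tls block of module i */
};

/* tls of module m in thread t lives at t.dtv[m.i]; NULL means the
   entry has not been allocated (or has been unset). */
static void *tls_of(const struct thread *t, const struct module *m)
{
  assert(t->dtv != NULL);
  if (m->modid >= t->dtv_len)
    return NULL;  /* dtv not yet resized for this modid */
  return t->dtv[m->modid];
}
```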

constraints:

tls of a currently loaded module m in thread t is at t.dtv[m.i].

m.i is unique during the lifetime of m: it is assigned before
ctors of m are entered and may be reused after dtors of m return.

tls access to m in thread t is only valid during the lifetime
of t and m (after ctors of m start and before dtors of m end).
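A minimal sketch of the modid lifetime rule, assuming a fixed-size table and ignoring concurrency (real dlopen/dlclose would hold a lock around these calls). `modid_acquire`, `modid_release` and `MAX_MODULES` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODULES 64  /* hypothetical fixed capacity */

/* in_use[i] is true while some module owns modid i, i.e. from before
   its ctors are entered until after its dtors return. modid 0 is
   reserved (dtv[0] holds the generation counter). */
static bool in_use[MAX_MODULES];

/* Called by dlopen before running ctors. Returns 0 if no id is free. */
static size_t modid_acquire(void)
{
  for (size_t i = 1; i < MAX_MODULES; i++)
    if (!in_use[i]) {
      in_use[i] = true;
      return i;
    }
  return 0;
}

/* Called by dlclose only after all dtors of the module have returned;
   from this point on the id may be handed to another module. */
static void modid_release(size_t i)
{
  assert(i > 0 && i < MAX_MODULES && in_use[i]);
  in_use[i] = false;
}
```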

during the lifetime of a thread its dtv may be updated:
t.dtv may need to be resized (if an m is loaded with larger m.i).
t.dtv[i] may need to be freed (if an m is dlclosed).
t.dtv[i] may need to be allocated (if an m is dlopened).

if dtv updates are not immediate for all threads at dlopen and
dlclose time, then the threads need a way to tell if t.dtv[i]
is out-of-date in case modid i is reused by the time the dtv
update happens. (this can be done by counting dlopen/dlclose
calls, remembering in each thread the counter value of its last
dtv update, and globally tracking the last counter value for each
modid i: if global_counter[i] > t.dtv_counter then t.dtv[i] is
out-of-date. such a counter should be 64bit.)

dtv update consists of three kinds of operations:
1) allocate dtv and dtv entries (malloc)
2) unset out-of-date entries (free)
3) resize dtv, set dtv entries (memcpy and ptr update)
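A sketch of one thread's dtv update performing the three kinds of operations in order, assuming a dtv modelled as a plain array of block pointers (`struct dtv`, `dtv_update` and the `stale` flags are hypothetical). Doing all allocation first means a malloc failure leaves the dtv unchanged:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dtv model: slot[i] is the tls block of modid i. */
struct dtv { size_t len; void **slot; };

/* stale[i] != 0 means slot i is out-of-date and must be freed. */
static int dtv_update(struct dtv *d, size_t new_len, const int *stale)
{
  assert(new_len >= d->len);
  /* 1) alloc: nothing has been modified yet, so failure needs no
     roll back. */
  void **grown = calloc(new_len, sizeof *grown);
  if (grown == NULL)
    return -1;
  /* 2) free: unset out-of-date entries. */
  for (size_t i = 0; i < d->len; i++)
    if (stale[i]) {
      free(d->slot[i]);
      d->slot[i] = NULL;
    }
  /* 3) resize and set: copy surviving entries, switch to the new array. */
  if (d->len > 0)
    memcpy(grown, d->slot, d->len * sizeof *grown);
  free(d->slot);
  d->slot = grown;
  d->len = new_len;
  return 0;
}
```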

1) alloc:
pthread_create and dlopen need to be synchronized such that
either the sync in dlopen or the sync in pthread_create happens
before the other (for all dlopen/pthread_create pairs); the one
that happens later should do the allocation.
(this is needed because right after dlopen and pthread_create
tls access is valid, but must be as-safe so it cannot easily
allocate).
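One simple way to get the required ordering is a single lock taken by both sides: whichever of `model_dlopen` and `model_pthread_create` acquires it later sees the other's registration and does the allocation. This is a hypothetical model with fixed-size tables; it does not address the as-safety of tls access itself, and roll back on failure is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16  /* hypothetical fixed capacities */
#define MAX_MODULES 16

static pthread_mutex_t tls_lock = PTHREAD_MUTEX_INITIALIZER;
static void *tls[MAX_THREADS][MAX_MODULES]; /* tls[t][m] */
static int thread_live[MAX_THREADS];
static int module_live[MAX_MODULES];
static size_t module_size[MAX_MODULES];

/* dlopen side: register m, allocate its tls in already existing threads. */
static int model_dlopen(int m, size_t size)
{
  int rc = 0;
  assert(m >= 0 && m < MAX_MODULES);
  pthread_mutex_lock(&tls_lock);
  module_live[m] = 1;
  module_size[m] = size;
  for (int t = 0; t < MAX_THREADS; t++)
    if (thread_live[t] && tls[t][m] == NULL
        && (tls[t][m] = calloc(1, size)) == NULL)
      rc = -1;
  pthread_mutex_unlock(&tls_lock);
  return rc;
}

/* pthread_create side: register t, allocate tls of already loaded modules. */
static int model_pthread_create(int t)
{
  int rc = 0;
  assert(t >= 0 && t < MAX_THREADS);
  pthread_mutex_lock(&tls_lock);
  thread_live[t] = 1;
  for (int m = 0; m < MAX_MODULES; m++)
    if (module_live[m] && tls[t][m] == NULL
        && (tls[t][m] = calloc(1, module_size[m])) == NULL)
      rc = -1;
  pthread_mutex_unlock(&tls_lock);
  return rc;
}
```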

2) free:
t.dtv[m.i] should be freed eventually after dlclose of m or
after t exits. this is difficult because t.dtv[m.i] needs to
be updated if m.i is reused and the tls of the new module is
accessed, but tls access cannot do the free (not as-safe).
so the options are
- dlclose of m frees t.dtv[m.i] for all t (non-trivial).
- allocated dtv entry pointers are remembered somewhere
  and garbage collected in some way (such that overwriting
  t.dtv[i] does not leak memory).
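A sketch of the garbage-collection option, assuming retired tls blocks are pushed onto a lock-free list instead of being freed, and a later context where free() is allowed (e.g. the next dlopen/dlclose) releases them. This is one possible scheme, not what glibc does; `retire_block` and `collect_retired` are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* A retired tls block is reused as its own list node, so retiring
   needs no allocation. */
struct retired { struct retired *next; };

static _Atomic(struct retired *) retired_head;

/* Called instead of free() when t.dtv[i] is overwritten. The block must
   be at least sizeof(struct retired) bytes. The push is a lock-free CAS
   loop, usable from a signal handler if pointer atomics are lock-free. */
static void retire_block(void *block)
{
  assert(block != NULL);
  struct retired *r = block;
  r->next = atomic_load(&retired_head);
  while (!atomic_compare_exchange_weak(&retired_head, &r->next, r))
    ; /* on failure r->next was reloaded with the current head */
}

/* Takes the whole list in one atomic step, then frees it. Nodes are
   never popped individually, so there is no ABA problem. */
static size_t collect_retired(void)
{
  struct retired *r = atomic_exchange(&retired_head, NULL);
  size_t n = 0;
  while (r != NULL) {
    struct retired *next = r->next;
    free(r);
    r = next;
    n++;
  }
  return n;
}
```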

3) update dtv pointer and dtv entry:
either dlopen stops all threads and takes care of dtv updates,
or the updates have to be done lazily during tls access, in which
case signals may need to be blocked.
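For the lazy option, a sketch of blocking all signals around the update with `pthread_sigmask`, so a signal handler that accesses tls cannot run in the middle of it (at the cost of two syscalls per slow-path update). `with_signals_blocked` and the placeholder `update_dtv`, which only bumps a counter and records whether SIGUSR1 was blocked, are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Run update(arg) with all blockable signals blocked, then restore
   the previous mask. Returns 0 or an errno value. */
static int with_signals_blocked(void (*update)(void *), void *arg)
{
  sigset_t all, old;
  assert(update != NULL);
  sigfillset(&all);
  int rc = pthread_sigmask(SIG_BLOCK, &all, &old);
  if (rc != 0)
    return rc;
  update(arg);
  return pthread_sigmask(SIG_SETMASK, &old, NULL);
}

/* Placeholder for the real lazy dtv update. */
struct lazy_dtv { unsigned long generation; int sigusr1_blocked; };

static void update_dtv(void *arg)
{
  struct lazy_dtv *d = arg;
  sigset_t cur;
  pthread_sigmask(SIG_SETMASK, NULL, &cur); /* query only */
  d->sigusr1_blocked = sigismember(&cur, SIGUSR1);
  d->generation++;
}
```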

an m with static tls is a special case:
1) if m is already loaded when t is created, then pthread_create
needs to do setup (copy the tls init image) that requires accessing
m, so either pthread_create needs to sync with dlclose or it
is invalid to unload an m with static tls; in the latter case
pthread_create should be able to walk the currently loaded
modules and tell if they have static tls without accessing the
module structures that are freed during dlclose.
2) if m is loaded after t is created, then dlopen should do the
setup for the current thread, but i think it has to do the setup
for other threads as well (?). (in principle static tls cannot
be dynamically loaded/unloaded, but i'm not sure what the
requirements are if glibc internal libraries are dlopened.)
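The per-thread setup for static tls in 1) and 2) amounts to copying the module's tls initialization image (the file-backed part of its PT_TLS segment) and zeroing the rest of the block. A sketch, where `struct static_tls_image` and `init_static_tls` are hypothetical names; it relies on the image staying mapped, which is exactly the dlclose synchronization question above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module description of static tls. */
struct static_tls_image {
  const void *init;  /* initialization image (.tdata) */
  size_t init_size;  /* bytes of the image (PT_TLS p_filesz) */
  size_t block_size; /* total block size (PT_TLS p_memsz) */
};

/* Initialize one thread's block: copy the image, zero the .tbss tail.
   The caller must ensure the module cannot be dlclosed concurrently. */
static void init_static_tls(void *block, const struct static_tls_image *img)
{
  assert(img->init_size <= img->block_size);
  memcpy(block, img->init, img->init_size);
  memset((char *)block + img->init_size, 0,
         img->block_size - img->init_size);
}
```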

indirectly related to tls:

ctor/dtor handling should be callback-safe: dlopen and dlclose
must not hold internal locks while running ctors/dtors, otherwise
correct ctor/dtor code may deadlock.
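A sketch of callback-safe ctor handling, assuming a hypothetical loader that queues ctors under its lock and runs them only after releasing it. `queue_ctor`, `run_pending_ctors` and the two sample ctors are illustrative names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CTORS 8  /* hypothetical fixed capacity */

typedef void (*ctor_fn)(void);

/* The lock protects only the pending list, never the calls. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;
static ctor_fn pending[MAX_CTORS];
static size_t npending;

static void queue_ctor(ctor_fn f)
{
  pthread_mutex_lock(&loader_lock);
  assert(npending < MAX_CTORS);
  pending[npending++] = f;
  pthread_mutex_unlock(&loader_lock);
}

/* Snapshot the pending ctors under the lock, then run them with the
   lock released, so a ctor that re-enters the loader cannot deadlock. */
static size_t run_pending_ctors(void)
{
  ctor_fn batch[MAX_CTORS];
  pthread_mutex_lock(&loader_lock);
  size_t n = npending;
  for (size_t i = 0; i < n; i++)
    batch[i] = pending[i];
  npending = 0;
  pthread_mutex_unlock(&loader_lock);
  for (size_t i = 0; i < n; i++)
    batch[i]();
  return n;
}

static int ctor_runs;
static void nested_ctor(void) { ctor_runs++; }
/* Re-enters the loader, as a ctor calling dlopen would; holding
   loader_lock across this call would self-deadlock. */
static void reentrant_ctor(void) { ctor_runs++; queue_ctor(nested_ctor); }
```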

dlopen and pthread_create should roll back state changes on allocation
failure and report the errors (some state changes may be harmless;
this needs further analysis).
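A sketch of the roll-back requirement using an allocate-first, commit-later pattern: every allocation a call needs is made before any caller-visible state changes, so a failure only has to release that call's own memory. `alloc_all_or_nothing` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Either every slot gets a fresh zeroed block and 0 is returned, or
   nothing in slots changes and ENOMEM is returned. */
static int alloc_all_or_nothing(void **slots, size_t n, size_t size)
{
  assert(slots != NULL || n == 0);
  void **tmp = calloc(n ? n : 1, sizeof *tmp);
  if (tmp == NULL)
    return ENOMEM;
  for (size_t i = 0; i < n; i++) {
    tmp[i] = calloc(1, size);
    if (tmp[i] == NULL) {
      /* roll back: release only what this call allocated */
      while (i-- > 0)
        free(tmp[i]);
      free(tmp);
      return ENOMEM;
    }
  }
  /* commit: only now is caller-visible state modified */
  for (size_t i = 0; i < n; i++)
    slots[i] = tmp[i];
  free(tmp);
  return 0;
}
```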

