This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen



On 18-01-2016 18:09, Szabolcs Nagy wrote:
> On 18/01/16 18:00, Adhemerval Zanella wrote:
>> On 14-01-2016 13:09, Szabolcs Nagy wrote:
>>> On 13/01/16 02:34, Carlos O'Donell wrote:
>>>> On 01/11/2016 11:32 AM, Szabolcs Nagy wrote:
>>>>> 2016-01-11  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>>>>>
>>>>>       [BZ #19329]
>>>>>       * elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation)
>>>>>       atomically.
>>>>>       * elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
>>>>>       GL(dl_tls_max_dtv_idx), slotinfo entries and listp->next atomically.
>>>>>       (_dl_add_to_slotinfo): Write the slotinfo entries and listp->next
>>>>>       atomically.
>>>>
>>>> You are headed in the right direction. I like where this patch is going,
>>>> but don't have enough time to review this in detail yet.
>>>>
>>>
>>> will there be time before 2.23?
>>
>> I intend to send a message about the hard freeze today and the question is
>> if this fix is suitable to not show regression in the following mount and
>> if so if we could iron out the bugs.  So are you confident about that?
>>
>> I am asking you this because this synchronization issues are a nest of rats
>> sometimes and I also noted you found out multiple issues during the patch
>> sending and reviewing. Objections?
>>
> 
> i'm confident that i can fix the races i can
> trigger (new threads vs dlopen), without creating
> regressions (other than performance regression).
> 
> adding a handful of atomics is enough for that
> (it should not alter the behaviour when there is
> no conflicting memory access), however it might
> not be the best solution (too conservative) or i
> might miss a few races etc.
> 

Fair enough, I have added it on the blockers for 2.23 release.

>>>
>>>> At first glance your patch lacks sufficient concurrency documentation to
>>>> be acceptable. You need to document which acquires the releases
>>>> synchronizes-with. Please grep for "CONCURRENCY NOTES" to see the level
>>>> of detail we need to maintain these kinds of changes.
>>>>
>>>
>>> i wanted to avoid documenting all the mess in the dynamic linker,
>>> but i can improve the comments.
>>>
>>> i see the following bugs:
>>>
>>> 1) pthread_create is not synced with dlopen
>>> 2) pthread_create is not synced with dlclose
>>> 3) tls access is not synced with dlopen
>>> 4) tls access is not synced with dlclose
>>> 5) tls access can oom crash
>>> 6) tls access is not as-safe
>>> 7) dlopen holds a global lock during ctors
>>>
>>> i can fix 1) by adding some atomics (this is the current patch).
>>>
>>> for 2) that's not enough, because dlclose has to wait with the
>>> free(link_map) while pthread_create is initializing the tls.
>>
>> So which would be a better approach regarding it?
>>
> 
> i don't know what's the best approach
> 
> one possibility is to not free memory in dlclose
> (i guess glibc does not want this).
> 
> another is to sync e.g. by using some lock, but it
> cannot be GL(dl_load_lock) now because that is held
> during ctors are running in dlopen which may call
> pthread_create.
> 
> of course using both atomics and locks is a bit
> ugly..

Right, I think we will need to get back on this after 2.23 release.

> 
>>>
>>> i guess 3) can be fixed similarly to 1) but i don't have a
>>> test case for that.
>>>
>>> the rest is harder to fix.
>>>
>>> is it ok to only fix 1) for 2.23?
>>> or should i try to fix 3) as well (races on the same globals)?
>>>
>>
>> I would say we attack one problem at time, even when they are related.
>> How hard do you consider to fix 3.?
>>
> 
> i haven't looked at that in detail, it involves more
> code starting from __tls_get_addr, but it is probably
> less critical because the code tries to only look at
> slotinfo entries up to a point (module's gen number where
> the accessed tls is) that is guaranteed to be synced by
> a dlopen that returned.
> (which means less problems with concurrently loading
> modules, however GL(*) should be accessed with atomics).
> 
> i can try to cook up something tomorrow with a detailed
> concurrency note and then others can decide if it's
> too risky change or not.
> 

Ok, let's delay the fix for 3. to 2.24 then.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]