


Re: [PATCHv3] Protect _dl_profile_fixup data-dependency order [BZ #23690]


On 10/15/18 8:57 AM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> (3) Fence-to-fence sync.
>>
>> For fence-to-fence synchronization to work we need an acquire and release
>> fence, and we have that.
>>
>> We are missing the atomic read and write of the guard. Please review below.
>> Florian mentioned this in his review. He is correct.
>>
>> And all the problems are back again, because you can't do atomic loads of
>> the large guards: they are actually the function descriptor structures.
>> However, this is just laziness; we used the addr because it was convenient.
>> It is no longer convenient. Just add an 'init' field to reloc_result and use
>> that as the guard the threads synchronize against for initialization of
>> the results. This should solve the reloc_result problem (ignoring the issues
>> hppa and ia64 have with the fdesc updates across multiple threads in _dl_fixup).
> 
> I think due to various external factors, we should go with the
> fence-based solution for now, and change it later to something which
> uses an acquire/release on the code address, using proper atomics.

Let me clarify.

The fence fix as proposed in v3 is wrong for all architectures.

We are emulating C/C++11 atomics within glibc, and fence-to-fence
synchronization *requires* an atomic load and store of the guard; you
cannot use a non-atomic access. The point of the atomic load/store is
to ensure you don't have a data race.

> I don't want to see this bug fix blocked by ia64 and hppa.  The proper
> fix needs some reshuffling of the macros here, or perhaps the use of an
> unused bit in the flags field as an indicator for initialization.

The fix for this is straightforward.

Add a new initializer field to reloc_result; it's an internal data
structure, so the field can be as big as we want, and we can optimize
it later.

You don't need to do any big cleanups, but we *do* have to get the
synchronization correct.
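Roughly, the shape I have in mind is the following. This is only a
sketch against glibc's internal atomics (include/atomic.h), not a
finished patch; the surrounding reloc_result fields are abbreviated
from ldsodefs.h and the exact layout is illustrative:

/* sysdeps/generic/ldsodefs.h (sketch).  */
struct reloc_result
{
  DL_FIXUP_VALUE_TYPE addr;  /* On hppa/ia64 this is a multi-word
                                function descriptor, so it cannot be
                                loaded atomically and must not serve
                                as the guard.  */
  struct link_map *bound;
  unsigned int boundndx;
  uint32_t enterexit;
  unsigned int flags;
  unsigned int init;         /* New guard: word-sized everywhere.  */
};

/* _dl_profile_fixup fast path (sketch).  */
if (atomic_load_relaxed (&reloc_result->init) == 0)
  {
    /* Slow path: resolve the symbol and fill in *reloc_result
       (concurrent threads computing the same result is harmless),
       and only then publish it.  */
    atomic_thread_fence_release ();
    atomic_store_relaxed (&reloc_result->init, 1);
  }
else
  /* Pairs with the writer's release fence; after this the cached
     addr and the rest of *reloc_result are safe to read.  */
  atomic_thread_fence_acquire ();

Whether the guard ends up as its own field or as an unused bit in
flags, as you suggest, is a detail; the atomic access to a word-sized
guard is the requirement.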

>> (4) Review of elf_machine_fixup_plt and DL_FIXUP_MAKE_VALUE.
>>
>> I reviewed the uses of elf_machine_fixup_plt and DL_FIXUP_MAKE_VALUE to
>> see if there was any other case of this problem, particularly any case
>> where a write happens on one thread but might not be seen in another.
>>
>> I also looked at _dl_relocate_object and the initialization of all 
>> l_reloc_result via calloc, and that is also covered because the
>> atomic_thread_fence_acquire ensures any secondary thread sees the
>> initialization.
> 
> I don't think the analysis is correct.  It's up to the application to
> ensure that the dlopen (or at least the call to an ELF constructor in
> the new DSO) happens before a call to any function in the DSO, and this
> is why there is no need to synchronize the calloc with the profiling
> code.

I agree; you would need some inter-thread synchronization to ensure
all other threads knew the dlopen was complete, and that is what
guarantees the writes are seen.
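For illustration, here is the kind of application-side
synchronization that creates the required happens-before edge; the
DSO name libfoo.so and the symbol foo are hypothetical:

#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void *handle;        /* Written and read under lock.  */

/* Thread A: load the DSO, then publish the handle.  Releasing the
   mutex is the application-level synchronization; it also makes
   ld.so's internal writes (including the calloc'd l_reloc_result)
   visible to any thread that later observes handle.  */
static void *
loader_thread (void *arg)
{
  void *h = dlopen ("libfoo.so", RTLD_NOW);  /* Hypothetical DSO.  */
  pthread_mutex_lock (&lock);
  handle = h;
  pthread_mutex_unlock (&lock);
  return NULL;
}

/* Thread B: call into the DSO only after observing the handle under
   the same lock; acquiring the mutex provides the happens-before
   edge discussed above.  */
static void *
caller_thread (void *arg)
{
  void *h = NULL;
  while (h == NULL)
    {
      pthread_mutex_lock (&lock);
      h = handle;
      pthread_mutex_unlock (&lock);
    }
  void (*fn) (void) = (void (*) (void)) dlsym (h, "foo");
  if (fn != NULL)
    fn ();
  return NULL;
}

int
main (void)
{
  pthread_t a, b;
  pthread_create (&a, NULL, loader_thread, NULL);
  pthread_create (&b, NULL, caller_thread, NULL);
  pthread_join (a, NULL);
  pthread_join (b, NULL);
  return 0;
}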

-- 
Cheers,
Carlos.

