This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Thu, Nov 28, 2013 at 11:03:36PM -0200, Alexandre Oliva wrote: > On Nov 19, 2013, Paul Pluzhnikov <ppluzhnikov@google.com> wrote: > > > On Wed, Nov 13, 2013 at 7:45 AM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote: > >> Ping? > >> https://sourceware.org/ml/libc-alpha/2013-11/msg00183.html > > > Ping? Ping? > > Apologies for the delay in this review. > > I'm a bit disappointed at how wasteful the allocation strategy is, > wasting a full page per thread even for a one-word TLS dynamic block. I > think using a TLS-only malloc arena, only locked with signals disabled, > would bring us much better memory efficiency, although there might be a > significant penalty for assigning dynamic segments of different threads > to the same page, should they happen to be accessed very often. This > page sharing by different threads is something we don't necessarily > avoid with the current AS-Unsafe allocation scheme, and avoiding that is > an improvement that the current strategy brings, but I wonder if we > could find some trade off that wouldn't waste so much memory but that > would keep segments used by different threads in separate pages... This > first comment is not meant as a blocker against the page, but as a > suggestion for future improvement. > A proper solution would be make generic malloc async-signal safe. It is question of what is acceptable overhead. I can make a wrapper over any allocator that does that at cost of two tls access. You would pay a mmap overhead only when you are in signal where we do not care about effectivity. As I plan to use a thread-local cache for small allocations which eliminates most of this cost when one has cheap tls. I attached a signal-safe part that I have now, it is part of alternative allocator I now work. As for freeing a malloc already uses mmap for allocation bigger than mmap_treshold so it is possible with filling appropriate header use normal free. > > I was under the impression that one of the main problems with making TLS > AS-Safe was the fact that in some cases of TLS dynamic address > resolution we'd take the dl lock, and if we take a signal while we hold > the lock, and the signal handler attempts to resolve the same address > and take the lock, we'd deadlock if the lock was non-recursive, or risk > finding inconsistent state otherwise. The lock is recursive, but I > don't see anything in the patch besides the CAS loops that avoids > inconsistent state, particularly in the case of lazy TLS relocation > (introduced with GNU2 TLS). Was that not taken into account? > In theory, in practice these are much less probable than malloc problem as these happen in initialization and it takes time signal handler can access object that will cause tls access. A lazy relocation is one of dubious features, did somebody tried to measure if these actually save anything? I would be interested in ratio of resolved / symbols left unresolved. Copying mostly unused tls entries could be a problem but so can be repeated locks and allocations when these entries are actually used. > > It seems to me that, in the absence of locking, > _dl_try_allocate_static_tls is racy, and in the presence of locking, the > CAS loop would be unnecessary. The race is because offset is computed > from GL(dl_tls_static_used) once, and if l->map_tls_offset could change > between that and its update, then so could GL(dl_tls_static_used), > requiring offset to be recomputed based on the modified value. But > GL(dl_tls_static_used) would have to be updated atomically as well, > otherwise concurrent runs for different maps could end up at the same > offset. The code doesn't take such cares, so I wonder if there's > evidence we don't need to worry about that and, if so, why do we still > need the CAS loop. Even if we do hold a lock to prevent other threads > from changing dl_tls_static_used concurrently, couldn't a signal handler > that interrupts _dl_try_allocate_static_tls change it? See _dl_mask_all_signals (&old); > If so, the same > âraceâ applies, otherwise, I don't see why we need even a single CAS, > let alone a CAS loop. > > > I'm somewhat concerned about priority inversion in the atomic_delay() > and CAS loops. Considering that the locks the patch drops are all > recursive, they won't pose an AS-Safety problem; if we keep them in > place, but take the measures that have apparently been taken to avoid > the problems of using inconsistent state when a handler interrupts a > lock-holding operation, and of using outdated state after a signal > handler modifies it, it should all work, and then it would be clear that > the CAS loops are only in place for the case of functions being > interrupted by signal handlers, not to deal with inter-thread > concurrency. With the locks in place, we could at least get rid of the > atomic_delay() loop. > > > This is all I got for now. Thank you both for your work on this issue. > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist Red Hat Brazil Compiler Engineer -- Support staff hung over, send aspirin and come back LATER.
Attachment:
signal_safe.c
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |