This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: dlmopen with RTLD_GLOBAL


On Fri, Jun 30, 2017 at 1:38 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> I'm interested. Can you explain your use case?

Basically I want to run multiple copies of Python within the same process.

As background, I'm working on Python support for Legion. Legion is a
parallel programming system described here:
http://legion.stanford.edu/ . The goal is to achieve parallel
execution of user-written Python scripts (e.g. for scientific
simulations and data analysis).

Python supports threads, but getting actual speedups out of parallel
Python programs can be a challenge. Python uses a global interpreter
lock (GIL) that makes the interpreter thread-safe in a very
coarse-grained way. The GIL effectively prevents Python from executing
multiple Python instructions in parallel, and can prevent Python codes
from achieving parallel speedups. In practice, the GIL might or might
not be an issue depending on the workload. For example, Python code
that calls into C/C++ can release the GIL for the duration of the
foreign call (ctypes does this automatically; extension modules have
to do it explicitly). But because I'm dealing with user-supplied
scripts, I can't
guarantee that all code will be written in a way that avoids use of
the GIL, and I'd still like that code to achieve reasonable speedups.
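
To make the GIL point concrete, here is a minimal sketch (the names
count_down and run_in_threads are made up for illustration). It runs a
pure-Python, CPU-bound loop in several threads. The results are
correct, but on a CPython build with the GIL enabled the threads take
turns executing bytecode rather than running in parallel.

```python
import threading


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1
    return n


def run_in_threads(num_threads, n):
    """Run count_down(n) in num_threads threads and collect the results."""
    results = [None] * num_threads

    def worker(i):
        results[i] = count_down(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Timing run_in_threads(4, n) against four sequential calls to
count_down(n) for a large n is one way to see whether a given workload
is limited by the GIL.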

Historically the approach used to parallelize compute-intensive Python
code is to spawn multiple processes. But the built-in module for this,
multiprocessing, wasn't going to work with Legion. One of the largest
differences between Legion and multiprocessing is that we tend to work
with large datasets and can't afford to serialize/deserialize and copy
every piece of data for use with multiple processes. (In some cases
the data wouldn't even fit in memory if you made multiple copies of
it.) Rolling our own multi-process Python wrapper to handle this use
case seemed like a lot of unnecessary complexity that we should avoid
if at all possible. Much of this complexity would go away if it were
possible to use multiple Python interpreters in the same process.
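
As a rough illustration of the copying cost, the sketch below uses a
pickle round trip as a stand-in for passing an object to another
process as an ordinary argument (send_like_multiprocessing is a
hypothetical helper, not part of any library). The receiver gets an
independent copy, while code in the same process sees the original
buffer.

```python
import pickle


def send_like_multiprocessing(obj):
    """Approximate handing obj to another process: serialize, then deserialize."""
    return pickle.loads(pickle.dumps(obj))


data = bytearray(b"\x00" * 1024)   # small stand-in for a large dataset

received = send_like_multiprocessing(data)
received[0] = 1                    # changes only the deserialized copy

shared = data                      # what a thread in the same process sees
shared[1] = 2                      # changes the one and only buffer
```

With the pickle route, both the serialized bytes and the reconstructed
object occupy memory in addition to the original, which is the
overhead described above.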

There was some initial work in Python to do this via
"sub-interpreters" but as best I can tell that never panned out, and
there are many nasty corner cases that are difficult to fix,
particularly around threading.

Another option is to use dlmopen to create multiple entirely distinct
copies of the Python interpreter. This approach was explored initially
about a year ago and a working proof of concept is available:

https://news.ycombinator.com/item?id=11844268
https://gist.github.com/dutc/eba9b2f7980f400f6287

However, there is a major pitfall not exposed in the proof of concept:
Native Python modules (i.e. Python modules written in C/C++) don't work.
This means e.g. NumPy doesn't work, which for us is game over.

Native modules are shared objects. Because Python might or might not
be dynamically linked, native modules do not explicitly depend on
libpython*.so. Instead they expect the Python symbols to be available
in the global scope of the process in which they are loaded. Under
dlmopen, the only option is to use RTLD_LOCAL, so these symbols aren't
exposed, and any native modules fail to find the needed symbols.

Roughly, the sequence of events is:

main:
  dlmopen(LM_ID_NEWLM, "libpython2.7.so",
          RTLD_DEEPBIND | RTLD_LOCAL | RTLD_LAZY)
  from inside user Python script:
    import some_native_module
    this causes Python to execute the following (remember this is
    inside the new namespace):
      dlopen("some_native_module.so", ...)

If RTLD_GLOBAL is an option with dlmopen, then the symbols can be
exposed within the new namespace, and subsequent dlopen calls to
shared objects that do not explicitly mention Python will succeed.
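
The dlmopen call in that sequence can be sketched from Python via
ctypes. This is only an illustration under several assumptions: the
constants are the glibc values on Linux, dlmopen is reachable through
the symbols already loaded in the process, and the helper names
(namespace_flags, open_in_new_namespace) are invented here.

```python
import ctypes

# Values as defined by glibc on Linux; other platforms may differ.
LM_ID_NEWLM = -1
RTLD_LAZY = 0x00001
RTLD_DEEPBIND = 0x00008
RTLD_LOCAL = 0
RTLD_GLOBAL = 0x00100


def namespace_flags(want_global=False):
    """Build the flags from the sequence above, optionally with RTLD_GLOBAL."""
    flags = RTLD_DEEPBIND | RTLD_LAZY
    flags |= RTLD_GLOBAL if want_global else RTLD_LOCAL
    return flags


def open_in_new_namespace(path, want_global=False):
    """Call dlmopen(LM_ID_NEWLM, path, flags) and return the handle.

    Raises OSError carrying the dlerror() text if the load fails.
    """
    libc = ctypes.CDLL(None)   # symbols already visible in this process
    dlmopen = libc.dlmopen     # AttributeError if dlmopen is not available
    dlmopen.restype = ctypes.c_void_p
    dlmopen.argtypes = [ctypes.c_long, ctypes.c_char_p, ctypes.c_int]
    dlerror = libc.dlerror
    dlerror.restype = ctypes.c_char_p
    dlerror.argtypes = []

    handle = dlmopen(LM_ID_NEWLM, path.encode(), namespace_flags(want_global))
    if handle is None:         # ctypes maps a NULL void* to None
        message = dlerror()
        raise OSError(message.decode() if message else "dlmopen failed")
    return handle
```

Going by the description above, calling
open_in_new_namespace("libpython2.7.so", want_global=True) would be
expected to fail on a glibc without the patch and to succeed with it.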

I've tested this with the patch that is available, and this seems to
work. There are some additional gotchas but none resulting from bugs
in libc, so I won't go over those here.



-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay

