This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



RE: architecture paper draft


Hi,

On Wed, 2005-02-16 at 02:21, Richard J Moore wrote:

> If we could access paged-in memory in
> user space without recourse to locking or any kernel calls then we should
> allow it.

It's possible.  But from what context?  If you do so from a probe deep
enough in the kernel that there are locks held that prevent page
faulting, then either you're going to page fault and deadlock if you hit
paged-out memory, or you trap the recursive fault and return an error
back.  Either way you've not got transparent access.

And what if the memory access truly is to a completely unmapped area?
All of the normal kernel calls for accessing memory deal with this sort
of case properly and let you get back an immediate EFAULT if the address
is bad.  Those cases won't magically go away if you start accessing
user-space directly, and there simply isn't any way to know in advance
if the pointer is valid or not.
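
The EFAULT behaviour is easy to observe from user space.  The following is
a minimal sketch, not SystemTap or kernel code: it hands the kernel a
pointer through write(2) on a pipe, so the kernel's own user-copy path
decides whether the address is usable.  The helper names probe_user_byte()
and probe_unmapped_page() are made up for this illustration, and the sketch
assumes a single-threaded Linux process, so that nothing else can reuse the
address once it has been unmapped.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the kernel to copy one byte from addr by writing it into a pipe.
 * Returns 0 if the byte was readable, otherwise the errno value
 * (EFAULT when addr is not a readable user address). */
static int probe_user_byte(const void *addr)
{
    int fds[2];
    int result = 0;

    if (pipe(fds) != 0)
        return errno;
    if (write(fds[1], addr, 1) != 1)
        result = errno;
    close(fds[0]);
    close(fds[1]);
    return result;
}

/* Map a page, unmap it again, then probe the now-unmapped address. */
static int probe_unmapped_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return probe_user_byte(p);
}
```

Calling probe_user_byte() on the address of a local variable should return
0, while probe_unmapped_page() should return EFAULT.  The caller never
learns in advance which case it has; the answer only comes back from the
kernel's access path.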

Unfortunately, we just can't pretend that we can simply access the
memory directly.  True, that might work all of the time if we're dealing
with normal programs --- typically, syscall arguments are recently used
and valid, so we won't fault on following the pointers.  But if the
overriding concern is safety, then dealing with all of the
page-not-present or pointer-invalid cases is necessary, and that *does*
require kernel help.

>  The problem that Stephen has surfaced is that for the 4g/4g
> kernel there will be no user page tables mapped. So it may be that for
> this arrangement access to user-space is just not possible.  I am not
> familiar enough with the 4g/4g mm to know how user pages are managed when
> we are running in the kernel.

Effectively you can think of it as the kernel and user space being
separate processes, with a tiny context switch when we switch between
them.  (There's a small amount of space mapped at the top of memory
which is common to both contexts, and which we use to execute the
trampoline code to bounce between the two contexts.)  So for 4g/4g,
accessing user space is really very much like "ptrace" accesses to a
different process --- you lock the page tables and VM data structures
for the process and then follow the page tables manually, and you invoke
the handle_mm_fault() code directly if the page is not appropriately
mapped.
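
As a rough illustration of that sequence (take the lock, walk the table,
fault the page in if needed), here is a toy user-space model.  It is not
kernel code and uses no kernel API: struct toy_mm, toy_mm_init() and
toy_read_byte() are hypothetical names, the "page table" is a small array,
and the lock is a C11 atomic_flag standing in for the real mm locking.  The
can_fault argument models the calling context: a caller that cannot block
or fault gets an error back instead of transparent access.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_NPAGES    4u

enum toy_state { TOY_UNMAPPED = 0, TOY_PAGED_OUT, TOY_PRESENT };

struct toy_mm {
    atomic_flag lock;                                  /* stand-in for mm locking */
    enum toy_state state[TOY_NPAGES];                  /* toy "page table" */
    unsigned char frames[TOY_NPAGES][TOY_PAGE_SIZE];   /* toy "physical memory" */
    unsigned char backing[TOY_NPAGES][TOY_PAGE_SIZE];  /* toy "swap" */
};

/* Start with every page unmapped and the lock free. */
static void toy_mm_init(struct toy_mm *mm)
{
    memset(mm, 0, sizeof(*mm));
    atomic_flag_clear(&mm->lock);
}

/* Read one byte at addr.  Returns 0 on success, -EFAULT if the address is
 * not mapped at all, or -EAGAIN if the caller may not block or fault and
 * the lock is busy or the page is paged out. */
static int toy_read_byte(struct toy_mm *mm, size_t addr, bool can_fault,
                         unsigned char *out)
{
    size_t page = addr / TOY_PAGE_SIZE;
    size_t off = addr % TOY_PAGE_SIZE;
    int ret = 0;

    if (page >= TOY_NPAGES)
        return -EFAULT;

    /* The walk must be serialized against anyone changing the table. */
    if (atomic_flag_test_and_set(&mm->lock)) {
        if (!can_fault)
            return -EAGAIN;          /* must not wait in this context */
        while (atomic_flag_test_and_set(&mm->lock))
            ;                        /* toy spin; a real kernel would sleep */
    }

    if (mm->state[page] == TOY_UNMAPPED) {
        ret = -EFAULT;
    } else if (mm->state[page] == TOY_PAGED_OUT && !can_fault) {
        ret = -EAGAIN;               /* page exists but is not resident */
    } else {
        if (mm->state[page] == TOY_PAGED_OUT) {
            /* Toy analogue of faulting the page in on demand. */
            memcpy(mm->frames[page], mm->backing[page], TOY_PAGE_SIZE);
            mm->state[page] = TOY_PRESENT;
        }
        *out = mm->frames[page][off];
    }

    atomic_flag_clear(&mm->lock);
    return ret;
}
```

In this model, a read of a paged-out page with can_fault set to false
fails with -EAGAIN; the same read with can_fault set to true brings the
page in and succeeds, after which even the non-faulting read works.  An
address in an unmapped page fails with -EFAULT in either mode.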

>  Thinking off the top of my head, there may be
> ways we can quickly locate the page table and set up a temporary mapping to
> the same physical memory through an aliased kernel address, in a similar
> way we do for probe insertion into user-space. We'd need to reserve one or
> possibly two PTEs in kernel space for this purpose.

The physical memory does not necessarily exist.  It's all dynamic.  It
is paged in on demand.  And to do that paging requires appropriate
locking.  And you can't bypass that locking, because if you've got a
threaded application, or a debugger running, then another process may be
fiddling with your page tables from another CPU at the same time as the
probe is running. 

So if you're going to follow the page tables yourself, I still think you
need to use the kernel support for that.  If you don't, you just end up
reimplementing the exact same locking and page-table-following rules
that are already used.  

>   Note that paging in
> user memory from a probe handler is absolutely not the requirement here.
> Seldom is it the case that the user memory of interest has been paged out.
> I'm hoping that in the 4g/4g case there's a good chance of finding user
> memory merely unmapped.

In terms of being mapped/unmapped, it's just the same in 4g/4g as in the
normal 3/1 kernel.  The difference is *where* it's mapped --- in 3/1
it's in the same address space as the running kernel (usually, unless
you're in lazy TLB mode); in 4/4 it's a different address space.

--Stephen


