This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: architecture paper draft
- From: "Stephen C. Tweedie" <sct at redhat dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: Stephen Tweedie <sct at redhat dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, systemtap at sources dot redhat dot com
- Date: Thu, 10 Feb 2005 11:56:15 +0000
- Subject: Re: architecture paper draft
- References: <20050127212504.GH22921@redhat.com> <OFF9E81F9D.E39D7F25-ON00256FA3.004DFAD1-00256FA3.004EB0D4@uk.ibm.com> <20050209233419.GI5011@redhat.com>
Hi,
On Wed, 2005-02-09 at 23:34, Frank Ch. Eigler wrote:
> (For what it's worth, I'm still hoping for greater expertise to
> make itself known on the subject of use of copy_from_user and its
> kin from arbitrary kprobes contexts, to provide safety guarantees.)
The "safe" answer is just Don't Do That.
There are basically three classes of problem with arbitrary
copy_*_user.
First, you cannot access user space from within an interrupt or while
holding a spinlock, because the access may fault and the fault handler
may need to sleep. This even applies to mlock()ed memory: for example,
on a 4G/4G system user accesses always require the page_table_lock
spinlock, which is not interrupt-safe.
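
As a rough illustration of that first rule, here is a small user-space
model in C. It is only a sketch: struct ctx_model and
ctx_may_touch_user() are hypothetical names invented for this example
and are not kernel interfaces. All it shows is that a user access is
acceptable only when the context is allowed to block.

```c
#include <stdbool.h>

/* Hypothetical model of an execution context; these fields are
 * illustrative and are not real kernel structures. */
struct ctx_model {
    bool in_interrupt;    /* true while servicing an interrupt */
    int  spinlocks_held;  /* number of spinlocks currently held */
};

/* A user-space access is only permissible when the context may block:
 * not in an interrupt, and with no spinlock held. */
static bool ctx_may_touch_user(const struct ctx_model *c)
{
    return !c->in_interrupt && c->spinlocks_held == 0;
}
```

A probe handler that can fire anywhere would have to assume the worst
case, which is why the conservative answer is to refuse the access.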
Second, LAZY_TLB means that the currently active mm may not always be
the currently-running task's native one. Kernel daemons, for example,
don't cause an MM context switch when you schedule them (they don't even
*have* an MM of their own); they keep pointing to the user space of the
previously-running task, because there's no point in doing an expensive
MM reload when it's assumed that the daemon is never going to access
user space anyway.
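
The borrowing can be sketched with another user-space model in C. The
names struct mm_model, struct task_model, model_context_switch() and
task_owns_user_space() are made up for this example; only the mm and
active_mm field names are meant to echo the kernel's task structure.
This is an assumption-laden sketch of the idea, not the scheduler's
actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space. */
struct mm_model { int id; };

/* Hypothetical stand-in for a task: mm is the task's own address space
 * (NULL for a kernel daemon), active_mm is whatever is loaded now. */
struct task_model {
    struct mm_model *mm;
    struct mm_model *active_mm;
};

/* Model of a lazy-TLB context switch: a task without an mm of its own
 * keeps running on the previous task's address space. */
static void model_context_switch(const struct task_model *prev,
                                 struct task_model *next)
{
    if (next->mm == NULL)
        next->active_mm = prev->active_mm;  /* borrow, no MM reload */
    else
        next->active_mm = next->mm;         /* normal MM switch */
}

/* User addresses are only meaningful for a task with its own mm. */
static bool task_owns_user_space(const struct task_model *t)
{
    return t->mm != NULL;
}
```

After switching from an application to a daemon in this model, the
daemon's active_mm is the application's, so a user-space read from the
daemon's context would land in someone else's memory.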
And finally, the set_fs(KERNEL_DS) construct allows the kernel to
temporarily redirect all user-space access to kernel space. That's
often used to perform syscall-style file I/O on kernel data rather
than user data.
For example, process accounting uses it to write to the accounting file:
when the write code eventually performs a copy_from_user() to read the
caller's data buffer, it ends up accessing the kernel accounting data
instead.
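
To make the redirection concrete, here is a user-space model in C of
an address limit check. Everything prefixed with model_ or MODEL_ is
hypothetical and exists only for this example; the real set_fs() and
copy_from_user() work on the actual address space, not on an array.
The sketch assumes a single flat array in which low offsets play the
role of user memory and high offsets the role of kernel memory.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MEM_SIZE   64u  /* whole modelled address space */
#define MODEL_USER_LIMIT 32u  /* offsets below this count as "user" */

static unsigned char model_mem[MODEL_MEM_SIZE];

/* Current limit for "user" accesses; starts at the user boundary. */
static size_t model_addr_limit = MODEL_USER_LIMIT;

static size_t model_get_fs(void) { return model_addr_limit; }
static void model_set_fs(size_t limit) { model_addr_limit = limit; }

/* Returns the number of bytes NOT copied (0 on success), following
 * the convention of the kernel's copy_from_user(). */
static size_t model_copy_from_user(void *dst, size_t src, size_t n)
{
    /* Reject any range that reaches past the current limit. */
    if (src > model_addr_limit || n > model_addr_limit - src)
        return n;
    memcpy(dst, &model_mem[src], n);
    return 0;
}
```

With the default limit, a copy from offset 40 is refused. After
model_set_fs(MODEL_MEM_SIZE), the same call succeeds and returns
"kernel" bytes, so code that believes it is reading the caller's user
buffer is in fact reading kernel data.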
All three of these will get in the way in different ways if you try to
access user space from arbitrary kernel contexts.
--Stephen