This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: systemtap ARM port status


On Fri, Jun 01, 2007 at 01:49:51PM -0700, Roland McGrath wrote:
> > However, one of the changes I made to loc2c-runtime.h shouldn't be
> > necessary, if anything, it should cause things to fail, but instead
> > it makes some tests now pass.  I sent some mail to Martin about it
> > to see if he has any ideas as to what's going on.
>
> Please use this mailing list for such discussion.

Hi Roland! I don't think we've swapped mail in some time, maybe even going back to the gmake alpha devel mailing list back in the '90s.

Well, gee, I think that's a real first for me.  I'm often told to
take a discussion off a mailing list to e-mail, not the other way
around!

Okay, you asked for it; you got it.  The two mails I sent to Martin
about these issues are below.

> Thanks,
> Roland

Quentin


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Date: Fri, 1 Jun 2007 00:17:24 -0500
From: Quentin Barnes <qbarnes@urbana.css.mot.com>
To: Martin Hunt <hunt@redhat.com>
Subject: Strange failures with syscall testsuite

I've got almost all of the testsuite tests passing (or at least
understand why they fail) with one big, notable exception: all
the tests under systemtap.syscall fail.  I've been working on
understanding the failures, but am starting to hit a wall due to
several anomalies I've encountered.

Here's a sample failure:
=====
Testing 32-bit access
FAIL: 32-bit access
access FAILED. output of "stap -c ../access /usr/src/systemtap-20070519/testsuite/systemtap.syscall/sys.stp" was:
------------------------------------------
staprun: getpid () = N/A
staprun: getrlimit (RLIMIT_STACK, 0xbec5caf0) = N/A
staprun: rt_sigaction (32, 0xbec5ca70, 0x00000000, 8) = N/A
staprun: rt_sigaction (33, 0xbec5ca70, 0x00000000, 8) = N/A
staprun: rt_sigaction (34, 0xbec5ca70, 0x00000000, 8) = N/A
WARNING: Number of errors: 1, skipped probes: 0
ERROR: kernel string copy fault at 0xc1068000 near identifier '$filename' at /usr/local/share/systemtap/tapset/syscalls.stp:49:48
------------------------------------------
RESULTS: ('*' = MATCHED EXPECTED)
--------- EXPECTED and NOT MATCHED ----------
access: access \("foobar1", F_OK\) = 0
access: access \("foobar1", R_OK\) = 0
access: access \("foobar1", W_OK\) = 0
access: access \("foobar1", X_OK\) = -[\-0-9]+ \(EACCES\)
access: access \("foobar1", W_OK \|R_OK\) = 0
access: access \("foobar1", X_OK \|W_OK \|R_OK\) = -[\-0-9]+ \(EACCES\)
=====

0xc1068000 is a legit kernel memory address on ARM.

When I look at line 49 in syscalls.stp, it is:
       argstr = sprintf("%s, %s", user_string_quoted($filename), mode_str)

$filename is the first argument to sys_access() which has type
"const char __user *filename".  I dumped the regs as passed into
enter_kprobe_probe().  R0 matches the address (i.e. 0xc1068000),
so that seems fine, sort of.

The first anomaly is that 0xc1068000 is a _kernel_ memory address.
Shouldn't the address passed into sys_access() be a user-space
address due to the "__user" qualifier?  I don't know Linux kernel
programming nuances, but I thought that tag meant it was expected to
be a user-space address only.

I assume the error as reported is from expanding the
"user_string_quoted($filename)" expression.  Having a user space
string routine expand a kernel address would certainly be fatal!
ARM uses an "ldrbt" instruction to do the copy -- that would fault
on trying to copy from a kernel memory address.
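
To make the user/kernel distinction concrete, here is a minimal
sketch of the kind of range check involved.  It assumes a 32-bit ARM
kernel with the common 3G/1G split, i.e. kernel mappings starting at
0xC0000000; that value is configurable and is an assumption here, and
the helper names are made up for illustration (they are not kernel or
systemtap APIs).

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 3G/1G split; real kernels may be configured differently. */
#define SKETCH_PAGE_OFFSET UINT32_C(0xC0000000)

/* Hypothetical helper: true if addr lies at or above the split,
 * i.e. in the kernel part of the address space. */
static bool sketch_is_kernel_addr(uint32_t addr)
{
    return addr >= SKETCH_PAGE_OFFSET;
}

/* Hypothetical helper: true if the whole range [addr, addr + len)
 * stays below the split, which is what a user-space pointer check
 * has to guarantee before a user-only load is attempted. */
static bool sketch_user_range_ok(uint32_t addr, uint32_t len)
{
    /* Reject ranges that would wrap around the 32-bit address space. */
    if (len > UINT32_MAX - addr)
        return false;
    return addr + len <= SKETCH_PAGE_OFFSET;
}
```

Under that assumption, sketch_is_kernel_addr(0xc1068000) is true,
while a stack address from the log above such as 0xbec5caf0 passes
sketch_user_range_ok().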

But then there's anomaly number two.  The error message is
"ERROR: kernel string copy fault at ...". This is a message from
function_kernel_string().  This function is expecting to copy a
string at a kernel address.  How did it get invoked from a service
called "user_string_quoted"?

In looking at the implementation of function_kernel_string(),
it calls deref_string(), a macro in loc2c-runtime.h.  deref_string()
invokes deref().  For byte operations, deref() invokes a platform
specific call.  In the case of i386, it is called "__get_user_asm()".
Now I would expect such a routine to perform user-space only loading
of data, so that accessing kernel data would fault.  Anomaly number
three: why doesn't calling such a function from
function_kernel_string() fault on kernel memory access on other
platforms like it does on ARM?
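
As I read it, the chain function_kernel_string() -> deref_string() ->
deref() boils down to a byte-at-a-time copy that gives up on the
first load that faults.  The following is only a user-space model of
that idea, not the loc2c-runtime.h code: every name in it is
hypothetical, and the "fault" is simulated by a loader callback that
returns an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loader type: returns 0 on success, -1 on a (simulated) fault. */
typedef int (*sketch_load_byte_fn)(const char *addr, char *out);

/* Loader that simply reads the byte, like a plain kernel-space load. */
static int sketch_load_plain(const char *addr, char *out)
{
    *out = *addr;
    return 0;
}

/* Loader that always fails, standing in for a user-only load
 * (such as ldrbt) being pointed at a kernel address. */
static int sketch_load_faulting(const char *addr, char *out)
{
    (void)addr;
    (void)out;
    return -1;
}

/* Copy a NUL-terminated string of at most maxlen - 1 bytes from src
 * to dst using the given loader.  Returns 0 on success and -1 on the
 * first faulting load; dst is always left NUL-terminated. */
static int sketch_copy_string(char *dst, const char *src, size_t maxlen,
                              sketch_load_byte_fn load)
{
    size_t i;

    assert(dst != NULL && src != NULL && load != NULL && maxlen > 0);
    for (i = 0; i + 1 < maxlen; i++) {
        char c;
        if (load(src + i, &c) != 0) {
            dst[i] = '\0';      /* keep whatever was copied so far */
            return -1;          /* caller reports the copy fault */
        }
        dst[i] = c;
        if (c == '\0')
            return 0;
    }
    dst[i] = '\0';              /* truncated: terminate explicitly */
    return 0;
}
```

With sketch_load_plain the copy of "foobar1" succeeds; swapping in
sketch_load_faulting makes the same call return -1, which is the
situation the "kernel string copy fault" message reports.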

As a hack, I decided to modify ARM's call of __get_user_asm_byte()
by deref() to just do "*(char *)addr".  Suddenly, all the test cases
that were previously failing with the "ERROR: kernel string copy
fault at ..." diagnostic are now making it past that point and much
further!  (Most of the tests might actually even be passing now,
but debug messages I put into the stap output are throwing off the
Expect scripts.)

Could ARM (or the systemtap test suite) be doing something unusual
by passing a kernel data address to sys_access() and the other
system calls?


What am I not understanding?  Any words of wisdom on what's going on
and how to solve this?

Quentin


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Date: Fri, 1 Jun 2007 15:04:31 -0500
From: Quentin Barnes <qlb01+stap@qbarnes.org>
To: Martin Hunt <hunt@redhat.com>
Subject: Re: Basic ARM support for Systemtap

At the end of this mail are three additional files for ARM on top of
the ones I sent you.  They are pretty straightforward.  The patch
fixes the "N/A" in the output I had sent you (and one cast).

In the meantime, I hacked the deref macro in loc2c-runtime.h to
be this for ARM:
=====
#define deref(size, addr) \
  ({ \
    int _bad = 0; \
    intptr_t _v = 0; \
    switch (size) { \
    case 1: _v = *(char *)addr; break; \
    case 2: _v = *(short *)addr; break; \
    case 4: _v = *(int *)addr; break; \
    default: __get_user_bad(); break; \
    } \
    if (_bad) \
      goto deref_fault; \
    _v; \
  })
=====
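
One thing worth noting about the macro as written: _bad is never set,
so the "goto deref_fault" path can never be taken and a bad pointer
would be dereferenced directly rather than reported.  Below is a
small user-space sketch of the same size dispatch with an explicit
error flag.  It is an illustration only, not a proposed replacement:
the function name is hypothetical, it uses fixed-width types instead
of char/short/int, and it does nothing to catch real memory faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the size dispatch in the hacked deref().
 * Loads `size` bytes from addr and widens the result to intptr_t.
 * *bad is set to 1 for an unsupported size and to 0 otherwise. */
static intptr_t sketch_deref(size_t size, const void *addr, int *bad)
{
    assert(addr != NULL && bad != NULL);
    *bad = 0;

    switch (size) {
    case 1: {
        /* Plain char is unsigned on ARM Linux, so zero-extend here. */
        uint8_t v;
        memcpy(&v, addr, sizeof v);
        return (intptr_t)v;
    }
    case 2: {
        int16_t v;              /* sign-extended, like a short load */
        memcpy(&v, addr, sizeof v);
        return (intptr_t)v;
    }
    case 4: {
        int32_t v;              /* sign-extended, like an int load */
        memcpy(&v, addr, sizeof v);
        return (intptr_t)v;
    }
    default:
        *bad = 1;               /* the caller decides how to fail */
        return 0;
    }
}
```

A caller would check the flag after each load, for example jumping to
its own fault label when bad is nonzero, which is the role the
"if (_bad) goto deref_fault;" lines are meant to play in the macro.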


With the hackery above and the patch below, for the first time ever,
I'm getting tests in the "systemtap.syscall" testsuite to pass!  I
don't know how many will pass.  It will take another several hours
for the suite to finish running that section on my devel board, but
I am very curious to see.

If you can help me understand what's going on here with why the
above hackery "works" at all, it would be a big help.  (My questions
are in the previous mail I sent in the wee hours earlier today).

Quentin




Index: runtime/stack-arm.c
===================================================================
--- runtime/stack-arm.c	(revision 195)
+++ runtime/stack-arm.c	(working copy)
@@ -59,7 +59,7 @@ static void __stp_stack_print (struct pt
			_stp_symbol_print((unsigned long)pc);
			_stp_print_char('\n');
		} else {
-			_stp_printf("%08lx ", pc);
+			_stp_printf("%08lx ", (unsigned long)pc);
		}

		/* Sanity check the next_fp. */
Index: runtime/regs.c
===================================================================
--- runtime/regs.c	(revision 195)
+++ runtime/regs.c	(working copy)
@@ -46,6 +46,8 @@ unsigned long _stp_ret_addr (struct pt_r
	return regs->b0;
#elif defined (__s390__) || defined (__s390x__)
	return regs->gprs[14];
+#elif defined (__arm__)
+	return regs->ARM_r0;
#else
	#error Unimplemented architecture
#endif
Index: tapset/errno.stp
===================================================================
--- tapset/errno.stp	(revision 194)
+++ tapset/errno.stp	(working copy)
@@ -370,6 +370,8 @@ function returnstr:string (returnp:long)
		ret = CONTEXT->regs->u_regs[UREG_RETPC];
#elif defined (__s390x__)
		ret = CONTEXT->regs->gprs[2];
+#elif defined (__arm__)
+		ret = CONTEXT->regs->ARM_r0;
#else
		goto no_ret;
#endif

