Bug 6429 - unwinding/symbol stuff broken on ppc*
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 critical
Target Milestone: ---
Assignee: Frank Ch. Eigler
URL:
Keywords:
Duplicates: 6405 6510
Depends on:
Blocks:
 
Reported: 2008-04-17 21:04 UTC by Frank Ch. Eigler
Modified: 2008-06-05 09:51 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Patch to search "__start" symbol incase of PPC system. (347 bytes, patch)
2008-05-09 10:54 UTC, Srinivasa DS
Patch to fix compilation errors, found on execution of systemtap scripts (580 bytes, patch)
2008-05-19 12:26 UTC, Srinivasa DS

Description Frank Ch. Eigler 2008-04-17 21:04:30 UTC
One part of the breakage is the bundled-elfutils multilib bug mentioned
on the mailing list.

Another part is staprun/insmod errors (fedora 8, up-to-date kernel etc.):

[12919.839449] Systemtap Error at _stp_transport_init:274 failed to initialize
modules
Comment 1 Srinivasa DS 2008-04-21 08:50:44 UTC
I am not able to test systemtap scripts on ppc systems because of the above
issue, so I am raising the severity of this bug.

Thanks
 Srinivasa DS
Comment 2 Frank Ch. Eigler 2008-04-21 20:12:08 UTC
I'll take a crack at reorganizing this code.
Comment 3 Frank Ch. Eigler 2008-04-21 20:15:03 UTC
*** Bug 6405 has been marked as a duplicate of this bug. ***
Comment 4 Srinivasa DS 2008-04-24 11:45:49 UTC
Frank,
 This concerns the error "Systemtap Error at _stp_transport_init:274
failed to initialize modules".
The error was introduced by the patch below:
"http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=fa670082537aea7f090bc8dcfab69ac5f62546bc;hp=073b6ba57a498c3c97426f6f6d0666f1f5eb30d4"

Here are my observations; I hope they help you reorganize the code.

1) On my system, the __start and _stext symbols have the same address (this
may not hold on every architecture). But emit_symbol_data() and
emit_symbol_data_from_debuginfo() only emit symbols whose addresses differ,
so one of the two is left out of stap-symbols.h.

cat /proc/kallsyms
c000000000000000 T .__start
c000000000000000 T _stext

Hence stap-symbols.h does not contain the _stext symbol, and systemtap fails.

vim /tmp/staptouTBk/stap-symbols.h

struct _stp_symbol _stp_kernel_symbols [] = {
  { 0xc000000000000000, ".__start" },
  { 0xc000000000000060, ".__secondary_hold" },
  { 0xc0000000000044f8, ".slb_miss_realmode" },
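
A minimal sketch of how keeping only one symbol per address can lose _stext.
It is not the actual emit_symbol_data() code: struct sym, dedup_by_address()
and find_by_name() are hypothetical names used only for illustration, and the
input is assumed to be sorted by address, as in the kallsyms excerpt above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generated table entry (not struct _stp_symbol). */
struct sym {
    unsigned long long addr;
    const char *name;
};

/*
 * Copy address-sorted symbols from in[] to out[], keeping only the first
 * name seen at each address. Returns the number of entries kept.
 */
static size_t dedup_by_address(const struct sym *in, size_t n, struct sym *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (kept > 0 && out[kept - 1].addr == in[i].addr)
            continue; /* same address as the previous entry: dropped */
        out[kept++] = in[i];
    }
    return kept;
}

/* Return the entry with the given name, or NULL if it is not in the table. */
static const struct sym *find_by_name(const struct sym *tab, size_t n,
                                      const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}
```

With .__start listed before _stext at c000000000000000, the deduplicated
table keeps .__start only, so a later lookup of "_stext" finds nothing.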

Thanks
 Srinivasa DS
Comment 5 Srinivasa DS 2008-05-09 10:54:39 UTC
Created attachment 2728
Patch to search "__start" symbol incase of PPC system.

Systemtap searches the symbol table for the "_stext" symbol while initializing
the systemtap module. Because _stext and __start have the same address, only
__start is added to the symbol table on ppc, and that causes systemtap to
fail on ppc.
This patch searches for __start instead of _stext on ppc systems, which
solves the problem.
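
The idea can be sketched roughly as below. This is not the attached patch:
struct ksym, TEXT_START_SYMBOL and lookup_address() are hypothetical names,
the __powerpc__ compiler macro is assumed to identify ppc builds, and the
spelling ".__start" is taken from the generated table quoted in comment 4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, for illustration only. */
struct ksym {
    unsigned long long addr;
    const char *name;
};

/* Pick the symbol that marks the start of kernel text per architecture. */
#if defined(__powerpc__)
#define TEXT_START_SYMBOL ".__start" /* assumption: name as seen in the table */
#else
#define TEXT_START_SYMBOL "_stext"
#endif

/*
 * Look up a symbol by name. On success store its address in *addr and
 * return 0; return -1 if the symbol is not in the table.
 */
static int lookup_address(const struct ksym *tab, size_t n,
                          const char *name, unsigned long long *addr)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *addr = tab[i].addr;
            return 0;
        }
    }
    return -1;
}
```

Initialization code would then call lookup_address(tab, n, TEXT_START_SYMBOL,
&addr) instead of hard-coding "_stext".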

Thanks
Srinivasa DS
Comment 6 Frank Ch. Eigler 2008-05-13 10:29:33 UTC
*** Bug 6510 has been marked as a duplicate of this bug. ***
Comment 7 Ananth Mavinakayanahalli 2008-05-15 12:40:43 UTC
Frank,
Is it possible to revert aaf2af3e3b0c159a64609c82811662d7253c3a96 till the
unwind related problems are fixed, since that is the cause of quite a few
failures recently?

Ananth
Comment 8 Frank Ch. Eigler 2008-05-15 13:38:05 UTC
Please give me a few more days to try to fix this stuff.
Because of the way the unwind branch was built, it would
be difficult to unroll the code partway.

If it helps you get stuff done in the interim, please
commit/push the patch from comment #5.
Comment 9 Srinivasa DS 2008-05-15 15:10:58 UTC
(In reply to comment #8)
> 
> If it helps you get stuff done in the interim, please
> commit/push the patch from comment #5.
> 

Frank,
 There are two issues here, and we have a patch only for the first one.

1) The "_stp_transport_init:274 failed to initialize modules" problem. The
patch attached in comment #5 solves it temporarily.


2) Compilation errors displayed when a simple systemtap script is executed:

/usr/local/share/systemtap/runtime/transport/symbols.c:407: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:425: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:427: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:456: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:457: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:460: error: dereferencing pointer to incomplete type

We don't have a fix for this problem.

So fixing the first issue does not resolve the problem completely.

Thanks
 Srinivasa DS
Comment 10 Srinivasa DS 2008-05-16 11:07:02 UTC
/usr/local/share/systemtap/runtime/transport/symbols.c:407: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:425: error: dereferencing pointer to incomplete type
/usr/local/share/systemtap/runtime/transport/symbols.c:427: error: dereferencing pointer to incomplete type

We can solve these compilation problems by extending the
autoconf-module-nsection.c file as shown below and guarding the uses of "attr"
with STAPCONF_MODULE_NSECTIONS in runtime/transport/symbols.c.

#include <linux/module.h>

struct module *x;

void foo (void)
{
  (void) x->sect_attrs->nsections;
  (void) x->sect_attrs->attrs;
}
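
A rough userspace sketch of what such a guard could look like. The mock_*
types and section_address() are hypothetical stand-ins, not the kernel's
struct module or the real symbols.c code, and the macro is defined by hand
here, whereas the build would normally define it only when the probe above
compiles.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: set by the build when the autoconf probe compiles. */
#define STAPCONF_MODULE_NSECTIONS 1

/* Userspace mock types, NOT the kernel definitions. */
struct mock_sect_attr {
    const char *name;
    unsigned long address;
};

struct mock_sect_attrs {
    unsigned int nsections;
    struct mock_sect_attr *attrs;
};

struct mock_module {
    struct mock_sect_attrs *sect_attrs;
};

/* Return the address of the named section, or 0 if it cannot be found. */
static unsigned long section_address(const struct mock_module *m,
                                     const char *name)
{
#ifdef STAPCONF_MODULE_NSECTIONS
    if (m == NULL || m->sect_attrs == NULL)
        return 0;
    for (unsigned int i = 0; i < m->sect_attrs->nsections; i++)
        if (strcmp(m->sect_attrs->attrs[i].name, name) == 0)
            return m->sect_attrs->attrs[i].address;
    return 0;
#else
    /* Section attributes are not accessible: report "unknown". */
    (void) m;
    (void) name;
    return 0;
#endif
}
```

When the macro is not defined, the member accesses are compiled out, so the
"dereferencing pointer to incomplete type" errors cannot occur on that path.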

Comment 11 Wenji Huang 2008-05-16 16:12:04 UTC
(In reply to comment #10)
> we can solve these compilation problems by extending autoconf-module-nsection.c
> file like below and protecting "attr" variable with STAPCONF_MODULE_NSECTIONS in
> runtime/transport/symbols.c.
> 
> #include <linux/module.h>
> 
> struct module *x;
> 
> void foo (void)
> {
>   (void) x->sect_attrs->nsections;
>   (void) x->sect_attrs->attrs;
> }

That approach checks for the structure and its members. But I found several
other references, such as ->attr and ->grp. We need to find replacements for
them or figure out another way to bypass the two structures.

Another workaround is to include the definitions in symbols.c or sym.h.

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26)
struct module_sect_attr
{
       struct module_attribute mattr;
       char *name;
       unsigned long address;
};

struct module_sect_attrs
{
       struct attribute_group grp;
       unsigned int nsections;
       struct module_sect_attr attrs[0];
};
#endif

Of course, this is not very elegant.
Comment 12 Srinivasa DS 2008-05-19 12:26:39 UTC
Created attachment 2742
Patch to fix compilation errors, found on execution of systemtap scripts

Frank,
 This is just an interim fix. I applied this patch and ran the systemtap tests
on ppc and x86_64 systems with the latest kernel.
Comment 13 Ananth Mavinakayanahalli 2008-06-03 10:03:10 UTC
This is still broken on powerpc. It looks like libdw is 32-bit while the
runtime needs to be 64-bit. I even tried ifdefing it out using
STP_USE_DWARF_UNWINDER, but that still doesn't help.

Can this either be fixed, or the code be disabled until it works
seamlessly across architectures, please?
Comment 14 Frank Ch. Eigler 2008-06-03 18:11:54 UTC
commit 8928443 should defang this particular issue - leaving the unwinder
in place, but not feeding it with data from staprun/stapio.
Comment 15 Srinivasa DS 2008-06-04 11:38:49 UTC
(In reply to comment #14)
> commit 8928443 should defang this particular issue - leaving the unwinder
> in place, but not feeding it with data from staprun/stapio.

We encountered a build issue while building systemtap. Investigating further...

Thanks
 Srinivasa DS 
Comment 16 Ananth Mavinakayanahalli 2008-06-05 09:51:50 UTC
Verified build and tested again. Works fine.

Thanks Frank!

Ananth