This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131


http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2
           Severity|critical                    |normal

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> 2012-01-02 22:30:45 UTC ---
Some code from Jim Keniston, later adapted by yours truly, makes some
progress on the race conditions present in runtime/uprobes{,2}.  It
seems like something more dramatic will be required.

I pushed my current working changes to the new branch pr13539.  Test
it with # make installcheck RUNTESTFLAGS=unprivileged_myproc.exp on
an SMP (virtual?) machine.  I can reproduce some problem or another
on a rhel5 2.6.18-300{,debug}.{i686,x86_64} box easily, and less
easily on rhel6/fedoras.  (On the latter, run it in a loop.)

The original race condition was that the "./loop 1" program's threads
kill themselves right around the same time as the stap module
decides to unregister probes due to the probe handler's { exit() }.
The effect is that the dying threads' uprobe_report_* callbacks race
with uprobe_free_{task,process} coming in from uprobe_put_process.
One of them ends up deallocating the uprobe_proc struct; the other
ends up trying to take a semaphore, or muck with an hlist node, in
the resulting freed block.
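
For illustration only, here is a minimal user-space sketch of the
usual way to make this kind of double teardown safe: each path holds
its own reference, and only the path that drops the last one frees.
This is not the code on the pr13539 branch.  The names proc_state,
proc_state_put, report_exit_path and unregister_path are hypothetical
stand-ins, and the real paths run concurrently in the kernel rather
than being called one after the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-process uprobes state; not the real layout. */
struct proc_state {
    atomic_int refcount;  /* one reference per teardown path */
    int *free_counter;    /* lets a caller observe how many times we freed */
};

static struct proc_state *proc_state_new(int nrefs, int *free_counter)
{
    struct proc_state *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    atomic_init(&p->refcount, nrefs);
    p->free_counter = free_counter;
    return p;
}

/* Drop one reference.  Returns 1 if this call freed the block, else 0.
 * The caller must not touch p after calling this. */
static int proc_state_put(struct proc_state *p)
{
    int old = atomic_fetch_sub(&p->refcount, 1);  /* returns the previous value */
    assert(old > 0);                              /* catches a put without a reference */
    if (old == 1) {
        (*p->free_counter)++;
        free(p);
        return 1;
    }
    return 0;
}

/* Assumed model of the exit-callback side of the race. */
static int report_exit_path(struct proc_state *p)
{
    return proc_state_put(p);
}

/* Assumed model of the probe-unregistration side of the race. */
static int unregister_path(struct proc_state *p)
{
    return proc_state_put(p);
}
```

Whichever of the two paths runs last performs the free, so neither
can take a lock or touch a list node inside memory the other has
already released, provided neither uses the pointer after its own
put.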

The pr13539 branch currently works around some of the various
possible races, but now gets stuck in the post-exit (?)
utrace-quiesce loop of the target "./loop 1" process:

[1480353.094558] stap_3960d10ec2d1cdbbd5924a89713e08c4_2157: systemtap: 1.7/0.152, base: ffffffff88740000, memory: 94data/25text/2ctx/2058net/34alloc kb, probes: 2, unpriv-uid: 0
[1480353.107405] uprobe_report_clone ffff81003c627138 14025=14025
[1480353.118383] uprobe_report_clone2 ffff81003c627138 14025=14025
[1480353.122169] uprobe_report_exit ffff81003c627138 14025=14028
[1480353.125266] uprobe_report_quiesce ffff81003c627138 14025=14025
[1480353.128373] uprobe_report_quiesce2 ffff81003c627138 14025=14025
[1480353.130829] uprobe_report_quiesce3 ffff81003c627138 14025=14025
[1480353.133212] uprobe_report_exit1a ffff81003c627138 14025=14028
[1480353.135620] uprobe_report_exit2 ffff81003c627138 14025=14028
[1480353.138275] uprobe_free_task ffff81000f69aa48 (tid 14028), caller ffffffff88718bfcS, ctid 14028
[1480353.142031] uprobe_report_exit3 ffff81003c627138 14025=14028
[1480353.144330] uprobe_report_exit4 ffff81003c627138 14025=14028
[1480353.157461] uprobe_free_process ffff81003c627138 (pid 14025), caller ffffffff88717048S, ctid 14028
[1480353.161439] uprobe_free_task ffff81000f69a5e8 (tid 14025), caller ffffffff88716fb2S, ctid 14028
[1480353.165132] uprobe_free_process zap ffff81003c627138

[sysrq-t sez: ...]

[1486034.240725] stap          X ffff8100131a9588     0 13933  12033                    (L-TLB)
[1486034.245729]  ffff810010a0df08 0000000000000046 ffff810019064d60 0000000000000246
[1486034.263375]  ffff810013733e70 0000000000000009 ffff810019064700 ffffffff8032ed40
[1486034.269112]  0005425fab97e401 00000000007b7869 ffff8100190648e8 0000000013733e60
[1486034.273530] Call Trace:
[1486034.276317]  [<ffffffff800ce099>] check_dead_utrace+0x11c/0x185
[1486034.278710]  [<ffffffff80016bc3>] do_exit+0x96c/0x978
[1486034.280831]  [<ffffffff8004b58f>] debug_mutex_init+0x0/0x3b
[1486034.283114]  [<ffffffff800602a6>] tracesys+0xd5/0xdf
[1486034.285251] 

[1486034.286547] loop          t ffff8100218fb148     0 14025      1               3882 (NOTLB)
[1486034.301736]  ffff810010a17cf8 0000000000000046 0000000000000246 ffffffff802a3700
[1486034.306797]  ffffffff8871d0a0 0000000000000007 ffff81000fbe25c0 ffff810014d54340
[1486034.311027]  0005425f92a59fff 0000000000796f2c ffff81000fbe27a8 0000000200000001
[1486034.314436] Call Trace:
[1486034.317175]  [<ffffffff800ce543>] utrace_quiescent+0xe6/0x26d
[1486034.320411]  [<ffffffff800cf503>] utrace_get_signal+0x4f8/0x55b
[1486034.323536]  [<ffffffff8002c804>] get_signal_to_deliver+0x5a/0x4b9
[1486034.326940]  [<ffffffff8002c930>] get_signal_to_deliver+0x186/0x4b9
[1486034.339013]  [<ffffffff8005d427>] do_notify_resume+0x9c/0x7b0
[1486034.341901]  [<ffffffff800936cf>] default_wake_function+0x0/0xe
[1486034.350764]  [<ffffffff80032cd6>] do_fork+0x148/0x1c1
[1486034.353004]  [<ffffffff80067fb8>] trace_hardirqs_off_thunk+0x35/0x67
[1486034.355512]  [<ffffffff8006035f>] int_signal+0x12/0x17
