This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: exercising current aarch64 kprobe support with systemtap


On 08/04/2016 04:50 PM, William Cohen wrote:
> On 08/04/2016 10:35 AM, Pratyush Anand wrote:
>> Hi Will,
>>
>> On 04/08/2016:09:56:45 AM, William Cohen wrote:
> ...
>>> Hi,
>>>
>>> The OOM errors came before the otf_stress_hard_iter_5000 test that previously triggered the infinite unexpected EL1, so I can't really say that the proposed patch has fixed the problem.
>>
>> Yes, yes, previously also we were getting OOM, and then that OOM was triggering
>> infinite unexpected EL1, because OOM message uses WARN_ON() to print, and
>> WARN_ON() uses "BRK BUG_BRK_IMM". Now when it is printing through BRK, we were
>> hitting kprobe at print_worker_info() which was resulting in unexpected EL1.
>>
>> Proposed patch fixes kprobe tracing within non-kprobe BRK context such as
>> uprobe or WARN_ON() breakpoint handler etc. So, now a kprobe at
>> print_worker_info() will work while printing message of WARN_ON().
>>
>>
>>>
>>> Any thoughts on how to track down the oom issue?  Are you able to replicate it running the systemtap onthefly/kprobes_onthefly.exp tests?
>>
>> Sure, will look into it. Have reserved a seattle.
>>
>> ~Pratyush
>>
> 
> Hi Pratyush,
> 
> The stack backtrace of http://paste.stg.fedoraproject.org/5375/ is:
> 
> 
> [  668.676682] [<fffffc00082386fc>] page_counter_cancel+0x54/0x60
> [  668.682508] [<fffffc000823885c>] page_counter_uncharge+0x2c/0x40
> [  668.688509] [<fffffc0008239c68>] cancel_charge+0x40/0xe0
> [  668.693815] [<fffffc000823fdfc>] mem_cgroup_cancel_charge+0x2c/0x38
> [  668.700088] [<fffffc00081c96a8>] uprobe_write_opcode+0x4e8/0x688
> [  668.706089] [<fffffc00081c9878>] set_swbp+0x30/0x40
> [  668.710962] [<fffffc00081c98e4>] install_breakpoint.isra.10+0x5c/0x2b8
> [  668.717484] [<fffffc00081ca6d8>] uprobe_mmap+0x248/0x2a8
> [  668.722791] [<fffffc000820fbac>] mmap_region+0x204/0x558
> [  668.728097] [<fffffc0008210164>] do_mmap+0x264/0x320
> [  668.733057] [<fffffc00081f2238>] vm_mmap_pgoff+0xb0/0xd8
> [  668.738363] [<fffffc00081f22d0>] vm_mmap+0x70/0xa0
> [  668.743149] [<fffffc00082a62c8>] elf_map+0x80/0xf8
> [  668.747934] [<fffffc00082a7a48>] load_elf_binary+0x480/0xb90
> [  668.753588] [<fffffc0008252e7c>] search_binary_handler+0xbc/0x210
> [  668.759674] [<fffffc0008253810>] do_execveat_common+0x4b0/0x620
> [  668.765587] [<fffffc0008253c74>] SyS_execve+0x44/0x58
> [  668.770633] [<fffffc0008082c4c>] __sys_trace_return+0x0/0x4
> 
> There is some uprobe code running in the traceback.  It looks like things are going wrong when uprobes are being installed on a newly loaded executable.
> 
> -Will
> 

Hi,

I was able to locally build the uptream_arm64-devel branch of https://github.com/pratyushanand/linux.git with the configuration from Fedora rawhide and run the systemtap tests. Pratyush, were there changes in the patches between these versions? The only other difference is that the machine above was a Fedora 24 machine rather than a RHELSA one, so there would be differences in the compiler and other tools. The results (systemtap.log and systemtap.sum) are at:

http://people.redhat.com/wcohen/aarch64/20160817/

The results have been sent to dejazilla, but dejazilla appears to be having issues with displaying results (https://web.elastic.org/~dejazilla/viewsummary.php?summary=%3D%27%3Ccdc0ada3-26b8-295d-2d3b-d8a88da83e17%40redhat.com%3E%27).

The results look pretty respectable:

		=== systemtap Summary ===

# of expected passes		8809
# of unexpected failures	69
# of unexpected successes	1
# of expected failures		339
# of unknown successes		3
# of known failures		95
# of untested testcases		749
# of unsupported tests		33
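
For anyone who wants to compare these counts against another run, here is a minimal sketch in Python that tallies the "# of ..." lines from a systemtap.sum file. It is not part of the systemtap testsuite; the name parse_summary is hypothetical, and the sample text is simply the summary above with the tabs replaced by spaces.

```python
import re

# Matches DejaGnu-style summary lines such as "# of expected passes   8809".
SUMMARY_LINE = re.compile(r"^# of (?P<label>.+?)\s+(?P<count>\d+)\s*$")


def parse_summary(text):
    """Return {label: count} for every '# of ...' summary line in text."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            counts[match.group("label")] = int(match.group("count"))
    return counts


# Sample input copied from the summary in this message.
SAMPLE = """\
# of expected passes        8809
# of unexpected failures    69
# of unexpected successes   1
# of expected failures      339
# of unknown successes      3
# of known failures         95
# of untested testcases     749
# of unsupported tests      33
"""

if __name__ == "__main__":
    for label, count in parse_summary(SAMPLE).items():
        print(f"{count:6d}  {label}")
```

Running the same function over the systemtap.sum from two different kernels and comparing the resulting dictionaries gives a quick view of which categories moved.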

There are about a dozen failures due to the search for atomic regions going beyond the beginning of the function, which prevents probes on a number of functions, as in the following test:

spawn stap /root/systemtap_write/systemtap/testsuite/systemtap.base/bz1027459.stp
WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1234").call (address 0xfffffc00080cb528) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4716").call (address 0xfffffc00081051b0) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_min@kernel/sched/core.c:5040").call (address 0xfffffc0008105688) registration error (rc -22)
WARNING: probe kernel.function("SyS_sched_get_priority_max@kernel/sched/core.c:5013").call (address 0xfffffc0008105620) registration error (rc -22)
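
To collect the affected functions from systemtap.log, a rough Python sketch along these lines could be used. It assumes the warnings keep exactly the format shown above; the name parse_registration_errors is hypothetical and not part of systemtap. The rc of -22 is -EINVAL.

```python
import re

# Matches warnings of the form:
# WARNING: probe kernel.function("NAME@FILE:LINE").call (address 0x...) registration error (rc -22)
WARNING_LINE = re.compile(
    r'WARNING: probe kernel\.function\("(?P<func>[^@"]+)@(?P<file>[^:"]+):(?P<line>\d+)"\)'
    r'(?:\.\w+)? \(address (?P<addr>0x[0-9a-fA-F]+)\) '
    r'registration error \(rc (?P<rc>-?\d+)\)'
)


def parse_registration_errors(log_text):
    """Return a list of dicts, one per probe registration warning found."""
    errors = []
    for match in WARNING_LINE.finditer(log_text):
        errors.append({
            "func": match.group("func"),
            "file": match.group("file"),
            "line": int(match.group("line")),
            "addr": int(match.group("addr"), 16),
            "rc": int(match.group("rc")),
        })
    return errors


# Sample input: two of the warnings quoted in this message.
SAMPLE_LOG = (
    'WARNING: probe kernel.function("SyS_set_tid_address@kernel/fork.c:1234").call '
    '(address 0xfffffc00080cb528) registration error (rc -22)\n'
    'WARNING: probe kernel.function("SyS_sched_setaffinity@kernel/sched/core.c:4716").call '
    '(address 0xfffffc00081051b0) registration error (rc -22)\n'
)

if __name__ == "__main__":
    for err in parse_registration_errors(SAMPLE_LOG):
        print(f'{err["func"]} ({err["file"]}:{err["line"]}) rc={err["rc"]}')
```

Sorting the resulting list by address makes it easier to match each failing function against the disassembly when checking where the atomic-region search ran past the function start.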

There are also some differences in the syscalls used on aarch64 that cause some of the tests to fail.

-Will

