This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Poor scaling
On 08/09/2016 01:38 AM, Avi Kivity wrote:
>
>
> On 08/08/2016 06:00 PM, David Smith wrote:
>> On 08/07/2016 07:47 AM, Avi Kivity wrote:
>>> Running a process with 12 active threads, each bound to its own core,
>>> and touching a systemtap array fairly frequently, I see:
>>>
>>>     72.91%  scylla   [kernel.kallsyms]  [k] _raw_spin_lock
>>>      3.49%  scylla   [kernel.kallsyms]  [k] start_callback.isra.62
>>>      1.77%  scylla   scylla             [.] _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIPPN8logalloc11region_implESt6vectorIS4_SaIS4_EEEElS4_NS0_5__ops15_Iter_comp_iterIZNS2_7tracker4impl15compact_on_idleERKSt8functionIFbvEEEUlS4_S4_E_EEEvT_T0_SM_T1_T2_.isra.437.constprop.582
>>>      1.54%  scylla   [kernel.kallsyms]  [k] _raw_spin_unlock
>>>      1.39%  scylla   scylla             [.] _ZN7reactor10smp_pollfn4pollEv
>>>      1.29%  scylla   libc-2.19.so       [.] epoll_pwait
>>>      1.21%  swapper  [kernel.kallsyms]  [k] poll_idle
>>>      1.06%  scylla   [kernel.kallsyms]  [k] _raw_read_lock_irqsave
>>>      1.01%  scylla   scylla             [.] _ZSt11__push_heapIN9__gnu_cxx17__normal_iteratorIPPN8logalloc11region_implESt6vectorIS4_SaIS4_EEEElS4_NS0_5__ops14_Iter_comp_valIZNS2_7tracker4impl15compact_on_idleERKSt8functionIFbvEEEUlS4_S4_E_EEEvT_T0_SM_T1_T2_.isra.436.constprop.583
>>>      0.80%  scylla   [kernel.kallsyms]  [k] _raw_read_unlock_irqrestore
>>>      0.78%  scylla   scylla             [.] _ZN17smp_message_queue20flush_response_batchEv
>>>      0.75%  scylla   [kernel.kallsyms]  [k] utrace_report_syscall_entry
>>>      0.63%  scylla   [kernel.kallsyms]  [k] task_utrace_struct
>>>      0.50%  scylla   [kernel.kallsyms]  [k] native_read_tsc
>>>      0.37%  scylla   [kernel.kallsyms]  [k] utrace_report_syscall_exit
>>>      0.32%  scylla   [kernel.kallsyms]  [k] delay_tsc
>>>      0.28%  scylla   scylla             [.] _ZN17smp_message_queue13process_queueILm2EZNS_16process_incomingEvEUlPNS_9work_itemEE_EEmRNS_8lf_queueET0_.isra.2371
>>>      0.26%  scylla   scylla             [.] _ZN17smp_message_queue13process_queueILm4EZNS_19process_completionsEvEUlPNS_9work_itemEE_EEmRNS_8lf_queueET0_.isra.2278
>>>      0.20%  scylla   [kernel.kallsyms]  [k] _raw_spin_lock_irq
>>>      0.18%  scylla   scylla             [.] _ZN6memory9cpu_pages4freeEPv
>>>      0.18%  scylla   liblz4.so.1.0.0    [.] LZ4_compress
>>>      0.17%  scylla   scylla             [.] _ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE
>>>
>>> Now this is not something I can run to monitor a production load.
>>>
>>>
>>> Is there any way I can improve scaling?
>> It depends; we'd have to see your script to make suggestions.
>>
>
>
> Here is the latest version of my script (I'm not sure whether tid() is a
> process-local or global identifier, so I used a pid(), tid() pair).
You shouldn't need to use the pair; both are unique system-wide. pid()
returns the thread group id (task->tgid) and tid() returns the thread
id (task->pid). (The naming is an artifact of how thread group ids
were shoehorned into the linux kernel.)
> #!/usr/bin/stap
>
> # usage: task_latency.stap process_name latency_threshold_ms
>
> global start_time
>
> probe process(@1).mark("reactor_run_tasks_single_start") {
>     start_time[pid(), tid()] = gettimeofday_us()
> }
>
> probe process(@1).mark("reactor_run_tasks_single_end") {
>     delete start_time[pid(), tid()]
> }
>
> probe timer.profile {
>     if ([pid(), tid()] in start_time) {
>         now = gettimeofday_us()
>         start = start_time[pid(), tid()]
>         if ((now - start) > $2 * 1000) {
>             printf("detected tasks running for >%sms\n", @2)
>             print_usyms(ubacktrace())
>         }
>     }
> }
>
>
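Since tid() is already unique system-wide, the array could be keyed on
tid() alone. A sketch of the simplified script (untested, assuming the
same reactor_run_tasks_single_* markers):

```systemtap
#!/usr/bin/stap
# usage: task_latency.stap process_name latency_threshold_ms

global start_time

probe process(@1).mark("reactor_run_tasks_single_start") {
    start_time[tid()] = gettimeofday_us()
}

probe process(@1).mark("reactor_run_tasks_single_end") {
    delete start_time[tid()]
}

probe timer.profile {
    if (tid() in start_time) {
        if ((gettimeofday_us() - start_time[tid()]) > $2 * 1000) {
            printf("detected tasks running for >%sms\n", @2)
            print_usyms(ubacktrace())
        }
    }
}
```

This halves the index tuple but does not by itself remove the global
lock on start_time, which is the likely source of the spin-lock time in
the profile above.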
> Are entire probes serialized, or just individual variables? If the
> latter, perhaps you can detect that an array's index is always tid(),
> remove that index, and convert the global variable to a thread-local one
> (with an exception for begin and end probes).
The same probe handler can be running simultaneously in different
threads. Probe handlers (and probe local variables) aren't really
serialized. However, global variable access is serialized.
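One consequence worth noting: because every read and write of a global
associative array takes that lock, hot probes that only accumulate
counts or timings usually scale better with a statistical aggregate,
since values written with <<< are buffered per-CPU and only merged when
an @count/@avg/@hist_* extractor reads them. A generic sketch (my
example, not tied to the script above):

```systemtap
global reads

probe syscall.read {
    reads <<< 1          # per-CPU append; no cross-CPU lock on the hot path
}

probe timer.s(5) {
    printf("reads in last interval: %d\n", @count(reads))
    delete reads
}
```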
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)