This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: Using systemtap on MPI applications
- From: "Olausson, Bjoern" <contactme at olausson dot de>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: systemtap at sourceware dot org
- Date: Wed, 16 Mar 2016 09:13:25 +0100
- Subject: Re: Using systemtap on MPI applications
- References: <CAE7O3Td1P1jFFbbBFu5k+uu8w8Zw9+fSgRYJqc7d4H57dAV09A at mail dot gmail dot com> <y0mh9g7a77l dot fsf at fche dot csb>
On Tue, Mar 15, 2016 at 8:09 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Hi -
>
> "Olausson, Bjoern" wrote:
>
>> [...]
>> I am curious if there is a smart way to trace (basically IO) for MPI
>> applications running on multiple nodes?
>>
>> I guess it would be possible to either run stap globally or run
>> "mpirun <options> stap script.stp -c mpi-application"
>> [...]
>
> That would be the brute-force method. It would require installing the
> compiler etc. on all the hosts, unless you run a central stap-server
> instance to do the compilation part of the work (passes 1-4).
>
> Another possibility now is to use "stap --remote HOST1 --remote HOST2
> ..." from a central box, which internally does "ssh HOST1 stapsh" to
> maintain a two-way link, and perform remote execution (pass 5).
>
>
> It would be nice if stap --remote learned about mpi (openmpi?), so as
> to use mpirun or similar to manage remote startup of stapsh and
> multiplex stdin/stdout/stderr communications with all the hosts in your
> hostfile:
> % stap --remote mpi:/path/to/hostfile
>
> Or even
> % stap --remote mpirun:HOST1 --remote mpirun:HOST2
> may be worth doing, using individual "mpirun -H" jobs per host.
>
>
> - FChE
The --remote switch is a great start; I hadn't come across that one yet.
Thanks a lot!
Indeed, it would be great if stap were MPI-aware in some way.
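Until stap understands something like the "mpi:/path/to/hostfile" syntax (only a proposal in Frank's mail, not an existing option), repeated --remote flags can approximate it. Below is a minimal sketch in POSIX shell. It assumes an Open MPI-style hostfile with one host per line, optional attributes such as "slots=4", and "#" comments. The helper name remote_args_from_hostfile and the script name io.stp are hypothetical. Each host still needs ssh access and stapsh, since --remote works over "ssh HOST stapsh".

```shell
#!/bin/sh
# Hypothetical helper: turn an Open MPI-style hostfile into repeated
# "--remote HOST" options for stap, one option pair per output line.
remote_args_from_hostfile() {
    awk '
        { sub(/#.*/, "") }                      # drop comments (whole-line or trailing)
        NF > 0 { printf "--remote %s\n", $1 }   # keep only the host name, ignore slots=N etc.
    ' "$1"
}

# Example use (assumes stap locally, plus ssh access and stapsh on every listed host):
#   stap $(remote_args_from_hostfile ./hostfile) io.stp
```

The command substitution in the example is deliberately left unquoted so that each "--remote" and each host name reach stap as separate arguments.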
Still, there is the issue of how to filter what stap traces. How
would I tell stap to focus only on one particular executable or PID
when using the --remote switch, so that target() can be used?
Any ideas on that?
Greetings,
Bjoern