This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: MPI run terminated on exception even with GDB


On 9/18/2016 4:17 AM, Mahmood Naderan wrote:
> Hello,
> I am trying hard to find a bug that causes ISA incompatibility across two different AMD Opterons.
> 
> This is an MPI job. I issue the command on the frontend (named cluster), and the actual multithreaded job runs on a compute node (named compute-0-1).
> 
> 
> The problem is that, even if I run the MPI command under GDB, the process is terminated after the exception. So, I am not able to "disas" the program.
> 
> Please note that I compiled MPI and the other applications with -g -Os (or -g -O2).
> 
> mahmood@cluster:tran-bt-o-40$ cat sc.sh
> #!/bin/bash
> ulimit -c unlimited
> exec /share/apps/siesta/siesta-4.0/tpar/transiesta < trans-cc-bt-cc-163-20.fdf
> 
> 
> mahmood@cluster:tran-bt-o-40$ cat hosts.txt
> compute-0-1
> 
> 
> mahmood@cluster:tran-bt-o-40$ gdb --args /share/apps/siesta/openmpi-2.0.0/bin/mpirun -hostfile hosts.txt -np 15 sc.sh
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
> ...
> Reading symbols from /share/apps/siesta/openmpi-2.0.0/bin/mpirun...done.
> (gdb) run
> 
> ...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 11772 on node compute-0-1 exited on signal 4 (Illegal instruction).
> --------------------------------------------------------------------------
> [Thread 0x2aaaab5cf700 (LWP 25335) exited]
> [Thread 0x2aaaab3ce700 (LWP 25333) exited]
> [Thread 0x2aaaab1cd700 (LWP 25332) exited]
> [Thread 0x2aaaaafcc700 (LWP 25330) exited]
> 
> Program exited with code 0204.
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.6.x86_64 libibverbs-1.1.8-4.el6.x86_64 libnl-1.1-14.el6.x86_64 libudev-147-2.42.el6.x86_64
> (gdb) disas
> No frame selected.
> (gdb)
> 
> If you have any idea about that, please let me know.

Mahmood,
From the log above it looks like the exception occurred in the remote
process, and mpirun shut down and exited.  You need to debug the remote
process, not mpirun.  You need to either (a) use an MPI-aware parallel
debugger, (b) get the failing remote process to create a core dump when
the exception occurs and debug that, or (c) somehow attach GDB to the
remote process.  Since you only have one remote process, maybe you could
put a 'while (i==0);' loop at the beginning of the program, then 'set
variable i = 1' after you attach and debug from there.
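
As a rough illustration of option (c), the wait loop could look something
like the sketch below. It is only a sketch under assumptions: the names
wait_for_debugger and debugger_attached and the WAIT_FOR_GDB environment
variable are made up for this example, and since SIESTA itself is Fortran,
an equivalent loop (or a small C helper called at startup) would be needed
there. The flag is declared volatile because, with -O2 or -Os, a plain
'while (i==0);' may be compiled so that the variable is never re-read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* volatile forces the compiler to reload the flag on every iteration,
   so a value written from GDB is actually seen by the loop. */
static volatile int debugger_attached = 0;

/* Returns 1 if it waited for a debugger, 0 if waiting was not requested.
   WAIT_FOR_GDB is a hypothetical environment variable used only to
   switch the wait on and off without recompiling. */
static int wait_for_debugger(void)
{
    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    /* Print the PID so you know which process to attach to. */
    fprintf(stderr, "PID %ld waiting for debugger\n", (long)getpid());
    fflush(stderr);

    while (!debugger_attached)
        sleep(1);   /* sleep instead of spinning to avoid burning a core */

    return 1;
}
```

Call wait_for_debugger() as the first thing in main(). Then, on the compute
node, attach with 'gdb -p <PID>' using the PID that was printed, run
'set variable debugger_attached = 1', and 'continue'. When the illegal
instruction is hit, GDB should stop on SIGILL with a frame selected, so
'disas' or 'x/i $pc' can show the offending instruction.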

If that isn't useful, you might have better luck getting specific
suggestions on how to debug your MPI program on an MPI mailing list,
probably one related to whatever flavor of MPI you are using.
Regards,
--Don

